Software has two ingredients: opinions and logic (=programming). The second ingredient is rare and is typically replaced by the first.
I blog about code correctness, maintainability, testability, and functional programming.
This blog does not represent views or opinions of my employer.

Sunday, August 3, 2014

I don't like Hibernate/Grails, part 2, repeatable finder problem: trust in nothing!

I was hoping for some 'inalienable truths' developer can rely on.... Like, that things that should obviously return true never return false.
(A more correct technical term for this is property but I find inalienable truth more fun).

The assert code from last post is one such example:
    assert BankAccount.findAllByBranch(myBranch).every{
    it.branch == myBranch
  }

Repeatable finders:
(I use this term as a reverse analogy to non repeatable reads.)  In Hibernate/GORM the above assertion does not need to be true.  For each finder/getter hibernate stores returned objects in its session and when next finder/getter is called hibernate will use the stored objects whenever it can.  It will not refresh them.   So you have to assume that any finder will return some (or many) of the domain objects found by previous finders.  What if something has happened that hibernate session does not know about between the time of your current finder and the time previous finders ran?

So here is one example showing how to break the above BankBranch assert:

(assume ac1.branch == branch2)

... HTTP request for User1:
... 'componentA' executes:
println 'Tracing something about '+BankAccount.findByName('ac1')

... some other expensive computation executes

... HTTP request for User2:
def acc = BankAccount.findByName('ac1')
acc.branch = branch1
acc.save(...)

... HTTP request for User1 continues:
... 'componentB' executes:
def myBranchAccounts = BankAccount.findAllByBranch(branch1) 

(myBranchAccounts includes ac1 but Hibernate returns old, not refreshed version of it so ac1.branch == branch2 is still true)

... myBrancheAccounts are rendered on a view page

(User1 is presented a list of all accounts from branch1 including ac1 which is jolly displayed showing branch2. User1 is surprised.)

This is not necessarily a strict concurrency problem. You may have code which bypasses Hibernate (maybe uses Groovy.Sql class directly) and get into very similar issues.

It is also interesting to think about compoentA and componentB code from the point of view of the unit test coverage leak problem I described in the 'part 1' post.

Here are 2 other inalienable truths (properties) that are no longer:
Uniquness constraints:
My BankAccount was declared with unique constraint on the name field (database enforced uniqueness on the name column). So if I do this:

  def accounts = BankAccount.findAll()

I will never see the same name repeated, right?  Wrong.  Here is a concurrent usage that shows how that breaks:

...HTTP request for User1:
... 'componentA' executes:
println 'Tracing something about ' +BankAccount.findByName('ac1')

... some other expensive computation executes

...HTTP request for User2:
def ac1 = BankAccount.findByName('ac1')
def ac2 = BankAccount.findByName('ac2')
ac1.name = 'ac1_b'
ac1.save(...)
ac2.name = 'ac1'
ac2.save(...)

... HTTP request for User1 continues:
... 'componentB' executes:
def allAccounts = BankAccount.findAll() 

(allAccounts contain old amount in ac1 with ac1.name == 'ac1'
and ac2 with ac2.name == 'ac1')

... allAccounts are are rendered on a view page 

(User1 is presented a list of all accounts and account 'ac1' shows up twice. User1 is upset)

You may find it unrealistic that User 2 can perform 2 renames concurrently to a short period between 2 finder calls in one HTTP request.  What if there have been 2 users renaming one object each?  In any rate there are other possible domain objects than Bank Account and other fields that may need to have uniqueness constraint.   I think the issue is demonstrated well enough.

Again, you can get into similar problems if you use some direct Groovy.Sql.

Results which look like uncommitted reads:
If I transfer money between accounts inside a transaction I should never ever see the transfer applied to one account and not the other.  Right?  Wrong again:

...HTTP request for User1:
... 'componentA' executes:
println 'Tracing something about ' +BankAccount.findByName('ac1')

... some other expensive computation executes

... HTTP request for User2:
def ac1 = BankAccount.findByName('ac1')
def ac2 = BankAccount.findByName('ac2')
transferMoney(ac1, ac2, 1000) //transfer 1000 dollars

...HTTP request for User1 continues:
def allAccounts = BankAccount.findAll() 

(allAccounts contain old amount in ac1 but new amount in ac2)

... allAccounts are are rendered on a view page 

(User1 is presented a list of all accounts and the list looks inconsistent.  Where did the 1000 go? User1 is software tester, he/she is now furious, spends hours figuring out what got wrong and the problem magically just disappears. )

And again just use direct Groovy.Sql to get into the same issue without concurrency.

Why Hibernate Works Like This?
I imagine that it must have been tempting to use single Java object to represent single record. This is also purist approach to ORM:  node.children.first().parent.is(node). But with hibernate this may have not been just a temptation.  Hibernate designers decided at some point that Hibernate will be saving objects attached to the session automatically. I imagine that it would be very hard to deal with auto saving if you had more than one domain object representing the same record (which one would you save?)
So why not refresh existing object each time it is retrieved?  Maybe because that would be a serious side-effect ;)   If some objects have been changed locally and also concurrently changed in the database:  have Hibernate designers been concerned about throwing ConcurrentModificationException from a finder?
Well I do not see why, because Hibernate finders already save objects and you are likely to get a collection of interesting save errors when calling a finder.  (Talk about aversion to side-effects!)

Can I be just careful?
Be careful not to pollute hibernate session - that may not be so easy.  For example, consider that the BankBranch class has something like featuredAccount association back to BankAccount.  If that gets eagerly loaded at the time branch1 is retrieved hibernate session is already polluted with one BankAccount at the onset of the first example (without any artificial println statements).

More complex applications may want to do some HTTP filter before and after logic which uses hibernate objects.  Applications may have layers of complexity which share the same hibernate session.  Controlling what is and what is not in that session is unrealistic.

Why is this bad?
Developers experienced with relational databases are likely to expect certain behaviors from the code that with GORM/Hibernate are just gone.  The unexpected behavior may be very intermittent and impossible to troubleshoot.  In my example I have 'broke' the code by inserting a println statement printing something about one account. What if finder 'polluting' hibernate session was executed only under some (rare) conditions?
I think developers tend to think of finders and getters as safe methods to call.   Almost always getters do not mutate anything.  With Hibernate getter/finder have side-effects, one of them is mutated Hibernate session and this is easy to forget when you design and code your application.

I believe that any nontrivial Grails app most likely has issues/bugs related to repeatable finder.
In addition, Hibernate offers NO public API to query what is stored on the session.  So if you think of some programmatic ways to solve this issue think some serious hacking.

How to test for this:
Repeated finder problems arising from the use of direct SQL or use of several hibernate sessions within single HTTP request can be discovered by both integration and functional tests.
The issue is not unit testable even if you think of Unit Tests as 'atomic tests' which are implemented as Grails integration tests.
Unfortunately, in many cases the issue will be triggered by concurrent access to your application. 
Testing concurrency issues is always not trivial.  So I do not really know how to answer this question. 
Ideally, the tools we use will be better designed to handle concurrency (have you read Simon Peyton Jones 'Beautiful Concurrency' http://research.microsoft.com/en-us/um/people/simonpj/papers/stm/?). I am afraid Hibernate maybe never be one of these tools.  

New Session a Solution?
There is one very tempting partial solution to this.
If you really need consistent results from a finder keep it in isolated Hibernate session.(LazyInitalizationException alert flashing, OK I promise not to talk about LazyInitalization. :)
The idea is that if this line (from first example):

def myBrachAccounts = BankAccount.findAllByBranch(branch1) 

runs in a new (and therefore unpolluted) hibernate session the domain objects returned from the finder will all have the most recent values and none of the surprising things will ever happen.
Or ...
Enter next Hibernate gotchas:  DuplicateKeyException (and friends) the subject of my next post.

My current thinking is that using new session on certain 'crucial' selects seems to be a very good option to reduce the impact of repeatable finder problem. This technique could work well, especially if the finder we want to protect is the very last hibernate call in the HTTP request. I will not solve the problem but may reduce its impact. I may return to this thought later.

I believe there is currently no full solution to repeatable finder problem described in this post.

Added 2014/08/12:  Thinking more about impacts:
I find it helpful to think about application code from the point of view of properties (logical consistency rules that need to be true).  Application may rely on such properties explicitly (for example your logic will exception or return incorrect results if name property is not unique), implicitly (it will be embarrassing to show user list of items with seemingly broken uniqueness), and can actually be enforcing such properties (for example Grails code is used to enforce some more complex uniquness rules).

There are many, many possible properties related to domain objects (we have seen only 3 in this post).  Some are derived from software requirements but many have a much lower level nature. For example, each 'where' or 'join' criteria in underlying SQL statements is likely to imply a property (as shown by BankAccount.findByBranch() example).
Some properties are meant to be enforced by application code, some by DB, some by the framework.
Repeatable finder is likely to affect/invalidate large portion of such properties in your application!
And you cannot rely on the fact that even DB enforced property will hold.
The impact seems very big.

(EDITED 2014/09/13) References:  
I have posted a question about this issue here: Stack Overflow Question
I have now posted one (terrible) solution to how one can identify and fix such problems as answer to the Stack Overflow Question.
Here is Grails JIRA ticket for it: http://jira.grails.org/browse/GRAILS-11645
another relevant JIRA: http://jira.grails.org/browse/GRAILS-11644
Forum:  http://groups.google.com/forum/#!topic/grails-dev-discuss/wzekMGC0ibE
Hibernate JIRA:  https://hibernate.atlassian.net/browse/HHH-9367

6 comments:

  1. For some reason I've been under the impression that dynamic finders (findBy*, findAllBy*) do not make use of the instance cache, unless cached queries has been enabled (which is discouraged).

    (This is one reason given to make use of findById() vs get(), to avoid the cached instance.)

    I don't know about what findAll() does regarding the caches but in practical terms I've never made use of this method. (When would one get ALL bank accounts with no filter clauses.)

    Have I lost it?

    ReplyDelete
    Replies
    1. I have based this post on test cases I wrote using GRAILS 2.3.3. The configuration is:

      cache.use_second_level_cache = false
      cache.use_query_cache = false

      What you describe was my understanding too.

      I agree that findAll is not realistic, but you will get the same problem with any finder or criteria query. For example findAllByBranch().

      The test cases are easy enough to create.

      Delete
    2. I have also tested this behavior on 2.4.3 (which is the currently the latest).

      Delete
  2. second_level caching is a different issue... where data get cached across sessions. session level caching is just something that happens as part of hibernate, not much developer control over it.

    ReplyDelete
  3. Hi Robert, Will you please zip up your grails application and post it? Run: grails bug-report

    ReplyDelete
  4. Welcome Post very usefull informatation.and iam expecting more posts like this please keep updating us........

    ReplyDelete