Stuff about programming, programming style, maintainability, testability. Dedicated to my coworkers and friends. Everyone is welcome to leave comments or disagree with me. This blog does not represent the views or opinions of my employer.

Saturday, October 18, 2014

I don't like Hibernate/Grails part 10: Repeatable finder, lessons learned

Repeatable finder (the concurrency issue described in part 2) is what started and motivated this series. I had hoped that this issue would draw some reaction from the community. It did not. Why? Tallying up all the answers and responses from the last 2 months gives a total of 6, none of them useful or even correct. What does that mean?

In this series I tried to sneak in some things that interest me, like FP and logical reasoning about code correctness. I will use the repeatable finder problem as a way to sneak in a bit more of this stuff later in this post.

Denial isn't just a river in Egypt?
This has been a twilight zone.
To refresh your memory: if more than one query is executed in a single Hibernate session and the result sets intersect, then the second query returns a weird combination of old and new data. That can break the logic in your code, for example:
       Users.findAllByNickName('bob')

can return records with nickName != 'bob'. Other things can go wrong too: maybe you have used a DB unique key to define equals()? Or maybe you have used a DB unique key as a key in a Map? Any of these could go very wrong.
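To make the mechanism concrete, here is a minimal plain-Java model of what happens inside a session. It is a sketch under simplifying assumptions, not Hibernate code: `FakeSession`, `User` and `RepeatableFinderDemo` are hypothetical names, the "database" is just a map from id to nickName, and the session cache is modeled as an identity map that hands back an already-loaded object instead of the values the SELECT just read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for a domain object, a table and a session cache.
// None of these are Hibernate or GORM classes.
class User {
    final long id;
    String nickName;

    User(long id, String nickName) {
        this.id = id;
        this.nickName = nickName;
    }
}

class FakeSession {
    private final Map<Long, String> table;                       // DB rows: id -> nick_name
    private final Map<Long, User> identityMap = new HashMap<>(); // objects already loaded

    FakeSession(Map<Long, String> table) {
        this.table = table;
    }

    // Evaluates "WHERE nick_name = ?" against current DB data, but for every id
    // already in the session it returns the cached object unchanged.
    List<User> findAllByNickName(String nick) {
        List<User> result = new ArrayList<>();
        for (Map.Entry<Long, String> row : table.entrySet()) {
            if (row.getValue().equals(nick)) {
                result.add(identityMap.computeIfAbsent(row.getKey(), id -> new User(id, row.getValue())));
            }
        }
        return result;
    }
}

class RepeatableFinderDemo {
    static List<User> run() {
        Map<Long, String> table = new LinkedHashMap<>();
        table.put(1L, "alice");
        table.put(2L, "bob");
        FakeSession session = new FakeSession(table);

        session.findAllByNickName("alice");      // first query: user 1 cached as "alice"
        table.put(1L, "bob");                    // another request commits a rename
        return session.findAllByNickName("bob"); // second query: rows 1 and 2 match
    }
}
```

Calling `RepeatableFinderDemo.run()` returns both matching rows, but the first `User` still says `alice`: the SELECT matched on fresh data while the object came from the cache.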

At first, I thought that the issue must be well known and that I was missing some way of handling it. This, unfortunately, is not the case. Very recently, I came across this blog post from 2009: orm-sucks-hibernate-sucks-even-more
"... take a look how even a silly CRUD application would suffer, once you've got "not-very-recent" object from the session"
That quote points to a (now non-existent) page on the Hibernate website. Did we know more in 2009 than we know now? If we did know, why have we allowed this issue to stay unresolved? Well, this is all speculation.

I tried my best to do 2 things: make the community aware and persuade Hibernate to fix it. I have failed miserably on both counts. Here are the results of my efforts (as of Oct 17, 2014, tallied a bit over 2 months after I started my crusade):
  • post part 2:  effectively no replies, but over 1100 reads.
  • Grails JIRA: incorrect comments and then ignored
  • Hibernate JIRA:  rejected (works as intended) with suggested work-around which is incorrect
  • Stack Overflow question: a whopping +4 score (started at -1) and a bunch of incorrect or meaningless answers
  • Grails forum: 0 replies
The Hibernate ticket was the weirdest experience. It got rejected very fast (not a bug) with a comment to just use refresh(). After I pointed out that this workaround is total nonsense, I was sent to read some completely irrelevant documentation about concurrency. After that, my (and Tim's) comments were ignored.

What can I conclude from these 3 facts?
  • nobody seems to know how to resolve or even work-around this issue
  • experts provide advice that is incorrect
  • there is no interest in solving, discussing or even acknowledging it as a problem
I do not know, but probably nothing good. I think it is interesting to try to puzzle out the few responses that the problem did generate. I will try to do that here.

The replies I got from the experts fall into 2 categories. The first category consists of answers like these:
  • It is an issue with any ORM
  • Any database application will have an issue like this 
It is true that the issue can be resolved with DB locking. In particular, I could prevent repeatable finder by wrapping every HTTP request in a long transaction and configuring a higher (repeatable read) isolation level. But if we need to resort to things like this, that is a big framework design failure.
It is NOT true that every ORM and every database application will have this problem. The most likely explanation for this type of response is that developers do not think about side-effects. They see a Hibernate query and think of a SELECT statement only. If I see a problem, it must come from the SELECT; where else would it come from? This is consistent with the point I tried to make in my previous posts.

The second category are answers that suggest using refresh() or discard() to fix the problem:
'To fix your problem'
  • add refresh() to your code
  • or:  add discard()/evict() to your code
My first reaction was: Grrr. My second: Hmm. If only I could continue this conversation; I am sure it would go like this:
Me: Where do I add these?  Expert's Reply: Add them where you have that problem. 

If you have been following various Hibernate discussion forums, you must have noticed that the same type of advice (either to add refresh() or to add evict()) shows up very frequently. This advice is never right.

Grails and Hibernate experts:  I am very disappointed in you.

Add refresh()...  add evict()... Thinking in Hibernate.
(Here is where I sneak-in some interesting stuff.)
This is how we typically reason about our code: the problem is on line 57 because variable xyz is ... and then on line 89 we do that..., and then on line 127 we have an if statement that goes like that...
We reason about our code by examining chains of programming instructions.

This is called imperative thinking and imperative programming. If you have read my previous posts, you may assume that I consider such programs not logical. They are logical; only the logic is very cumbersome and complex.
A well designed OO program is where lines 57, 89 and 127 are all in the same class and the chains of instructions we need to examine are relatively short.  In a procedural program lines 57, 89, and 127 can be anywhere and chains are long.  Badly designed OO programs behave like procedural programs.

Repeatable finder is a great example where imperative reasoning fails.  The problem is not something between lines 57 and 89 or something on line 115.  The problem is (or can be) anywhere.
The answer 'add refresh()' or 'add discard()' is very imperative thinking: it assumes I can add it on line 89. (I can only conclude that this expert advice is not to sprinkle refresh() all over my code just for fun... and because my code would run too fast without it.)

So what is the alternative? The idea is to think about a block of code in a way that can prove certain behavior of that block. If we know that a code block 'A' exhibits the same behavior no matter where it is placed or how it is used, then we no longer need to think about lines 57, 89, and 127 or about chains of computing instructions.

This is called declarative thinking and programming. Logical reasoning is now simplified: I no longer need to follow chains of programming steps to reason about correctness.

Declarative thinking works great, but if I cannot trust any property, not even this one:
    Users.findAllByNickName('bob').every{it.nickName == 'bob'}
then I am stuck.
That may sound like a limitation of declarative programming: I can still keep going using the imperative approach. That is true, and we all 'keep going'. I did not stop programming my project and no, I did not add refresh() all over my code. That is why our applications are so buggy: we ignore logical problems unless we can pin them to line 127.

Side Note: The fun starts when I start combining my declarative code blocks into bigger blocks. Code needs to be logically composable: I want a bigger block (composed of smaller blocks) to have properties too. Some like to call this programming with combinators.
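A rough illustration of what composable, property-carrying blocks look like, in plain Java rather than Groovy. `Query`, `where`, `then` and `CombinatorDemo` are hypothetical names, not GORM API. Because each block is a pure function of its input, the property "every result satisfies the filter" holds wherever the block is placed, and it survives composition.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// A hypothetical pure query type: a function from input rows to result rows.
// It reads nothing but its argument and changes nothing.
interface Query<T> extends Function<List<T>, List<T>> {

    // Combinator with a provable property: every row it returns satisfies pred.
    static <T> Query<T> where(Predicate<T> pred) {
        return rows -> rows.stream().filter(pred).collect(Collectors.toList());
    }

    // Combinator: feed the results of this query into the next one.
    // If this guarantees p and next guarantees q, the composition guarantees both.
    default Query<T> then(Query<T> next) {
        return rows -> next.apply(this.apply(rows));
    }
}

class CombinatorDemo {
    static List<String> run() {
        Query<String> startsWithB = Query.where(s -> s.startsWith("b"));
        Query<String> longerThanTwo = Query.where(s -> s.length() > 2);
        return startsWithB.then(longerThanTwo).apply(List.of("bob", "bobby", "alice", "bo"));
    }
}
```

`CombinatorDemo.run()` returns `[bob, bobby]`, and every element satisfies both filters. What makes this work is that nothing outside the argument is read or written; a session cache that can substitute objects would break exactly this guarantee.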

Conclusions:
I would like to suggest this as a new rule of thumb: 
  the answer to use Hibernate/GORM refresh() or evict()/discard() is wrong regardless of the question.
(with the exception of functional tests, which may need to refresh some records used in asserts). Please comment below if you find a counterexample to this rule.

I am not claiming that I know the solution to Repeatable Finder. Keeping the Hibernate cache synchronized with the DB is hard, maybe impossible. One way of dealing with hard problems is to make them somebody else's. If GORM/Hibernate just told me when the query is lying (returns stale data even though it has new data), or allowed me to request/configure the query to refresh all records... that would go a long way.
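What could "tell me when the query is lying" or "refresh all records" look like? Here is one hypothetical sketch in plain Java, using a simplified identity-map model of a session. The map-backed table, `RefreshingSession` and its `refresh` flag are my assumptions, not an existing GORM or Hibernate option. The finder compares each cached object with the row it just read, records disagreements in `staleIds`, and optionally overwrites the cached value.

```java
import java.util.*;

// Hypothetical domain object; not a GORM class.
class User {
    final long id;
    String nickName;
    User(long id, String nickName) { this.id = id; this.nickName = nickName; }
}

// Hypothetical session with an identity map and an opt-in "refresh on query".
class RefreshingSession {
    private final Map<Long, String> table;                  // stands in for the DB table
    private final Map<Long, User> cache = new HashMap<>();  // stands in for the session cache
    final List<Long> staleIds = new ArrayList<>();          // rows where cache and DB disagreed

    RefreshingSession(Map<Long, String> table) { this.table = table; }

    List<User> findAllByNickName(String nick, boolean refresh) {
        staleIds.clear();
        List<User> result = new ArrayList<>();
        for (Map.Entry<Long, String> row : table.entrySet()) {
            if (!row.getValue().equals(nick)) continue;        // the SELECT's WHERE clause
            User cached = cache.get(row.getKey());
            if (cached == null) {
                cached = new User(row.getKey(), row.getValue());
                cache.put(row.getKey(), cached);
            } else if (!cached.nickName.equals(row.getValue())) {
                staleIds.add(row.getKey());                    // the query would be "lying" here
                if (refresh) cached.nickName = row.getValue(); // copy the fresh value in
            }
            result.add(cached);
        }
        return result;
    }
}
```

One caveat this sketch glosses over: blindly overwriting the cached object would also discard in-memory changes that have not been flushed yet, which is one reason keeping the cache and the DB in sync is genuinely hard.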

It looks like the community has decided to not acknowledge Repeatable Finder as a problem. There is really no good solution for it and acknowledging it would be admitting to that fact. This issue is likely to remain unsolved and ignored. More complex Grails apps are doomed to work incorrectly under heavier concurrent use.

I have added a label to my posts (it does not work, so do not click on it; that convinces me that Blogger must be using Hibernate): 'Stop thinking in C++ and Java'. I think we need to stop thinking imperatively, or at least stop thinking only in imperative terms.

Next post: I need to do one more to wrap up. I will be finishing this series next week.

I have a busy period ahead of me. I started going over a set of courses published online by the University of Oregon (Oregon Programming Languages Summer School), and that will be many hours of not very easy listening and learning. I also have to start preparing and training for a ski camp in early November (I live in CO and skiing has started here already). No, this will not be a SKI calculus camp ;) but then, believe it or not, technical skiing is (or should be) a fun intellectual activity too.

Friday, October 10, 2014

I don't like Hibernate/Grails part 9: Testable code

"I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence". Kent Beck.
Finding more problems with less effort is something I totally agree with. Which approach to testing will give me the most bang for the buck? I like to think about tests in a pragmatic way: I write tests to find bugs and guard against bugs. I consider manual testing rather ineffective and, in most cases, inferior when compared to automated tests.

Testing seems to be deeply related to my last two posts. It is obviously related to software correctness. Less obviously, our approach to testing impacts how we perceive the framework: how we test can explain why we like or don't like Hibernate and Grails.

I have moved away from unit tests in Grails. My approach to testing is an 'inverted pyramid': I test internal implementation details using Grails integration tests and I write a lot of functional tests. My tests are less 'unit' than you may like them to be (they interact with an actual database, and mocking is replaced with data setups), but they are closer to reality. This post explains the reasoning behind this decision.

I am considering the unit/integration/functional division from the 'testing philosophy' point of view. I care less whether tests are placed in the test/unit, test/integration or test/functional folder. I will stay away from the love-hate TDD debate.

Testing choices: (Just to make sure we are on the same page.)
The testing spectrum has these 2 extremes with very different characteristics:
(1) Unit Tests:  testing expected behavior; testing in a fake environment; testing internals not visible to the end users 
Also: white box testing; bottom-up testing
I will test parts (units) of my software in isolation from anything else.  I will 'mock' interaction with the rest of the system.  Since I know exactly what can go wrong, I can write mocks that exercise the tested part under a specific situation.
(The mocking aspect is specific to OOP, nobody writes mocks in FP... with that said, there is this great book:  http://en.wikipedia.org/wiki/To_Mock_a_Mockingbird ;))

(2) Functional Tests: testing for unexpected problems; testing in a real environment; testing functionality visible to the end users 
Also: black (more gray than black - I need testing 'hooks') box testing; top-down testing
I will test my application as a whole.  I identify a list of specifications and I write tests to verify that my application works correctly with respect to these specifications. I do not need to know everything that can go wrong, I assume that since I covered a comprehensive range of data and scenarios representing actual software usage, most of what can go wrong will be uncovered.

Integration tests fall between 1 and 2.

In this post I argue that testing in Grails needs to focus on unexpected problems, and should be done in a real (or close to real) environment. That moves the testing away from unit and towards integration and functional.

Why Unit Tests are Great:
  • execution speed: no need to startup the whole infrastructure, etc
  • good coupling: the tested unit becomes married to the unit test, this can be a happy, strong marriage that is going to survive ups and downs of refactoring.
  • atomic/unit nature: the idea is to make a perfect whole from perfect units. (Except, that principle is logically flawed in OOP.) This is sometimes called bottom-up testing. 
  • unit tests can aid software design and coding.
  • aggressive conditions: once you know what kind of problems to expect, you can stress the code 'more aggressively', introducing scenarios which are rare or complex to set up using integration or functional tests. This is rarely done.
This post is not a criticism of Unit Tests.  It is a criticism of how they are used.
Where is the most bang?
Some people disagree with me when I say: what really needs testing is side-effects. Errors caused by side-effects are the hardest to troubleshoot and fix, and are often very intermittent. Side-effects limit developers' ability to reason logically about the code. Side-effects can be very confusing.
There are many books and articles about testing, yet side-effects have not been discussed much. Why is that?

Case study:  This example is repeated from my last post: 
    def users = User.findAllByOffice(office1) //code (A)

Assume User has a userName (with a unique constraint), an office (of type Office) and userPreferences (of type UserPreferences). This code:
  1. will issue a SELECT statement (with whatever locks) 
  2. can save some changed objects to the database (part 5)
  3. can save some unchanged objects too  (part 6)
  4. will impact some record types returned by queries that follow it (from proxies to actual objects).  Some records returned from this query will use Hibernate Proxy to implement preferences and some will use the actual UserPreferences class. (Some developers may be surprised that the same goes for the office association). (part 4)
  5. will impact the data content of some records returned by queries that follow it.  Similarly, the content of some records returned from this query may be different from the data returned by the underlying SELECT statement. (part 2)
Can simply adding line (A) break my code?  Clearly it can!
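Side-effect 2 (saving changed objects) can be modeled in a few lines. This is a simplified plain-Java sketch, assuming a session that flushes every dirty object before every query; real Hibernate AUTO flush mode is more selective and decides based on the tables a query touches. `AutoFlushSession`, `Office` and `AutoFlushDemo` are hypothetical names. The only difference between the two runs is whether a query that looks unrelated (it only counts other offices) is executed.

```java
import java.util.*;

// Hypothetical domain object; not a GORM class.
class Office {
    final long id;
    String name;
    Office(long id, String name) { this.id = id; this.name = name; }
}

// Hypothetical session that auto-flushes before every query (a simplification).
class AutoFlushSession {
    private final Map<Long, String> table;                   // DB rows: id -> name
    private final Map<Long, Office> cache = new HashMap<>(); // managed objects

    AutoFlushSession(Map<Long, String> table) { this.table = table; }

    Office get(long id) {
        return cache.computeIfAbsent(id, k -> new Office(k, table.get(k)));
    }

    // Dirty checking: write every managed object whose state differs from its row.
    void flush() {
        for (Office o : cache.values()) {
            if (!Objects.equals(table.get(o.id), o.name)) table.put(o.id, o.name);
        }
    }

    // Looks like a read-only query, but flushes first.
    long countByName(String name) {
        flush();
        return table.values().stream().filter(name::equals).count();
    }
}

class AutoFlushDemo {
    // Returns what the DB holds for office 1 after an in-memory edit that was
    // never meant to be saved (e.g. it has not been validated yet).
    static String run(boolean runUnrelatedQuery) {
        Map<Long, String> table = new HashMap<>();
        table.put(1L, "Denver");
        AutoFlushSession session = new AutoFlushSession(table);
        Office office = session.get(1L);
        office.name = "";                                   // invalid, not saved on purpose
        if (runUnrelatedQuery) session.countByName("Boulder");
        return table.get(1L);
    }
}
```

`AutoFlushDemo.run(false)` leaves `Denver` in the database, while `AutoFlushDemo.run(true)` writes the half-edited empty name, even though nothing ever asked for a save.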

Side Note 1: Why side-effects are confusing: There is a theory that human memory and reasoning work by 'chunking'. The idea is that the human brain stores knowledge in chunks. Each chunk gets a label which works something like a DB index, and the brain can recall the whole chunk using that label. There is one very prominent chunk we have all formed: the SQL SELECT query. When you look at code (A), you inadvertently call up that chunk. Yet 4 out of 5 side-effects associated with that code have nothing to do with the 'SELECT' chunk your brain has just found.

Side Note 2: Why are side-effects not logical:  Maybe a better term would be: not logic-friendly.  I have dedicated large parts of this post to explaining why, but I will add a quick high-level explanation from a slightly different angle.  You can safely skip this side-note if you are allergic to academic CS.

Logic needs connectives; to start with, it needs conjunction ('and', '^'). Logical conjunction follows this rule (conjunction elimination):
  if A is true 'and' B is true then B is true.
You can think of this as one of the axioms. Logic cannot start without it. The programming equivalent of this is the following 'beta' computation rule:
  second (A, B) '=='  B
Which means: if I do computations A and B (compute a pair) and then ignore A, then this should be equivalent to just computing B.  This is obviously not true if A has side-effects impacting computation B! Side-effects inflict a mortal wound to the (straightforward) correspondence between logic and programming.
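The rule `second (A, B) == B` can be checked directly in code. This plain-Java sketch models "computing" A and B as `Supplier`s (an assumption made so the evaluation order is explicit; `Pairs` and `BetaDemo` are hypothetical names). With a pure A the rule holds; with an A that mutates state B reads, it does not.

```java
import java.util.function.Supplier;

class Pairs {
    // second(A, B): compute A, compute B, keep only B.
    static <A, B> B second(Supplier<A> a, Supplier<B> b) {
        a.get();          // A is computed and its value discarded
        return b.get();
    }
}

class BetaDemo {
    static int counter = 0;                                   // shared mutable state

    static int pureA()      { return 41; }
    static int effectfulA() { counter += 100; return 41; }   // side-effect visible to B
    static int b()          { return counter + 1; }          // B reads the shared state

    // Returns {B alone, second(pureA, B), second(effectfulA, B)}.
    static int[] run() {
        counter = 0;
        int justB = b();
        int withPureA = Pairs.second(BetaDemo::pureA, BetaDemo::b);
        int withEffectfulA = Pairs.second(BetaDemo::effectfulA, BetaDemo::b);
        return new int[] {justB, withPureA, withEffectfulA};
    }
}
```

`BetaDemo.run()` returns `{1, 1, 101}`: the first two agree, the third breaks the equivalence.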

The above paragraph may as well have been copied from the first pages of an introductory Type Theory book. Keeping side-effects on a tight leash allows us to recover the correspondence to logic, but that is no longer first-pages material (google: 'Hoare Type Theory'). A comment by Mark on my software correctness post used the term 'effect' (contrast it with the more unruly 'side-effect'; google: 'lax logic', 'monads in lax logic' or 'Effect System').
With unruly 'anything goes' side-effects, logical reasoning about a program becomes very, very complex and much less potent, and the Type Theory-Logic correspondence is lost. I am trying to learn this stuff, but even at my current newbie level it is eye-opening.

If you decided to skip the last side-note this is a good point to return:
Welcome back! You can ignore all of this academic mumbo-jumbo. Just be aware that whatever voodoo your brain does to form judgments about your GORM code (or any other code with unruly side-effects) should be suspect. This voodoo is far from any straightforward logic and it is easy to get things wrong. Hopefully, I managed to scientifically convince you that:
  • Side-effects need extra attention when testing
  • This will not be: 'testing expected behavior'
Here is a more pragmatic argument: as I explained in my last post, side-effects 2-5 are a no-show for very simple CRUD applications and are not that frequent in simple CRUD apps. The question is: are you satisfied with your app if it works 95% of the time?
The point is: complex Grails apps, or apps that strive for more than 95% correctness, need to make a serious attempt to test side-effects 2-5.

Unit Test Ostrichism
Keep your nose out of trouble, and no trouble'll come to you. - The Lord of the Rings movie
I have been scratching my head asking: why is my current project finding so many issues that nobody else has reported? One possible answer is that everyone else has been unit testing and 'assuming away' reality. If that is true, it may also explain why everyone else has decided that Hibernate is OK.

Here is a question for you:  When writing unit tests for code that includes GORM queries (assume something similar to code (A)), do you use mocks to verify that:
  • the query does not save any objects? 
  • your code works correctly even if the query returns User object with office2 != office1?
  • your code works correctly even if the query returns objects that violate DB enforced constraints (such as more than one user with the same userName)?
  • your code works correctly not only on actual domain objects but also on hibernate proxies? 
I do not. I decided that the amount of work needed to write tests like this would be prohibitive.  But if you disagree please drop me a comment!
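For the curious, here is roughly what a test for the third bullet could look like, written in plain Java with a hand-rolled stub instead of a mocking library. `UserFinder`, `UserIndex` and `LyingFinderCheck` are hypothetical names; the interface stands in for a dynamic finder so it can be stubbed. The code under test builds a map keyed by userName, trusting the DB unique key, and the "lying" stub returns what a stale session might: two objects with the same userName.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical seam standing in for a GORM dynamic finder such as
// User.findAllByOffice(office); this is not a Grails API.
interface UserFinder {
    List<User> findAllByOffice(String office);
}

class User {
    final String userName;
    final String office;
    User(String userName, String office) { this.userName = userName; this.office = office; }
}

class UserIndex {
    // Code under test: indexes users by userName, trusting the DB unique constraint.
    static Map<String, User> byUserName(UserFinder finder, String office) {
        return finder.findAllByOffice(office).stream()
                .collect(Collectors.toMap(u -> u.userName, u -> u));
    }
}

class LyingFinderCheck {
    // An honest stub: what the SELECT alone would return.
    static final UserFinder HONEST = office -> List.of(new User("bob", office), new User("ann", office));
    // A "lying" stub: two distinct objects with the same userName, as a stale
    // session cache can produce after concurrent renames.
    static final UserFinder LYING = office -> List.of(new User("bob", office), new User("bob", office));

    // True if the code under test copes with what the finder returns.
    static boolean survives(UserFinder finder) {
        try {
            UserIndex.byUserName(finder, "office1");
            return true;
        } catch (IllegalStateException duplicateKey) { // thrown by Collectors.toMap
            return false;
        }
    }
}
```

`survives(HONEST)` is true and `survives(LYING)` is false, because `Collectors.toMap` throws `IllegalStateException` on a duplicate key. Writing stubs like this for every query, and for every way a query can lie, is where the effort becomes prohibitive.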

Here is another question:  The above bullet list spells out some impacts of side-effects 2-5. This list is not complete.  Can you think of other examples?
I know that I do not know all impacts.  I am not even sure if I understand all GORM side-effects. Testing for expected problems almost guarantees that I will not find what I do not know.

Code with non-localized impacts is not testable.
OO code needs decoupling. In well-written OO code, state mutation in object A does not impact other objects. OO software designed without decoupling is not testable. There is a technical term for it: spaghetti.

Almost every post in my 'I don't like' series has shown an example of GORM code where the behavior changes (even breaks) when an isolated GORM query is added to or removed from the code. Hibernate queries create impacts that cannot be easily localized to one or few classes.

How can GORM and Hibernate defend their design as testable?

Fail Fast and Test Easy:
This is something very much missing in Grails. Very often, incorrect code will work just fine 50%, maybe even 80%, of the time. Then data changes, or a query is added or removed somewhere, and things break.
Most of my posts in this series pointed out examples like this (see parts 3 and 6). Ideally, incorrect code should either fail to compile or fail during my first attempt to execute it. This is often not that easy to accomplish with languages like Java or Groovy, but there is just no excuse for, for example, allowing me to use a GORM object obtained in Hibernate session S1 within session S2. If such code fails intermittently, it should ALWAYS fail.
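Here is a minimal sketch of what fail-fast could mean for the S1/S2 example: an object remembers the session that loaded it, and any other session rejects it on every attempt, not just sometimes. This is hypothetical plain Java (`Session` and `Entity` are illustrative classes, not Hibernate's), and it is not how Hibernate actually behaves.

```java
// Hypothetical fail-fast guard (not Hibernate's behavior): every loaded object
// remembers the session that loaded it, and any other session rejects it.
class Session {
    private static int nextId = 0;
    final int id = nextId++;
    private boolean open = true;

    Entity load(long key) {
        checkOpen();
        return new Entity(key, this);
    }

    void save(Entity e) {
        checkOpen();
        if (e.owner != this) {
            throw new IllegalStateException(
                "entity " + e.key + " was loaded by session " + e.owner.id + ", not session " + id);
        }
        // the actual write to the database is out of scope for this sketch
    }

    void close() { open = false; }

    private void checkOpen() {
        if (!open) throw new IllegalStateException("session " + id + " is closed");
    }
}

class Entity {
    final long key;
    final Session owner;
    Entity(long key, Session owner) { this.key = key; this.owner = owner; }
}
```

Using an object from `s1` inside `s2` throws `IllegalStateException` deterministically, on the first run and on every run after it.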

Without fail-fast philosophy unit tests are not worth much.

Unit Testing and FP:
"Writing unit tests is reinventing functional programming in non-functional languages" (Christian Sunesson - on github).  (I found the linked post a very interesting reading).
Here is a mental exercise: when reading the following blog post about 'testable code': http://googletesting.blogspot.in/2008/08/by-miko-hevery-so-you-decided-to.html , think about how each of the guidelines relates to FP. Notice they are all N/A!
The term 'testable code' is OOP admitting its own limitations (my OO code is not testable unless I follow this list of rules...). FP code does not need any tweaking or special guidelines to be testable.
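A tiny illustration of the FP point, in plain Java: instead of calling a finder internally, the function receives the data the query would have returned. `Headcount.byOffice` is a hypothetical example. Because it is pure, its test is just values in and values out: no mocks, no session, no setup guidelines.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class Headcount {
    // Pure: the result depends only on the argument, and nothing else is read or written.
    static Map<String, Long> byOffice(List<String> officeOfEachUser) {
        return officeOfEachUser.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```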

In many respects Hibernate's design is the opposite of FP. If one is very unit testable, the other one is probably the opposite of testable.

Not just Hibernate:
The Grails/GORM/Hibernate stack is complex and has some complex bugs. So does Groovy. I know that lots of my app's functionality will break the next time I upgrade Grails. Is testing for 'expected' problems the best investment in this environment?

Conclusions:
Grails contradicts itself on unit testing: the Grails framework provides rich tools to unit test domain objects, only these tests will be rather useless. I think this confusion is not something that Grails has introduced. The Java community in general does not perceive side-effects as something to worry much about. Unfortunately, perception != reality.

More on the confusion: it should be clear by now that the blame for a lot of this lies with how the Hibernate session works. Non-localized side-effects are really hard to test. With that stated, I would expect each JUnit integration test in Grails to run in its own session. The very idea of several tests sharing the same Hibernate session is repulsive. Take a look at these JIRA tickets: GRAILS-11644, GRAILS-11706.

I find Grails attitude towards testing insanely confusing.

Testable Code
Testable code should be defined as: code that makes unit tests effective in bug prevention.
This statement is not something everyone agrees with. Unit tests are often viewed as a way to aid the design and coding process (TDD) and this becomes their main purpose in life.  
"Unit testing is not about finding bugs", see Writing Great Unit Tests. Don't you agree that something is very wrong with this sentence?  Instead of saying "unit testing is not about finding bugs", maybe we should rethink how we write the tested code so it is?  How about: start paying attention to the side-effects?

If you are doing something over and over and it does not work you have 3 choices:
  • keep on doing it and expect it will work (called insanity, also known as the Grails approach to unit testing)
  • keep doing it and null out your expectations ("unit testing is not about finding bugs")
  • change the way you do it (redesign the tested code or/and the way you test).
I am suggesting the 3rd bullet is the way to go.  I cannot redesign Grails but I can rethink how I test.

Functional Test Testable
I like Geb. Writing good functional tests is not trivial and, as with unit tests, it impacts the design of the tested code. The term 'testable code' needs to be extended to functional tests... that may be an idea for a long post somewhere in the future.

Next Post:
I want to start wrapping up my 'I don't like' series.  In my next post, I plan to give an update about the repeatable finder issue (post 2) and include a few final thoughts.



Friday, October 3, 2014

I don't like Hibernate/Grails, part 8, but some like Hibernate and Grails. Why?

Small change in plans. I wanted to write about testing, but this topic logically precedes testing. Writing the last 2 posts made me realize something: each application is different, and the Grails/Hibernate problems I am likely to notice are very, very much dependent on the app I am working on.

Simple App: Think of something you can generate by asking Grails to do the coding for you. A simple app is a CRUD application with these characteristics:
  • Simple domain Objects without relationships
  • No transactions/services
  • Not more than one hibernate query per request
  • Simple validation logic (contained in domain objects)
This example repeats part of my last post.  Assume that I have a domain object User which looks similar to this:
    class User {
        ...
        Office office
        UserPreferences preferences
    }

In a simple app this code:
    User.findAllByOffice(office1) //Code Example (A)

will exhibit only one type of side-effect: it issues a SELECT statement (plus some DB locks).

Complex App: In my last post I listed several side-effects that can be associated with (A). Clearly, complex apps are a different ball game! Here are the side-effects listed again. Code (A)
  1. will issue a SELECT statement (with whatever locks) 
  2. can save some changed objects to the database (part 5)
  3. can save some unchanged objects too  (part 6)
  4. will impact some record types returned by queries that follow it (from proxies to actual objects).  Some records returned from this query will use Hibernate Proxy to implement preferences and some will use the actual UserPreferences class. (Some developers may be surprised that the same goes for the office association). (part 4)
  5. will impact the data content of some records returned by queries that follow it.  Similarly, the content of some records returned from this query may be different from the data returned by the underlying SELECT statement. (part 2)
Side-effects 2-5 have very non-local impacts (they can affect unrelated parts of the code sharing the same Hibernate session) and can be extremely hard to avoid. This leads to code that is unpredictable. So, how is it possible for Hibernate to be so popular?

Counting the number of side-effects in (A) that impact my code is just one possible measure of my app's complexity. As my application becomes more complex, these side-effects become 'louder' (impacts become more frequent and more noticeable). In this post, I am less interested in how many side-effects impact my code and more in how 'loud' they are.

The following table shows a fictional CRUD application that started very simple and became more complex over time.
H5 - total number of side-effects in queries similar to (A)
H6 - frequency of problems related to queries similar to (A)

# | Complexity | Exposure to Problems | Explanation of impacts | H5 | H6
1 | Simple App | Happy Days! | Happy Days! | 1 |
2 | Add Domain Object Relationships | Hibernate Proxy Objects | | 2 | low
3 | Add more complex validation (more than one query in the same session) | Repeatable Finder | | 3 | low
4 | Move validation logic outside of domain objects | Auto-flushing | | 4 | low-mid
5 | Add Services and Transactions | LazyInitializationException | Auto-flushing becomes less loud (unwanted saves are rolled back). Transactional integration tests diverge from reality. Auto-flushing can still be a problem: saving can fail. | | mid
6 | More complex data passed from client | Unmarshalling of client data keeps changing between Grails versions | Small intro to version upgrade problems | | mid
7 | Added logic in Filters | Higher probability of repeatable finder issue | Need for more complex test infrastructure. Potentially interesting filter cleanup or setup ordering issues. (Side Note: What is the before-after-afterView ordering if Controller does a forward?) | | mid
8 | Application managed Hibernate sessions | DuplicateKeyException and similar hard-to-troubleshoot errors | Queries can save unmodified objects. | 5 | high

Row 8 needs more explanation. Here is one example of why I need to create small Hibernate sessions:
Integration tests may need to include scenarios mimicking activity performed over several HTTP requests. Typically, I see such tests mashing all the logic into one test method, executing everything in the scope of the same Hibernate session. In real life, each of the requests will be performed in a separate Hibernate session. Tests have diverged from reality: side-effects 2-5 listed above will have a dramatically different impact on the code when run in the test environment.
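The divergence can be shown with a simplified identity-map model of a session, the same simplification behind repeatable finder. In this plain-Java sketch (`CachingSession` and `RequestsDemo` are hypothetical names), the first request loads a user, someone else renames it, and the second request reads it again, either in a fresh session (as in production) or in the first request's session (as in a mashed-together test).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical session: an object already in the session is returned as-is,
// not re-read from the table.
class CachingSession {
    private final Map<Long, String> table;                 // DB rows: id -> nickName
    private final Map<Long, String> cache = new HashMap<>(); // what this session has seen

    CachingSession(Map<Long, String> table) { this.table = table; }

    String nickName(long id) { return cache.computeIfAbsent(id, table::get); }
}

class RequestsDemo {
    // Simulates two HTTP requests with a rename committed in between.
    static String secondRequestSees(boolean sessionPerRequest) {
        Map<Long, String> table = new HashMap<>();
        table.put(1L, "alice");
        CachingSession first = new CachingSession(table);
        first.nickName(1L);                                // request 1 loads user 1
        table.put(1L, "bob");                              // someone else renames user 1
        CachingSession second = sessionPerRequest ? new CachingSession(table) : first;
        return second.nickName(1L);                        // request 2 reads user 1
    }
}
```

With a session per request the second read sees `bob`; with one shared session it still sees `alice`. Same code, different answer, depending only on session scope.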

Audacious App:
Consider using Grails/Hibernate to implement a Type 2 Slowly Changing Dimension (the one using time slices). To do that, I may want to use something like a session variable (available in most databases, including Oracle and PostgreSQL) to define a time point, and use read-only views to get a snapshot of all my data at the specified time point. I will configure my GORM domain objects against these views and see what happens.

To use session variables, I will need to wrestle with Spring framework DB connection management to make sure that the connection is not swapped under me, because a different connection would not carry the session variable I have set. That is not trivial, but doable.

How does that change my exposure to Hibernate side-effects? Consider this: moving the time point one day forward applies a day's worth of user activity in just a few milliseconds! Many concurrency problems in the application just became super loud. That includes repeatable finder (side-effect 5, part 2 of my blog). Because of that, I need to wrap each use of a time-point session variable in a withNewSession() block (or something equivalent). That exposes my code to the issues documented in parts 3 and 6 (side-effect number 3!). All of these are now super loud.
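For readers unfamiliar with time slices, here is what such a read-only view conceptually computes, modeled in plain Java instead of SQL. It is a sketch under assumptions: `UserVersion` and `TimeSliceView` are hypothetical names, `validTo` is exclusive, the current version is open-ended with `LocalDate.MAX`, and the `asOf` argument plays the role of the session variable.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// One version of a user row in a Type 2 slowly changing dimension.
class UserVersion {
    final long userId;
    final String nickName;
    final LocalDate validFrom;   // inclusive
    final LocalDate validTo;     // exclusive; LocalDate.MAX for the current version

    UserVersion(long userId, String nickName, LocalDate validFrom, LocalDate validTo) {
        this.userId = userId;
        this.nickName = nickName;
        this.validFrom = validFrom;
        this.validTo = validTo;
    }

    boolean validAt(LocalDate t) { return !t.isBefore(validFrom) && t.isBefore(validTo); }
}

class TimeSliceView {
    // What the view returns for time point asOf: one version per user,
    // assuming the versions of a user never overlap.
    static Map<Long, String> snapshot(List<UserVersion> versions, LocalDate asOf) {
        return versions.stream()
                .filter(v -> v.validAt(asOf))
                .collect(Collectors.toMap(v -> v.userId, v -> v.nickName));
    }
}
```

Moving `asOf` from 2014-10-01 to 2014-10-02 can change user 1 from `alice` to `bob` in one step; with real data, that one step replays a whole day of changes at once.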

Conclusions:
This is my best attempt to explain the discrepancy in how Hibernate and Grails are perceived. I think there is more going on that I do not understand, but this is my best answer to date.

The 5 side-effects listed in this post are worth investigating more.  I will refer to them again when talking about testing (next post).