Saturday, September 20, 2014

My dream: software without any bugs ... and is Groovy functional? How about Grails?

This post is about a topic that absolutely fascinates me: completely bug free software (is that even possible?).  This may not be easy reading, but I do hope you will stay with me to the end of this post.  I divided it into smaller sections, so you can do a quick skim and come back later to drill into the details.  The intended audience is Groovy developers. (But if you do not program in Groovy you may find a lot of this relevant too.)

This post was motivated by several things, some Groovy, some Grails, some Java, but one of the biggest motivators was Hibernate rejection of this JIRA ticket: HHH-9367.  

Death, Taxes and Software Errors:

Software tests provide empirical evidence of software correctness.  With all the software projects humanity has completed, we also have (much stronger) evidence that no matter how well tested, the software will have bugs.  Tests (even assuming 100% code coverage) stress the software under only a fraction of the possible combinations of things that impact it.  We write tests because that is the best thing we know how to do.

Is there any different way to achieve software correctness than testing? 

At one of my job interviews I was asked about my attitude towards TDD and testing in general, and I answered:
  Ideally, I would love to write code that does not need to be tested. 
As you can guess, I did not get that job.

Functional Programming:

"The Groovy programming language, since its inception, has always been pretty functional"
(http://glaforge.appspot.com/article/functional-groovy-presentation).  This is a very good and informative presentation. The growing level of interest in functional programming in the Groovy/Grails community is a great thing.  I like the gradient: the Groovy language is moving in the FP direction.

But ... there seems to be quite a bit of confusion about what FP is. This state of confusion is very normal for our industry.  Here is an example:  What is OO? Tim Rentsch (1982): "Every manufacturer will promote his products as supporting it. Every manager will pay lip service to it. Every programmer will practice it (differently). And no one will know just what it is".  These proverbial words can equally well be applied to Functional Programming, RESTful design and many other concepts.

So what is FP?  Some will say that FP is about using the new stream API in Java (are we confusing Functional P with Fluent P?), but open an introductory paper about FP and you are likely to be greeted with terms like Kleisli category (where is my category theory book?).  It is hard to find any middle-ground here.  I will focus on a better question: what is FP for?  The why? question is often less confusing and easier to understand than the what?


Back to software without errors:

Fluff ends here.  'Correct software' means nothing until I spell out how exactly I expect the software to work.  I need to formalize this somehow.  I will use the concept of property to do that.
Property is simply a logical condition about the code.  Verifying a property is verifying that the code works as expected.
So how can I verify with 100% certainty that software satisfies a property?   For this demonstration, I need something that is easy to reason about:
Recursion:  I chose recursion as my tool of choice for this post because: it is a well understood concept and because it is very well suited to proving correctness (there is this Math 101 thing called mathematical induction).

Examples of Properties:  Here are just some examples to think about.
Example 1:  Groovy allows operator overloading, in particular of '+'.  What properties does the '+' have?  Can I assume that
   (a + b) + c  == a + (b + c)
(which you would expect for '+' in algebra)?   The answer is NO.   A developer is free to code whatever he/she pleases when overloading '+'.
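Here is a minimal sketch of how easily associativity can be lost.  The Midpoint class is hypothetical (my own illustration, not from any library): its plus() averages two values instead of adding them, which is perfectly legal Groovy.

```groovy
// Hypothetical class for illustration only: '+' is overloaded to mean "midpoint".
class Midpoint {
    final double value
    Midpoint(double value) { this.value = value }
    // Groovy maps the '+' operator to a method named plus()
    Midpoint plus(Midpoint other) { new Midpoint((value + other.value) / 2) }
}

def a = new Midpoint(0)
def b = new Midpoint(4)
def c = new Midpoint(8)

def left  = (a + b) + c   // midpoint(midpoint(0, 4), 8)
def right = a + (b + c)   // midpoint(0, midpoint(4, 8))

// The two groupings disagree, so '+' is not associative for this class.
assert left.value != right.value
```

Nothing in the language stops a class like this from compiling, so associativity of '+' is a property I have to establish per type, not one I can assume.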

Example 2:  If only I had a penny for each time equals() and hashCode() have not been implemented consistently in a Java app:
      if a == b then a.hashCode() == b.hashCode()
The Java documentation tells developers to do it; well, they often don't.  This is part of imperative language culture:  polymorphism does not mean much.

Example 2b: Another property related to Java equals() is this:
    if a == b and b == c then a == c
It is supposed to be always true. Is it?
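To make the question concrete, here is a small sketch of an equals() that looks reasonable and still breaks transitivity.  The Approx class and its tolerance of 1 are assumptions made up for this illustration.

```groovy
// Hypothetical class for illustration: two values are "equal" if they differ by at most 1.
class Approx {
    final int value
    Approx(int value) { this.value = value }

    @Override
    boolean equals(Object o) {
        o instanceof Approx && Math.abs(value - ((Approx) o).value) <= 1
    }

    // A constant hash keeps hashCode() consistent with this equals()
    @Override
    int hashCode() { 0 }
}

def a = new Approx(0)
def b = new Approx(1)
def c = new Approx(2)

// Groovy's == calls equals() here
assert a == b
assert b == c
assert a != c   // transitivity is broken
```

The compiler accepts this happily; the property from Example 2b holds only if every developer who writes equals() makes it hold.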

Example 3:   Groovy has a method defined on collections called collect. It applies a closure to each element of a list returning a new list. This is often called map in other places.  Recall that Closures overload << to mean function composition.  Is there a property associated with these? Here is one:
  list.collect(c1).collect(c2) == list.collect(c2 << c1)
If I used fpiglet, I could write this in a much cleaner form:
  (map(c1) << map(c2)) (list) == map(c1 << c2)(list)
and, if it was possible to implement a meaningful equals for closures, we could simply say:
  map(c1) << map(c2) == map(c1 << c2)
In other words: map 'preserves' function composition.   Incidentally, there is a name for this property: it is called the second Functor law.
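For readers without fpiglet at hand, the curried form can be tried with a stand-in.  The map closure below is a hypothetical helper written for this sketch (it is not fpiglet's implementation); it simply wraps Groovy's collect so that map(f) returns a function on lists.

```groovy
// Hypothetical curried 'map': takes a closure, returns a function on lists.
def map = { Closure f ->
    return { List xs -> xs.collect(f) }
}

def c1 = { int x -> x + 1 }
def c2 = { int x -> x * 2 }
def list = [1, 2, 3]

// In Groovy, (f << g) means "apply g first, then f"
def lhs = (map(c1) << map(c2)).call(list)
def rhs = map(c1 << c2).call(list)

assert lhs == rhs
```

This checks the property for one list and one pair of side-effect free closures only; it is an example, not a proof.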

Properties are software requirements that are 'logic friendly'.  They typically have the form:
     For all [list of symbols]  [logical expression]
because 'For some' just doesn't cut it in logical reasoning.  Property testing is one of the things missing in OO programming. But more on this later.

Example 4 (Advanced):  Try formulating some of your application business requirements as a property. (Example: username has to be unique across non-retired users. Where would I formulate it: DB records, domain objects, JSON results, ...?)

What is wrong with Java equals():

I will not be able to go far with logical reasoning about source code if I cannot say that code A is equivalent to code B.  Consider this Groovy code:
 Closure c1 = {int x -> x+1}
 Closure c2 = {int x -> x+1}
 assert c1 == c2 //FAILS

c1 and c2 are logically the same function but == comparison between them fails.  Is this a bug in Groovy?  No, this is a general (any computer language) problem.  It is impossible to computationally verify that 2 functions are the same.  Programmatic == is not something that always makes sense.  (Language can be more logical than Java about this and if you attempt to use == where it does not make computational sense, the compiler could reject your code - Haskell does that.)
Also, see Example 2 and 2b,  to do logical reasoning we need something stronger, something that does not depend on a developer's whim.

I will use '==' to indicate logically equivalent code.  So 1 '==' 1 and c1 '==' c2.  Basically, A '==' B if I can guarantee that replacing code A with code B will not change the behavior of the program.


What is wrong with side-effects:

I want to be able to logically reason correctness of parts of my code.  This is close to impossible if my code has side-effects.  To be able to do formal reasoning I need my code to be predictable.  If c is a closure and x is its parameter, I would like to have this 'predictability' property:
    c(x) '==' c(x)
... basically if I call a function twice with the same arguments I expect the same result.

Example:  Consider this Groovy code:
    def i = 0
    Closure f = { x -> i++}
    assert f(1) == f(1) //FAILS

Things take a dramatic turn if I remove possibility of side-effects:  the above property has to be true.   Lack of side-effects makes programs behave in a very predictable way.  This is the key needed to do logical reasoning about source code correctness.

But we do need side-effects!  We need to write/read files, database records, sockets, etc.  If you think of predictability and logical reasoning as underlying goals for FP, the following are obvious conclusions:
  • Side-effects need to be somehow isolated/decoupled 
  • Side-effects need to be as explicit as possible (for example, in Haskell, compiler can distinguish between code that wants to mutate content of a file from code that wants to mutate content of a variable, both are isolated from each other).
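Groovy cannot enforce either of these points, but the style can be approximated by convention.  In the sketch below all names (total, render, report, sink) are made up for illustration: the calculation is pure, and the single side-effect is passed in explicitly by the caller.

```groovy
// Pure: results depend only on the arguments.
def total  = { List<Integer> amounts -> amounts.sum(0) }
def render = { int sum -> "total=${sum}".toString() }

// Impure edge: the side-effect is an explicit parameter ('sink'),
// so the caller decides whether output goes to a console, a file or a test buffer.
def report = { List<Integer> amounts, Closure sink ->
    sink(render(total(amounts)))
}

// In a test the "outside world" is just a list.
def out = []
report([1, 2, 3], { out << it })
assert out == ['total=6']
```

The pure part can be reasoned about (and tested) without any setup; only the thin report layer needs to know that side-effects exist.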

What is wrong with OO:

Objects are meant to encapsulate both state and behavior.  This provides some level of control over state mutation but the mutating state ends up spread out all over the application. And there is typically a lot of it too. We cannot completely avoid side-effects, but OO programming uses side-effects where they are not necessary. A good example to think about is the Java Beans architecture, where each parameter is a state.

Objects are not a good match for formal reasoning for several reasons (one of them is the internal state) and are not very composable.
So what is the alternative that is composable and allows for formal reasoning?  The answer is: function, not the old C function, but the even older mathematical function.

FP language definition: 

I will use a very strict definition of FP.  I will call language functional if programming in it is done using functions and the language can guarantee no side-effects (at least on parts of the application code).

This definition makes FP language code behave like symbolic calculations in math.  If x=1 and y=2 then nobody expects x or y to change because I wrote equation f(x) = g(y).  Functions behave like mathematical functions - hence the term Functional Programming.  And math is ... yeah, the science in charge of logical correctness.

For FP language to be useful we need the ability to create side-effects.  I simply assume that the language has the ability to clearly isolate such code. Anything without side-effects will be called pure and anything with side-effects will be called impure, and I assume the language to have a way to force separation between pure code and impure code.  Ability to do logical reasoning will be limited in the impure part.

There is currently only one commercially available language satisfying this definition:  Haskell.   To make room for the likes of Erlang, Clojure, Scala some people try to relax what FP language means (see FP style section of this post).
Side Note: Here is a cool thing to consider.  Functions need arguments.  To be able to write f(x) we need a concept of a variable x.  We can think about x as a special 'constant' function.  In FP language 'everything' is a function!  Well, almost everything.

Bug free proven code:

I will show that the '2nd functor law' property (Example 3) is true based on how map and function composition are implemented. I will do that purely by formal reasoning.

This is how map is coded in Haskell (I have omitted the type declaration, but this is full implementation code):
map _ []     = []
map f (x:xs) = (f x):(map f xs)
If you have never seen this code, it is worth spending some time thinking about it and puzzling it out. The syntax should be very intuitive.  I use it because it is super short.  Here is a quick explanation: '_' means anything (here any function); '[]' means empty list; 'x:xs' means a list with first element 'x' and tail 'xs';  'f x' means function f applied to x;  'map f xs' means map function applied to function 'f' and list 'xs';  and parentheses are used only for logical grouping.
(... If you still have problems parsing this code:  This is a recursive implementation.  Read it like so:
line1:  for any function (_) and empty list ([ ]) result of map is an empty list ([ ]);  
line2:  for function 'f' and list starting with 'x' and tailing with 'xs' the result is a list starting with 'f(x)' and tailing with (recursion) 'map f xs'.)  
You may be thinking: Stack Overflow.   I am not going into details, but that is not a problem.  We do not need to understand why and how stack overflow is prevented to move forward with this post. (The keyword here is: laziness.)
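For Groovy readers, here is a rough transliteration of the two Haskell lines.  The name recMap is my own, and this is only a sketch: Groovy lists are not lazy, so unlike the Haskell version this one really will overflow the stack on long lists.

```groovy
// Declare first, assign second, so the closure can refer to itself.
def recMap
recMap = { Closure f, List xs ->
    // line 1: the map of an empty list is an empty list
    // line 2: otherwise apply f to the head and recurse on the tail
    xs.isEmpty() ? [] : [f(xs.head())] + recMap(f, xs.tail())
}

assert recMap({ it * 2 }, [1, 2, 3]) == [2, 4, 6]
assert recMap({ it * 2 }, []) == []
```

The shape is the same as in Haskell: one case for the empty list, one case that peels off the first element.  That shape is what makes the inductive proof later in this post possible.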

We also need implementation code for function composition (which will be noted with '.'). Here is the full implementation code which had me break some sweat:
(f . g) x = f (g x) 
So, I want to actually prove that, for the above implementation code, the following line has to be logically true:
 map f (map g xs) '==' map (f.g) xs       (2FL)
for all functions f, g and any list xs.  (Side note: it would be cool to write the LHS as '(map f . map g) xs'; this relies on currying, and I am not going there in this post).

The Proof:
I will do that in 2 steps, first for empty list that is simple:
map f (map g []) '==' [] '==' map (f.g) []
(both equalities are true because of the first line of implementation of map).

Second, I will prove (2FL) for list (x:xs) assuming that (2FL) is true for xs (mathematical induction).
Using second line of implementation of map I have:
map f (map g (x:xs)) '==' map f (g x : map g xs) '==' f(g x): map f (map g xs) 
because of how function composition is implemented, I get this:
'==' (f.g)x : map f (map g xs) 
using inductive assumption gives me:
'==' (f.g)x : map (f.g) xs 
and again second line of implementation of map yields:
'==' map (f.g) (x:xs) 
Done!   I have proven it.   There is no logical possibility for a bug here!  This property is something we can trust to be always true.  So here we have it:  the strongest test ever written for a computer program.  If you use unit tests in your coding, think of this as a unit-proof!  Notice the power of declarative code: map implementation code is really a set of 2 'math formulas'. (Actually, you may have noticed that implementation of map is really a set of 2 properties).

So is writing proofs in the job description for future developers?  Should I be studying Bird–Meertens Formalism or something?  I would not mind if this was the case, but I do not think so.  I don't believe there is a commercially available language or a code analysis tool that can prove a simple property like this today, but tomorrow...

For very diligent readers:  Can still something go wrong with (2FL)?  I made an implicit assumption that the language itself will execute the code correctly.  The instruction sets on our CPUs are imperative and functional programming language needs to have imperative implementation layer.  So the language itself can only be 'tested' for correctness.  Despite that limitation, I hope you agree that combining math with FP code leads to something quite amazing.

Exercise (only for the brave):  Prove that Groovy's collect() implementation code satisfies Groovy equivalent of (2FL).  Let me know when you succeed ;)

Exercise for the reader:  Where did I assume side-effect free code in the proof?  Come up with list a and closures f and g in Groovy so that:
  a.collect(g).collect(f) != a.collect(f << g)

Oh, NO!  And I was so excited that I have something I KNOW and can trust is always true. This is not a bug, Groovy works here as 'expected'. Yes, the 'brave reader exercise' was a booby trap, but the journey was the important part of that exercise. There are three issues:
  • it is very hard to formally reason if code has side-effects
  • it is very hard to formally reason about imperative code even if you assume no side-effects
  • with side-effects, most of the properties that we can come up with will not be true or will need to be weakened.

FP as programming style:

Let us look at yet another example to see that side-effects mess up everything:  If I asked for a vote if the following property has to be true, I bet the overwhelming answer would be: yes it has to.
   list.findAll(predicate).every(predicate)  

Well it is not:
 def list = [1,2,3,4]
 def b = true
 def predicate = {x -> b=!b; return (b && x==2)}
 assert list.findAll(predicate).every(predicate)  //FAILS

This is something to ponder for a few moments:  We could make the above property work if we considered only predicates that have no side-effects.  But it is not possible for me to know if a Closure I use in my program has side effects or not unless I have access to the source code. This is one of the reasons why in imperative languages the term Functional Programming is used to describe something much weaker: a programming style.   Still, this style can be very powerful.  Language cannot verify that code is pure, but this knowledge can be established with code organization, naming conventions, etc.
  

QuickCheck:  Less than proving but more than traditional testing: 

There is a big synergy between writing functional code and writing unit-testable code, but OO Unit Testing is a bit different, harder to implement, and significantly weaker.  In expression 'f(x)':  'x' does not have any behavior to test, you need to test 'f', not 'x'.  We can test properties of 'f'.  This is done by running a particular property against a large number of 'randomly' generated inputs.  Such 'random' generation needs to create a comprehensive set of data.  This method works very, very well in uncovering problems developers can fail to envision.  I think of this as something that gives me very high probability (very close to 1) that my code works, not just the empirical evidence that OO testing provides.

So, instead of actually proving the second functor law, we could have tried to throw lots of data at it.  That would mean lots of different lists and lots of different functions. Can functions be generated randomly?  Yes they can!  There is a fantastic test library (QuickCheck) that has been ported to various languages, but these ports are not as good as the original. There is a catch (as far as I know): you have to run the tests type by type (lists of integers, lists of doubles, lists of strings, etc).
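To give a feel for the idea without any library, here is a hand-rolled sketch in plain Groovy.  It is not QuickCheck and uses none of its API: the generators (randomList, randomFunction), the seed and the 200 iterations are all assumptions chosen for this illustration, and the 'random functions' come from a very small family (x -> a*x + b).

```groovy
def rnd = new Random(42)   // fixed seed so the run is repeatable

// Generator: a list of 0..19 random integers
def randomList = {
    def xs = []
    rnd.nextInt(20).times { xs << (rnd.nextInt(1000) - 500) }
    xs
}

// Generator: a random side-effect free function x -> a*x + b
def randomFunction = {
    int a = rnd.nextInt(10)
    int b = rnd.nextInt(10)
    return { int x -> a * x + b }
}

int checked = 0
200.times {
    def xs = randomList()
    def f = randomFunction()
    def g = randomFunction()
    // the property under test: collect preserves composition
    assert xs.collect(g).collect(f) == xs.collect(f << g)
    checked++
}
assert checked == 200
```

A real property testing library adds a lot on top of this, such as richer generators and shrinking a failing input to a minimal counterexample, but the core loop is the same: generate, check the property, repeat.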

I will, again, throw in a bit of Haskell code here to show a test implementation for the 2nd functor law. With properly declared 'f', 'g' and 'x', this is it: ('fmap' generalizes 'map' and works across many other types, not just collections)

... = (fmap (g . f) x) == ((fmap g . fmap f) x)
This is the full test implementation code.  Neat, is it not?
I wanted to show this code, because it demonstrates terseness at its best. Terseness that Groovy can learn a lot from.   For me the definition of readable code is: 
                    code that expresses the problem, not the solution.
This program is extremely polymorphic: this test can be run on lists of any type (that supports ==) as well as on other functors (whatever that means).

Code that just works:  

There is this belief among Haskell developers that if the code compiles it will work as intended (http://www.haskell.org/haskellwiki/Why_Haskell_just_works).  GHC is a very, very smart compiler, but a lot of this is not Haskell specific and it is true about FP in general.
FP has this almost unreal thing going for it:  if it 'type checks', it works.  If my code is wrong - most likely the type signatures are wrong.  I have experienced this even when writing fpiglet (which is not strongly typed and compiles with Groovy so I was the 'strong' compiler).

Conclusions:

I need to stop here because the likelihood of readers continuing on is probably diminishing rapidly. If you came this far with me: Thank you!  I hope the journey was worth the effort!  If you have not seen much of FP before, you probably have a headache, but I hope it is a good headache!

I have combined some very basic Math with a simple functional program and got (I think you will agree) some amazing results.  We have actually managed a formal mathematical proof of code correctness.  The FP-Math relationship is very strong and well established,  it is one of the things that makes FP what it is.  I think of that this way:  we want bug free software and there is this couple thousand years old science dealing with logical correctness.  Seems like a no-brainer to put these together ... and FP puts these together!

What is FP for?  That has many answers, including:  bug free code, concurrency safe code,  parallel computing, performance, extreme expressiveness and composability.  The underlying goals are predictability and logical reasoning.  Bug free code is to me the strongest reason for FP. Some claim that testable code is simply better code. If that is true, then 'provable' code is simply much, much better code.

So how do we write FP code?  Here are the common denominators:  Code needs to be declarative. Pure code (without side-effects) needs to be isolated from non-pure code (with side-effects).   Any side-effects need to be as explicit as possible.  Properties need to be identified and tested.

State without state:  Most side-effects in OO programming are caused by the need to handle internal object state. I am sure you remember this viral quote:
         'hypermedia is the engine of application state'.  
Yeah, we do not need that stinky HTTP session. Stop using it and state disappears, as do many bugs.  FP is a lot like that.  You can use partial application (curry in Groovy)  as 'an engine of state' but there are other 'engines of state' (reader or state monad, lenses, and more).
This is how you know that your code is functional:  you start using all these scary concepts some people talk about.
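As a small taste of the first of these 'engines', here is a sketch using Groovy's curry.  The closures and values are invented for the example.  Instead of an object holding a mutable field, the 'state' is an argument that has already been supplied, and a running balance is threaded through a fold rather than updated in place.

```groovy
// Partial application: 'greeting' is fixed once, no mutable field needed.
def greet = { String greeting, String name -> "${greeting}, ${name}!".toString() }
def hello = greet.curry('Hello')
assert hello('Groovy') == 'Hello, Groovy!'

// State threaded through a fold: each step returns a new balance
// instead of mutating an old one.
def deposit = { int balance, int amount -> balance + amount }
def finalBalance = [10, 20, 5].inject(0, deposit)
assert finalBalance == 35
```

Both hello and deposit give the same answer every time they are called with the same arguments, which is exactly the 'predictability' property from earlier in this post.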

So is Groovy functional?  I have placed that bar too high. Here are better questions to think about: Do Groovy programs (or we as Groovy developers) :
  • isolate pure code?
  • isolate specific side-effects?
  • avoid imperative (write declarative) code?
  • use properties to define and verify behavior?
  • favor methods or closures (used as functions) in API design?  
Maybe the most important thing in FP is not the language but the programming community?  That is what makes Erlang or Scala more functional.

How functional is Grails?    Let's look at this code:
    def users = User.findAllByOffice(office1)

and analyze its side-effects from the point of view of predictability, suitability for logical reasoning, and explicitness.  Well, in addition to issuing a SELECT statement, the above code :
  • can save some changed objects to the database
  • can save some unchanged objects too  
  • will impact the data content of some records returned by queries that follow it
  • will impact some record types returned by queries that follow it (from proxies to actual objects)
  • it will return records that may be different from what is currently in the database and you have no way of finding which of them are different
What kind of properties can we assert about this code?  Not many, for example the obvious candidates:
     users.every{it.office == office1}
or (assume database enforced unique key on userName)
    users.collect{it.userName}.unique().size() == users.size()
do NOT need to be true even if your code did not modify any objects.  But I am repeating my previous posts.

Grails choice for its ORM is the strongest argument against Groovy's community claim for being FP-friendly.

In my previous post I wrote about my experience with bugs in Grails and pointed to the Spring/Hibernate technology stack underneath.   Some things in life are certain. Is 'software will have bugs' one of these things?  I hope not.  But, I do know this: software using Hibernate will have lots of bugs.  If you cannot logically reason about your code, your code will have errors.

Some Cool FP References:
http://learnyouahaskell.com/ - free and excellent, funny, and easy to read book that will open your mind about FP.
http://youtu.be/52VsgyexS8Q - 'Hole Driven Development' is a bit of a toy concept but is very interesting to think about.  Safe from too haskellish concepts for the first 5 minutes.

Dear Reader:  This may not have been easy reading, but I truly hope that you will think it was worth your time.  It took many evenings of adding/deleting/retyping/rethinking this post.  If you finished reading it, I would really, really appreciate some feedback, a note of approval or disapproval, or a google + recommendation, so I know that my effort was received in one way or the other.
After HHH-9367 experience I needed some venue to vent my frustration and writing this post provided it for me.

Edit (6/2018): I have learned things since writing this post. We can do much better than paper and pencil proofs when verifying software correctness. For example see this blog about proofs of laws, or this example in my IdrisTddNotes project Functor laws, Idris vs Haskell.   The topic of verified functional programming is big with programming languages and books devoted to it.

Next Post:  I am planning to go back to 'I don't like' series: testing.

Friday, September 12, 2014

I don't like Hibernate/Grails part 7: working on more complex project

Grails' big claim to fame is productivity.  A simple CRUD application can be up and running in just a few hours of development work.  Grails makes writing a small CRUD application unbelievably easy.
But small simple apps sometimes need to grow up to become bigger and more complex.  How easy is it to do that with Grails?  Here is my experience with how that works.

In my previous posts I tried to rigorously document framework issues that I believed are not well known by the Grails community, some not known at all.  You may find some surprising things in this post too, but instead of providing exact code examples I simply 'ramble' about my experience.  I have grouped this post into sections so you can pick and choose what is of interest to you.
If you find that boring, I am going to have next installment about Unit Testing soon.

My current project:
I am not going to bore you with the details, just some very high level bits and pieces.   I need to put 'more complex' in some context.  As often happens, my project started relatively simple and the requirements grew more and more complex.

Now, the project enjoys a complex domain model with nontrivial business logic (even CRUD is complex).  Some domain objects can be shared between users from one office, some across several offices, some across regions, some everywhere (add model relationships and this becomes very interesting).  Objects are time dependent and maintain historical information, and some relationships can go across points in time (object A at time T1 is in relationship with B at time T2).  Complex uniqueness constraints are in place, with overrides for users with certain roles and certain object types.  Users can view 'deleted' objects... We do some fun algorithmic work too, like programming on graphs and trees.

All of this may sound crazy complicated but it mostly boils down to more complex business logic, need for more complex queries and more complex testing requirements. I keep asking myself:  is Hibernate simply a wrong choice for more complex apps like this?

High level design:  All logic is kept in or behind services.  Services form deep class inheritance trees and use mixins.  Controllers are thin, but use inheritance too.  Domain objects are very anemic (almost no logic).  We used domain objects with mixins at one point, but these have been refactored to use JPA style and class inheritance.  Level 2 cache is used on certain domain objects which are never updated.  GORM criteria queries are used extensively (and almost exclusively) for querying. Some direct SQL is used for performance. Processing of large imports is often split into small Hibernate sessions for performance.

Groovy Magic:
I hear that commonly as a complaint, but I do not really agree.  With improving tool support (such as IntelliJ), introspecting framework code becomes easier and easier.   I think the Grails team has done a phenomenal job in providing a terse interface to the Spring/Hibernate stack.  My problems are typically not with the Groovy visible top but with the whole stack.  Terseness makes things look nice, but then 's..t' is a very terse word too.

Complexity of Grails/Spring/Hibernate stack:
Beast sitting on a beast sitting on a beast.  This stuff is complex.  I did not work with Spring much nor with Hibernate (lucky 'previous' me).  A lot of stuff can be configured but how safe is it? How much configuration can I change in Hibernate to not upset Spring?  How much in Spring to not upset GORM?
For example, some time ago Spring/Hibernate used to hold on to the database connection for the duration of the whole HTTP request before returning it to the pool. Our app needs this behavior.  There is a legacy setting for it (hibernate.connection.release_mode) which accepts the on_close value. But it does not work; other settings need to be modified too ... up to the point where we ended up implementing Hibernate interceptors to handle some of the problems. It seems like this configuration ('on_close') is no longer supported by the stack.  Not that we found much documentation about it.  (Please comment on this thread started by Tim and Chris if you have more knowledge about this:  http://groups.google.com/forum/#!topic/grails-dev-discuss/w3MnxautR1c)

There are some problems reported in JIRA that many developers have experienced but Grails/Spring project developers cannot reproduce.  I imagine there are many that do not get reported at all. One example is the notorious problem with hot reloading of code changes to services.  (http://jira.spring.io/browse/SPR-4153). We are seeing this problem.  Other developers that I know are seeing this too.  Reproducing this behavior on a freshly created Grails app seems impossible.  Tim was determined to figure out how to reproduce it 'from scratch' but so far, no luck.

Framework Bugs:
The prize for the most interesting bug ever goes to Chris:  If a controller tries to render JSON on a detached criteria query (instead of the results of that query), the source code is wiped out. Yes, you are reading this correctly: invoking a controller method ends up removing your code. Thank GOD for version control!    Chris entered a bug for it into Grails JIRA; sadly it was never assigned or acknowledged in any way  http://jira.grails.org/browse/GRAILS-9169.   I have not tried to reproduce it so I do not know if it is still a problem.
Side Note: Groovy/Grails is in a very good company here!  Once upon a time, a similarly embarrassing bug has happened with GHC (Glasgow Haskell Compiler) which would have erased the source if it did not like some code.
As entertaining (and embarrassing) as this problem was/is, I consider it not as big a deal as, say, some Hibernate 'features'.  The scariest to me is code where adding a not relevant functionality (such as an isolated database query) can cause the code to break or behave differently.  With Hibernate this is not even considered a bug, it is how Hibernate works.

None of the problems that I have discussed in my previous posts have been exclusive to complex apps. I guess, there must be some law which makes the relationship between project complexity and exposure to framework bugs go exponential.  We are seeing a lot of problems now, much more than when the app was 'young and naive'.

Keeping up with Grails releases:
Historically, upgrades had been painful.  Here are some examples of things that got us in trouble: mixins, sending JSON from the client, ivy, having both unit and integration tests.  The project started on 2.0.1.  Release 2.1 had some bugs which were a no-go for us so we waited.  Eventually we decided to bite the bullet and upgrade to 2.3.2.   The cost of that upgrade was somewhere in the 2 developer week range!   Updating to 2.3.3 was easy, but 2.3.4 broke our code again and we decided to wait ...
We are currently in the process of upgrading to 2.4.3... and this time it looks like a much easier process, done in about 2-3 developer days.  I hope this means that the Grails project is starting to stabilize and the next upgrades will get easier and easier.

Update process is important,  just the improvements in Groovy compiler are really making it worth the high cost... but the cost is high.

Testability:
For reasons that should be apparent from my previous posts, we have reversed the typical testing pyramid:  We have a lot of functional tests, many integration tests and no unit tests.  Our 'atomic'/unit testing is implemented in Grails Integration Tests. This differentiates my current project from other Grails projects and may explain, in part, why we are seeing problems that others are not.

Can popularity of Hibernate be explained with how developers approach the testing process?  Can Grails bugs be explained by how Grails project tests itself?  Can unit tests be blamed for all of that?

Our reversed pyramid has a drawback:  our tests are slow to run, which impacts the overall productivity.
I will write about testing in Grails in a separate post (or posts).

Performance:
Fortunately, my project has no high scalability requirements.  My problems are more related to the need for our custom graph and tree algorithms to work fast.  Grails and Groovy are working on improving performance, which is great.  The biggest headache for me is the cost of calls:  a method or closure call in Groovy is much more expensive than a method call in Java.
My take on this is that some performance critical parts of the code need to be implemented directly in Java.  Hopefully there will be very few of these.

Concurrency:
Concurrency is not something exclusive to bigger projects.  Any web app needs to consider it.
My previous posts have identified some concurrency issues in GORM.  I consider 'repeatable finder' a serious Hibernate design flaw.  I expect the framework to help me with concurrency, not to make things worse by handing me code that is ill designed for it.  The worst case is where a concurrency problem is thrown at me and I have no way of solving it.  (Again, these issues appear to be inherited from the Spring/Hibernate technology stack.)

We have some functional and integration tests for concurrency, but not that many.  What we have done differently is this:  we used a lot of 'withNewSession' logic in our tests (which is a good thing).  We have also spent some time simply thinking about the potential logical problems in how the Hibernate session works.  That has exposed two interesting issues (documented in previous posts) that probably nobody else knew about.

It may be interesting to compare Grails to other frameworks.  The Ruby on Rails 'share nothing' philosophy offloads handling of complex concurrency issues to the database engine.  The main motivating factors behind 'share nothing', as I understand it, are scalability and concurrency.  Active Record is session-less.  Play/Ebean is like this too.  But Play started to move to JPA/Hibernate.  That is a very surprising move to me.  I always viewed the Scala community as more FP-oriented.

Community:
Being able to simply google to find answers is priceless.  Grails documentation is growing too (and has a hard time staying current).  However, we rarely had any luck with posting questions and getting answers.  The silence about the 'repeatable finder problem', which I have posted in 2 JIRAs, on Stack Overflow and on the Grails forum, is a good example.  Maybe other people just do not see these problems often enough to notice?  It seems like once we passed a certain level of complexity in our project, we are sailing these waters on our own.

This may be less true than it was before, but it seems like people get in trouble when they criticize Hibernate... and the usual resolution of such criticism is to label the critic ignorant or stupid.  That really amazes me.  So there is this library where just adding a new query can change the behavior of, or break, your previous code, and you cannot criticize that?

Here is a conversation between two fictional programmers A and B (it resembles some I have seen on the internet when searching for answers to my Hibernate questions):
A:  I am finding how Hibernate works hard to swallow,  my application is very hard to maintain.
B:  I have used Hibernate in 4 projects and never had a problem,  the problem may be you.
... and the discussion stops here

I am very puzzled by all of this.  What does B do differently than I do?  Less complex projects - is that it?

How did that ever work?
Did you ever experience this?  I am looking at some code, typically a failing test, and scratching my head: how did that ever work?  I have experienced situations where, for example, a test had a local variable declared and defined in one closure and used in another closure.  This test had been passing for months on developer computers and on Jenkins...  I made some unrelated change and it started to fail with an undefined variable error...

The only logical explanation I can come up with is some incremental compilation issue that is not resolved by cleaning.  That, or .. I am crazy.
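
To make the closure situation concrete, here is a minimal plain-Groovy sketch.  It is not the original test: the class and variable names are made up for illustration.  It only shows that dynamic Groovy happily compiles a closure that uses a variable local to another closure, and that the mistake surfaces at run time as a MissingPropertyException.  It does not explain how such a test could ever have passed.

```groovy
// Hypothetical stand-in for a test class; all names are invented for this sketch.
class ClosureScopeDemo {
    def run() {
        def first = {
            def localValue = 42   // local to this closure only
            localValue
        }
        def second = {
            localValue            // not declared here: looked up dynamically at run time
        }
        assert first() == 42
        try {
            second()
            return 'no error'
        } catch (MissingPropertyException e) {
            // the lookup falls through to the owner object, which has no such property
            return "failed: ${e.property}".toString()
        }
    }
}

def outcome = new ClosureScopeDemo().run()
println outcome
```

Because nothing is checked until the second closure actually runs, the compiler gives no warning here.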


Clean, clean-all, super-clean, delete folders  ...
And then sometimes nothing works until you clean, super-clean, restart IntelliJ,  reboot...   Again this is a complex technology stack.

IntelliJ:
It is improving, syntax detection works better and better. Ability to drill into source code is improving.  Still there are some glitches but considering how complex this must be, it is impressive.
I have used STS in the past but not for some time, so I cannot comment on STS.

Overall Productivity:
As a developer, I would much rather focus on complex business functionality than on figuring out Spring or Hibernate gotchas or concurrency issues.  This is a hard project management problem.  How can we estimate the cost of new functionality if resolving gotchas takes a significant chunk of our time?  What gotchas?  Please look at the part 1 - part 6 posts on this blog site.

Not being able to trust in unit or integration tests is another big problem.  Functional tests have many benefits, but these tests are slow to run and running them after each code change is not very feasible.  

It would be good to figure out some stats for what percentage of time is spent on dealing with framework related issues. I have no such stats but it feels like Grails/Hibernate does not scale well to more complex projects...  but then, would I prefer to use Spring/Hibernate directly? NO! NO! NO!

Maybe we are seeing more Spring/Hibernate issues because the Grails 'overlay' allows us to spot problems that have previously been overlooked by developers bogged down with writing verbose Java code?  'No good deed ever goes unpunished': am I blaming the Grails framework for making things better?  Quite possibly, I am.

Finally, I think a big lesson from all of this may be that Hibernate and 'outside of the box' thinking do not mix.  Keeping things orthodox and simple may help reduce the volume of issues that you and I will see.  My previous projects have been very vanilla; I have seen some problems, but not enough to write home (or blog) about.  True, I knew less back then about what can go wrong, so there is some ostrichism in this advice, but the likelihood of things going wrong appears to be significantly smaller when things are kept simple.

Next Posts:
I want to take a short break from Grails/Hibernate 'bashing' and write about what absolutely fascinates me:  my dream is to be able to write code that has no bugs at all.  Is that even possible?  (If you read my "I don't like" series, you can probably see why that is my dream.... and yes, it is possible!)
I also want to discuss these questions:  Is Groovy 'functional'?  How functional is Grails?

After that: I will talk about challenges and problems I see with testing in Grails.

Thursday, September 4, 2014

I don't like Hibernate/Grails part 6, how to save objects using refresh()


This one took some time to figure out, but the closer I got to unraveling the problem, the more my jaw dropped.  I have seen mysterious errors in Geb/Spock tests.  The errors were not hard to fix, but the behavior was very scary.  Tests had been trying to save a domain object (and failing to save it) - but that object was not modified by the test, nor was there any explicit logic to save it!

Eventually, I was able to reproduce it 'from scratch'  (using current Grails 2.4.3 and 2.3.3 project with all new project defaults).  I use this simple domain class:
   class Simple {
      String name
   }


and I assume I have a record for it with name 'simple1' saved in the database. Consider this code:

   def simple1 = Simple.findByName('simple1') //(1)

   Simple.withNewSession {
     simple1.refresh() //(2)
   }

   Simple.findAll() //(3)


If simple1 is concurrently modified, GORM will actually attempt to save simple1 on the line marked with (3).  If this happens, the above code will fail with HibernateOptimisticLockingFailureException because the Simple class has an implicit version property.

Side Note:  I think, developers who configure 'version false' on their domain objects are indeed very brave!

Please note that the above code has NOT modified simple1 in any way.  What happens here:  the line marked as (2) sets the dirty flag on the simple1 object!  This should be a somewhat embarrassing bug, because developers may use refresh() to reset the object content and to prevent any saving from happening.  In reality, the very use of refresh() can actually trigger the save!

Here is a test method which demonstrates the issue (this test passes in current latest version 2.4.3 and in 2.3.3): 

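void testNewSessionWithRefresh() {
  // Assumes: import groovy.sql.Sql, and a sessionFactory bean injected into the test class
  def simple = Simple.findByName('simple1')

  //start concurrent mod (raw SQL, bypassing the Hibernate session)
  def session = sessionFactory.currentSession
  Sql sql = new Sql(session.connection())
  sql.execute("""
     update simple
     set version=version + 1, name='new_' || name
  """.toString()
  )

  def rec = sql.firstRow(
     "select name from simple where id=${simple.id}".toString()
  )
  assert rec[0] == 'new_simple1'
  //end concurrent mod

  Simple.withNewSession {
      simple.refresh() // refresh in a different session than the one that loaded simple
  }

  assert simple.dirty // simple is marked dirty although the test never modified it
}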
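
For readers less familiar with the version check behind the optimistic locking failure, here is a minimal plain-Groovy simulation.  It is a sketch only:  VersionedStore and StaleVersionException are made-up names, not GORM or Hibernate classes, and the in-memory map merely stands in for a table with a version column.  The point it illustrates is why an unwanted save of a stale object fails once the row has been changed by someone else.

```groovy
// Simulation only: nothing here is GORM or Hibernate API.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) { super(message) }
}

class VersionedStore {
    // id -> [name, version], standing in for a table with a version column
    private final Map<Long, Map> rows = [:]

    void insert(long id, String name) {
        rows[id] = [name: name, version: 0L]
    }

    Map load(long id) {
        // return a copy, like an object loaded into a session
        def row = rows[id]
        [id: id, name: row.name, version: row.version]
    }

    // mimics "update ... set version = version + 1 where id = ? and version = ?"
    void update(Map loaded) {
        def row = rows[loaded.id]
        if (row.version != loaded.version) {
            throw new StaleVersionException("row ${loaded.id} was changed by someone else")
        }
        row.name = loaded.name
        row.version = row.version + 1
    }
}

def store = new VersionedStore()
store.insert(1L, 'simple1')

def mine = store.load(1L)      // loaded at version 0, never modified
def theirs = store.load(1L)    // concurrent reader, also at version 0
theirs.name = 'new_simple1'
store.update(theirs)           // succeeds, the row is now at version 1

def failure = null
try {
    store.update(mine)         // unwanted save of the stale copy is rejected
} catch (StaleVersionException e) {
    failure = e.message
}
println failure
```

Without the version comparison, the last update would silently overwrite 'new_simple1' with the stale name.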

Maybe it is the nature of ugly side-effects that if you combine them (auto-save + session) new ugly (and uglier) side-effects are born?

Fail Fast?:  You can argue that the refresh() call should not be made in the context of a different Hibernate session.  Yes, but if this is the case, should I not expect an exception thrown from that refresh() call?  This is yet another of the cases, so typical of the Grails/Hibernate technology stack, where bad code works fine most of the time.  A lot of code in Grails is like a ticking bomb.

I uncovered this particular problem by analyzing weird behavior in my Geb functional tests.  Geb code triggers server activity, which runs concurrently with the Geb test itself.  How can I be sure that all my code is safe from this problem?  Should I avoid using more than one Hibernate session per, say, one HTTP request?  Yes, that would go well with Grails/Spring design in general, but there are situations where having more than one session is needed (as discussed in my post part 3).

Exercise for the reader:  Change last line in the above test method to:
   assert !simple.dirty

The test will now fail.  Fix the test so it passes by inserting this line of code (and only this line) somewhere in the test:
   Simple.findAll()

Yeah, seriously, this is one more of 'these' problems as well.

Note on Flush Mode:  As discussed in the previous post and in http://jira.grails.org/browse/GRAILS-11687, Grails 2.4.3 flush mode behavior is inconsistent.  Line (3) in the code snippet above fails with Grails 2.4.3, so in this case GORM decided to use FlushMode auto-like behavior.  Examples from the previous post had GORM behave in manual-like fashion with Grails 2.4.3.  The flush mode configuration in both examples is the same, and if you printed the flush mode of the current session it would print 'AUTO'.

Testing for such problems:  Concurrency problems are hard to test for.  I expect these to be handled for me by the framework as much as possible.  Complex concurrency issues may be a consequence of the business requirements that my code is trying to implement;  I can handle these.  But concurrency issues should not be something the framework throws at me 'just because'.  As we have seen already (in post part 2), concurrency is not something Grails does well.

This is simply one more example of a problem that is not testable.
Unit tests are a non-starter here.  I may have some luck uncovering these problems with integration or functional tests if I know where to look, but this is not very likely.

References:
http://jira.grails.org/browse/GRAILS-11701

Next Post:  My plan is to write one post about what it is like to work on a more complex project that uses Grails.   Something that goes beyond a simple CRUD application.  Instead of providing concrete code examples, I will simply describe some difficulties my team has encountered.   After that I want to come back to the discussion started in part 1: testing in Grails.