Opinions and Programming: I don't like Hibernate/Grails part 9: Testable code

"I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence". Kent Beck.
Finding more problems with less effort is something I totally agree with. Which approach to testing will give me the most bang for the buck? I like to think about tests in a pragmatic way: I write tests to find bugs and to guard against bugs. I consider manual testing rather ineffective and, in most cases, inferior to automated tests.

Testing seems to be deeply related to my last two posts. It is obviously related to software correctness. Less obviously, our approach to testing impacts how we perceive the framework: how we test can explain why we like or don't like Hibernate and Grails.

I have moved away from unit tests in Grails. My approach to testing is an 'inverted pyramid'. I test internal implementation details using Grails integration tests and I write a lot of functional tests. My tests are less 'unit' than you may like them to be (they interact with an actual database, and mocking is replaced with data setups) but they are closer to reality. This post explains my reasoning behind this decision.

I am considering the unit/integration/functional division from the 'testing philosophy' point of view. I care less whether tests are placed in the test/unit, test/integration or test/functional folder. I will stay away from the love-hate TDD debate.

Testing choices: (Just to make sure we are on the same page.)
The testing spectrum has two extremes with very different characteristics:
(1) Unit Tests: testing expected behavior; testing in a fake environment; testing internals not visible to the end users
Also: white box testing; bottom-up testing
I will test parts (units) of my software in isolation from anything else. I will 'mock' interaction with the rest of the system. Since I know exactly what can go wrong, I can write mocks that exercise the tested part under a specific situation.
(The mocking aspect is specific to OOP, nobody writes mocks in FP... with that said, there is this great book: http://en.wikipedia.org/wiki/To_Mock_a_Mockingbird ;))

(2) Functional Tests: testing for unexpected problems; testing in a real environment; testing functionality visible to the end users
Also: black box testing (more gray than black: I need testing 'hooks'); top-down testing
I will test my application as a whole. I identify a list of specifications and I write tests to verify that my application works correctly with respect to these specifications. I do not need to know everything that can go wrong; I assume that, since I have covered a comprehensive range of data and scenarios representing actual software usage, most of what can go wrong will be uncovered.

Integration tests fall between 1 and 2.

In this post I argue that testing in Grails needs to focus on unexpected problems, and should be done in a real (or close to real) environment. That moves the testing away from unit and towards integration and functional.

Why Unit Tests are Great:

execution speed: no need to start up the whole infrastructure, etc.
good coupling: the tested unit becomes married to the unit test; this can be a happy, strong marriage that survives the ups and downs of refactoring.
atomic/unit nature: the idea is to make a perfect whole from perfect units. (Except that this principle is logically flawed in OOP.) This is sometimes called bottom-up testing.
unit tests can aid software design and coding.
aggressive conditions: once you know what kind of problems to expect, you can stress the code 'more aggressively' by introducing scenarios that are rare or complex to set up using integration or functional tests. This is rarely done.

This post is not a criticism of Unit Tests. It is a criticism of how they are used.

Where is the most bang?
Some people disagree with me when I say: What really needs testing is side-effects. Errors caused by side-effects are the hardest to troubleshoot and fix, and they are often very intermittent. Side-effects limit a developer's ability to reason logically about the code. Side-effects can be very confusing.

There are many books and articles about testing, yet side-effects have not been discussed much in them. Why is that?

Case study: This example is repeated from my last post:

def users = User.findAllByOffice(office1) //code (A)

Assume user has userName (with unique constraint), office (of type Office) and userPreferences (of type UserPreferences). This code:

will issue a SELECT statement (with whatever locks)
can save some changed objects to the database (part 5)
can save some unchanged objects too (part 6)
will impact the types of some records returned by queries that follow it (from proxies to actual objects). Some records returned from this query will use a Hibernate proxy to implement preferences and some will use the actual UserPreferences class. (Some developers may be surprised that the same goes for the office association). (part 4)
will impact the data content of some records returned by queries that follow it. Similarly, the content of some records returned from this query may be different from the data returned by the underlying SELECT statement. (part 2)

Can simply adding line (A) break my code? Clearly it can!
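To make the last bullet concrete, here is a toy sketch in plain Groovy. It is not Hibernate: ToySession, its in-memory 'db' map and its 'cache' map are hypothetical stand-ins that only imitate one idea, namely that a session hands back the instance it has already loaded for a given id instead of building a fresh one from the row. Under that assumption, adding one earlier query changes what a later query returns:

```groovy
// Toy model only: ToySession is a hypothetical stand-in, NOT the Hibernate Session API.
class ToySession {
    Map<Integer, Map> db            // the "database": id -> row
    Map<Integer, Map> cache = [:]   // first-level cache: id -> already-loaded object

    // Select rows by filter, but hand back the cached instance when one exists.
    List<Map> find(Closure rowFilter) {
        db.findAll { id, row -> rowFilter(row) }.collect { id, row ->
            if (!cache.containsKey(id)) {
                cache[id] = new HashMap(row)   // copy the row on first load
            }
            cache[id]
        }
    }
}

def runScenario = { boolean withExtraQuery ->
    def db = [1: [userName: 'ann', office: 'office1'],
              2: [userName: 'bob', office: 'office2']]
    def session = new ToySession(db: db)
    if (withExtraQuery) {
        session.find { it.office == 'office1' }   // the added line, playing the role of code (A)
    }
    db[1].office = 'office2'   // another transaction moves ann to office2
    session.find { it.office == 'office2' }.collect { it.userName + ':' + it.office }
}

println runScenario(false)   // [ann:office2, bob:office2]
println runScenario(true)    // [ann:office1, bob:office2]
```

In the second run the query for office2 returns a user whose office still reads office1, and the only difference between the two runs is the extra query. The real mechanics are more involved, but the shape of the problem is the same: the result of a query depends on which queries ran before it.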

Side Note 1: Why are side-effects confusing: There is a theory that human memory and reasoning work by 'chunking'. The idea is that the human brain stores knowledge in chunks. Each chunk gets a label, which works something like a DB index, and the brain can recall the whole chunk using that label. There is one very prominent chunk we have all formed: the SQL SELECT query. When you look at code (A) you involuntarily recall that chunk. Yet 4 out of the 5 effects associated with that code have nothing to do with the 'SELECT' chunk your brain has just found.

Side Note 2: Why are side-effects not logical: Maybe a better term would be: not logic-friendly. I have dedicated large parts of this post to explaining why, but I will add a quick high-level explanation from a slightly different angle. You can safely skip this side-note if you are allergic to academic CS.

Logic needs connectives; to start with, it needs conjunction ('and', '^'). Logical conjunction follows this reduction rule:
if A is true 'and' B is true then B is true.
You can think of this as one of the axioms. Logic cannot start without it. The programming equivalent of this is the 'beta reduction' rule which looks more or less like this:
second (A, B) '==' B

and means: if I do computations A and B (compute a pair) and then ignore A, then this should be equivalent to just computing B. This is obviously not true if A has side-effects impacting computation B! Side-effects inflict a mortal wound on the (straightforward) correspondence between logic and programming.
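A small Groovy sketch of the rule above, with made-up closures (second, pureA, pureB, impureA and impureB are illustrative names only). With pure computations the equation holds; once A mutates state that B reads, it does not:

```groovy
// Return the second component of a pair and ignore the first.
def second = { a, b -> b }

// Pure case: computing A has no influence on B.
def pureA = { 1 + 1 }
def pureB = { 2 * 3 }
assert second(pureA(), pureB()) == pureB()

// Impure case: A mutates state that B reads.
def counter = 0
def impureA = { counter += 10; 'ignored' }
def impureB = { counter + 1 }

def alone = impureB()                       // B computed on its own
def paired = second(impureA(), impureB())   // B computed after A has run

assert alone == 1
assert paired == 11   // second(A, B) is no longer the same as B
```

The value of A is thrown away in both cases; what differs is whether running A leaves something behind.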

The above paragraph might as well have been copied from the first pages of an introductory Type Theory book. Keeping side-effects on a tight leash makes it possible to recover the correspondence to logic, but that is no longer first-pages material (google: 'Hoare Type Theory'). A comment by Mark on my software correctness post used the term 'effect' (contrast it with the more unruly 'side-effect'; google: 'lax logic', 'monads in lax logic' or 'Effect System').
With unruly 'anything goes' side-effects, logical reasoning about a program becomes very, very complex and much less potent, and the Type Theory-Logic correspondence is lost. I am trying to learn this stuff, and even at my current newbie level it is eye-opening.

If you decided to skip the last side-note this is a good point to return:
Welcome back! You can ignore all of this academic mumbo-jumbo. Just be aware that whatever voodoo your brain does to form judgments about your GORM code (or any other code with unruly side-effects) should be suspect. This voodoo is far from any straightforward logic and it is easy to get things wrong. Hopefully, I managed to scientifically convince you that:

Side-effects need extra attention when testing
This will not be: 'testing expected behavior'

Here is a more pragmatic argument: As I explained in my last post: side-effects 2-5 are a no-show for very simple CRUD applications and are not that frequent for simple CRUD apps. The question is: are you satisfied with your app if it works 95% of the time?
The point is: complex Grails apps, or apps that strive for more than 95% correctness, need to make a serious attempt to test side-effects 2-5.

Unit Test Ostrichism
Keep your nose out of trouble, and no trouble'll come to you. - The Lord of the Rings movie

I have been scratching my head asking why my current project is finding so many issues that nobody else has reported. One possible answer is: everyone else has been unit testing and 'assuming away' the reality. If that is true, then maybe everyone else has also decided that Hibernate is OK.

Here is a question for you: When writing unit tests for code that includes GORM queries (assume something similar to code (A)), do you use mocks to verify that:

the query does not save any objects?
your code works correctly even if the query returns a User object with office2 != office1?
your code works correctly even if the query returns objects that violate DB-enforced constraints (such as more than one user with the same userName)?
your code works correctly not only on actual domain objects but also on Hibernate proxies?

I do not. I decided that the amount of work needed to write tests like this would be prohibitive. But if you disagree please drop me a comment!
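For illustration, here is roughly what a test for just one of those bullets (the 'wrong office' one) could look like. This is a plain Groovy sketch, not a Grails test: User and Office are bare classes rather than GORM domain classes, OfficeReport is a hypothetical piece of code under test, and the finder is stubbed by hand through the metaclass.

```groovy
class Office { String name }
class User { String userName; Office office }

// Hypothetical code under test: it re-checks the office instead of trusting the finder.
class OfficeReport {
    static List<String> userNamesIn(Office office) {
        def found = User.findAllByOffice(office)
        def sameOffice = found.findAll { it.office == office }
        return sameOffice.collect { it.userName }
    }
}

def office1 = new Office(name: 'office1')
def office2 = new Office(name: 'office2')

// Hand-written stub: one expected user plus one 'impossible' user from another office.
User.metaClass.static.findAllByOffice = { Office o ->
    [new User(userName: 'ann', office: office1),
     new User(userName: 'bob', office: office2)]
}

def names = OfficeReport.userNamesIn(office1)
assert names == ['ann']   // the stray user is filtered out

// Remove the stub so it does not leak into other tests.
GroovySystem.metaClassRegistry.removeMetaClass(User)
```

That is one stub for one scenario of one query. Repeating it for saved objects, constraint violations and proxies, for every query in the code base, is where the effort adds up.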

Here is another question: The above bullet list spells out some impacts of side-effects 2-5. This list is not complete. Can you think of other examples?
I know that I do not know all impacts. I am not even sure if I understand all GORM side-effects. Testing for expected problems almost guarantees that I will not find what I do not know.

Code with non-localized impacts is not testable.
OO code needs decoupling. In well-written OO code, state mutation in object A does not impact other objects. OO software designed without decoupling is not testable. There is a technical term for it: spaghetti.

Almost every post in my 'I don't like' series has shown an example of GORM code where the behavior changes (even breaks) when an isolated GORM query is added to or removed from the code. Hibernate queries create impacts that cannot be easily localized to one or few classes.

How can GORM and Hibernate defend their design as testable?

Fail Fast and Test Easy:
This is something very much missing in Grails. Very often incorrect code works just fine 50%, maybe even 80%, of the time. Then data changes, or a query is added or removed somewhere, and things break.
Most of my posts in this series pointed out examples like this (see part 3 and part 6). Ideally, incorrect code should either fail to compile or fail during my first attempt to execute it. This is often not that easy to accomplish with languages like Java or Groovy, but there is just no excuse for, say, allowing me to use a GORM object obtained using Hibernate session S1 within session S2. If such code fails intermittently, it should ALWAYS fail.

Without a fail-fast philosophy unit tests are not worth much.
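As a sketch of what 'ALWAYS fail' could mean for the S1/S2 example, here is a toy model in plain Groovy. GuardedSession and GuardedEntity are hypothetical classes invented for this illustration; they are not part of Hibernate or GORM and say nothing about how either could implement such a check.

```groovy
// Hypothetical toy types: neither class is part of Hibernate or GORM.
class GuardedEntity {
    String id
    GuardedSession owner   // the session that loaded this object
}

class GuardedSession {
    String name

    GuardedEntity load(String id) {
        new GuardedEntity(id: id, owner: this)
    }

    void save(GuardedEntity e) {
        // Fail fast: reject foreign objects every time, not only when the data happens to conflict.
        if (!e.owner.is(this)) {
            throw new IllegalStateException("entity ${e.id} belongs to session ${e.owner.name}, not ${name}")
        }
    }
}

def s1 = new GuardedSession(name: 'S1')
def s2 = new GuardedSession(name: 'S2')
def user = s1.load('user-1')

s1.save(user)   // same session: accepted

def failure = null
try {
    s2.save(user)   // wrong session: rejected on the very first attempt
} catch (IllegalStateException ex) {
    failure = ex.message
}
assert failure == 'entity user-1 belongs to session S1, not S2'
```

With a rule like this, the simplest possible test that mixes sessions is enough to expose the mistake; no particular data or query ordering is needed to trigger it.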

Unit Testing and FP:
"Writing unit tests is reinventing functional programming in non-functional languages" (Christian Sunesson - on github). (I found the linked post very interesting reading.)
Here is a mental exercise: when reading the following blog about 'testable code': http://googletesting.blogspot.in/2008/08/by-miko-hevery-so-you-decided-to.html , think how each of the guidelines relates to FP. Notice they are all N/A!
The term 'testable code' is OOP admitting its own limitations (my OO code is not testable unless I follow this list of rules...). FP code does not need any tweaking or special guidelines to be testable.
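A tiny sketch of that point, written in Groovy in a functional style (namesInOffice and the sample data are made up for illustration). The function depends only on its arguments, so the 'test' is just input and expected output: no mocks, no setup, no injected collaborators.

```groovy
// Pure function: the result depends only on the arguments (hypothetical example).
def namesInOffice = { List<Map> rows, String office ->
    rows.findAll { it.office == office }.collect { it.userName }.sort()
}

def sample = [[userName: 'bob', office: 'office1'],
              [userName: 'ann', office: 'office1'],
              [userName: 'cat', office: 'office2']]

assert namesInOffice(sample, 'office1') == ['ann', 'bob']
assert namesInOffice(sample, 'office1') == namesInOffice(sample, 'office1')   // same input, same output
assert sample.size() == 3   // the input list is left untouched
```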

In many respects Hibernate design is the opposite of FP. If one is very unit testable the other one probably is the opposite of testable.

Not just Hibernate:
The Grails/GORM/Hibernate stack is complex and has some complex bugs. So does Groovy. I know that a lot of my app's functionality will break the next time I upgrade Grails. Is testing for 'expected' problems the best investment in this environment?

Conclusions:
Grails contradicts itself on unit testing: the Grails framework provides rich tools to unit test domain objects, only these tests will be rather useless. I think this confusion is not something that Grails has introduced. The Java community in general does not perceive side-effects as something to worry much about. Unfortunately, perception != reality.

More on the confusion: it should be clear by now that the blame for a lot of this resides with how the Hibernate session works. Non-localized side-effects are really hard to test. With that stated, I would expect each JUnit integration test in Grails to run in its own session. The very idea of several tests sharing the same Hibernate session is repulsive. Take a look at these JIRA tickets: GRAILS-11644, GRAILS-11706.

I find Grails' attitude towards testing insanely confusing.

Testable Code
Testable code should be defined as: code that makes unit tests effective in bug prevention.
This statement is not something everyone agrees with. Unit tests are often viewed as a way to aid the design and coding process (TDD) and this becomes their main purpose in life.
"Unit testing is not about finding bugs", see Writing Great Unit Tests. Don't you agree that something is very wrong with this sentence? Instead of saying "unit testing is not about finding bugs", maybe we should rethink how we write the tested code so it is? How about: start paying attention to the side-effects?

If you are doing something over and over and it does not work you have 3 choices:

keep on doing it and expect it will work (called insanity, also known as the Grails approach to unit testing)
keep on doing it and null out your expectations ("unit testing is not about finding bugs")
change the way you do it (redesign the tested code and/or the way you test).

I am suggesting the 3rd bullet is the way to go. I cannot redesign Grails but I can rethink how I test.

Functional Test Testable
I like Geb. Writing good functional tests is not trivial and, as with unit tests, it impacts the design of the tested code. The term 'testable code' needs to be extended to functional tests... it may be an idea for a long post somewhere in the future.

Next Post:

I want to start wrapping up my 'I don't like' series. In my next post, I plan to give an update about the repeatable finder issue (post 2) and include a few final thoughts.

Friday, October 10, 2014
