Stuff about programming, programming style, maintainability, testability. Dedicated to my coworkers and friends. Everyone is welcome to leave comments or disagree with me.

Friday, August 15, 2014

I don't like Grails/Hibernate, part 4. Hibernate proxy objects.

Hibernate uses proxy classes to implement its lazy loading of nested domain objects. As a result, instead of BankAccount class (defined in post 1)  I may sometimes get object with a class named something like 'BankAccount_$$_javassist_27'.

The idea behind the general concept of a proxy is that: use of proxy objects should be identical to the use of real object. This utopian wish is impossible to accomplish in practice, and Hibernate proxies are no exception.  Hibernate proxies can cause several problems, including
  • gotchas related to equals method implementation (not discussed), 
  • the famous LazyInitializationException (which I have promised not to discuss) 
  • casting problems caused by use of polymorphic associations (this is a well documented issue and I will stay away from it as well).
So, the next best thing proxy designers (in any framework) can do is to make sure that the use of their proxies is not very transparent to the developer.  Good examples to think about here are proxies that show up in remoting technologies like: the old CORBA, RMI, EJB or even SOAP based WebService (brrrr).  In these technologies, developer had to do some specific thing, like a JNDI namespace lookup, to get access to a proxy.  By contrast, in Grails/GORM/Hibernate it is very hard to maintain a mental map of which object is a real object and which is a proxy.  This causes very intermittent behavior.

Probably for that reason, proxies had a rough start in Grails.  The term 'proxy hell' was coined and some interesting bugs like this one:  http://jira.grails.org/browse/GRAILS-2570 have been reported.

Code Examples:
This code will get me a proxy object
  def id
  BankAccount.withNewSession {
     id = BankAccount.findByName('a1').id
  }
  def a1 = BankAccount.load(id)
  assert simple instanceof HibernateProxy


Note, that as usual with Hibernate, I can break the above code by adding an isolated query:
  BankAccount.findByName('a1')  //added this line

  def id
  BankAccount.withNewSession {
     id = BankAccount.findByName('a1').id
  }
  def a1 = BankAccount.load(id)
  assert simple instanceof HibernateProxy //now fails


You may find this last code example somewhat unrealistic, but I hope you agree with me that it explains the intermittent aspect of how proxies work (or why they often do not work).

How proxy code typically breaks:
Sooner or later, each Grails app will need to use domain object class to do various things with it like:
  • check if an object is a domain object or something else
  • introspect GORM properties on a domain object
  • find beans named after a domain object (for example, find BankAccountService if object is BankAccount)
And that is great, only, it is soooo easy to forget that, for example,    
    entity.getClass().shortName

may give me wrong class name! And if I do forget, my code will work, only, not always.

If you are very diligent in making sure that you always use the un-proxied classes only, then rest assured that someone else is not that diligent.  For example, here is a currently open Grails JIRA: http://jira.grails.org/browse/GRAILS-11630 and if you search source of, for example, various template libraries available out there you will find things like this code:
  grailsApplication.getArtefact(
       DomainClassArtefactHandler.TYPE, 
       element.getClass().name
 )

which, again, I am sure was tested and works ... only not always.

How to test for proxy problems:
This may sound like a broken record: Grails Unit Test will not find these.
Integration tests may find them but few developers code integration test separating these essential test elements into separate Hibernate sessions:
  • data setup (given)
  • test itself   (when)
  • test asserts (then)
(that lack of session isolation applies to Grails framework itself as shown by this open bug: http://jira.grails.org/browse/GRAILS-11644).  So Grails integration tests, as they are typically coded today, may not find most of proxy related issues. Functional tests are the best bet again.

Next topic: 
So far I have focused a class of Hibernate side-effects, where:
  BankAccount.findByName('a1') //(1)
  someOtherCode()              //(2)

adding isolated query (1) changes behavior of code (2).

I want to continue discussion about side-effects, but from a somewhat different angle. In post 1, I have promised not to talk about auto-saving, I will break that promise.  There are some very 'interesting' aspects of auto-saving worth examining,  for example, I sometimes get DB update operation errors from issuing a finder query.  I will try to examine a few of these interesting things.  Things that happen as a consequence of Hibernate design decision to make a major side-effect, like SQL update or insert operation, implicit and invisible.

Monday, August 11, 2014

I don't like Grails/Hibernate part 3. DuplicateKeyException: Catch it if you can.

This post follows a pattern I used so far:  it documents a case where adding an isolated query can break Grails/Hibernate code. 

I often think that a measure of well designed library is how in how it handles exceptional cases.  Hibernate does not 'exception' well, but Hibernate behind Spring Framework Templates, and then behind GORM can be really puzzling.  In this post, I will examine one of such puzzling cases.

Continuing my previous post, which described a very scary 'repeatable finder' concurrency issue. There are no good solutions to that problem and identifying affected application areas seems close to impossible.  However, repeatable finder problem can be partially solved by using shorter, more atomic Hibernate sessions.

Call for short Hibernate Sessions:
By design, Hibernate sessions are implicit in Grails and it is the framework responsibility to manage hibernate session life cycle when processing HTTP requests. Taking over that role does not seem to be a good idea.  With that said, since there are no good solutions to the 'repeatable finder' problem, bad solution maybe still the best I got!  Also, there are other, better acknowledged, reasons why I may need to control hibernate sessions, such as performance, long running scheduled jobs, Grails integration tests.

I find the whole issue a bit ironic.  Web developers have been, by now, conditioned to minimize the use of HTTP session.  REST wants to ban it all together.  Yet, Hibernate design is to maximize the use of hibernate sessions. Both are shared application state, if one is bad so should be the other!   Is this the idea: that shorter lived evil is less evil so it is OK to use it for everything?  If that is so, here you have it, one more argument for making hibernate session shorter.

There is an API to interact with hibernate sessions.  I can use sessionFactory bean directly to flush/close/create sessions or I can use withNewSession method available on any GORM domain object.  

Unfortunately, dealing with more than one Hibernate session exposes me to a bunch of Hibernate/Spring exceptions that would be otherwise unknown to me as a Grails developer:  HibernateSystemException, NonUniqueObjectException (hibernate), and DuplicateKeyException (spring) are among them. I will focus on the last 2.

NonUniqueObjectException (hibernate), and its twin DuplicateKeyException (spring):
In my experience so far, they seem to be linked to each other (DuplicateKeyException wraps hibernate NonUniqueObjectException).  I had hard time finding good documentation about these two, documentation that is relevant to how Grails works.  Hibernate JavaDoc for NonUniqueObjectException gives me only this: 

"This exception is thrown when an operation would break session-scoped identity. This occurs if the user tries to associate two different instances of the same Java class with a particular identifier, in the scope of a single Session." (http://docs.jboss.org/hibernate/core/3.6/javadocs/org/hibernate/NonUniqueObjectException.html)

This is not something I, as a Grails developer, want to identify with.  Instead, I would prefer the framework to enforce that objects returned using query in one session are not used in other session. But that is not exactly what the error indicates or not what it is.
Please note that Hibernate is not very logical here either:  it is not exactly that the user always  'associates' instances with the session.   They can get associated sometimes in ways that would surprise most of the users! (I may need to post about it too.)  Hibernate does not provide any public API to query for what is associated with the session.  It considers this 'private' information.  Well, if it is so private that I can't even query for it, why am I seeing it then in the exception?
DuplicateKeyException documentation seems simply incorrect for the context in which I am seeing this error:

"Exception thrown when an attempt to insert or update data results in violation of an primary key or unique constraint. Note that this is not necessarily a purely relational concept; unique primary keys are required by most database types." (http://docs.spring.io/spring/docs/3.0.x/api/org/springframework/dao/DuplicateKeyException.html)

Fortunately, the message I typically get is more descriptive: "a different object with the same identifier value was already associated with the session".   So both documentation and exception design seems to be a mess here, but the real mess is still ahead of us.

Code Examples:
As a Grails developer, you may find it surprising that this code even works:
        def ac1 = BankAccount.findByName('a1')

        BankAccount.withNewSession { session2->
            ac1.name = 'a1b'
            ac1.save(flush:true, failOnError: true)
        }

it is better to see the problem if I make the code more Hibernate explicit (which still works just fine):
        def ac1 = BankAccount.findByName('a1')

        BankAccount.withNewSession { session2->
            ac1.name = 'a1b'
            session2.saveOrUpdate(ac1)
        }

If ac1 is associated with my first session and not session2, why session2 allows me to save it?  Would it be not more logical if this code threw an exception with something like 'not associated with session'? This may make sense for more general case that Hibernate tries to accommodate, but it makes Grails behave inconsistently. 
Now, I can break it by adding a finder:        
        def ac1 = BankAccount.findByName('a1')

        BankAccount.withNewSession {
            BankAccount.findByName('a1') //added this line
            ac1.name = 'a1b'
            println shouldFail(org.springframework.dao.DuplicateKeyException) {
                ac1.save(flush: true, failOnError: true)
            }
        }

or to be more Hibernate explicit:
        def ac1 = BankAccount.findByName('a1')
        def id = ac1.id

        BankAccount.withNewSession { session->
            session.get(BankAccount, id)
            ac1.name = 'a1b'
            println shouldFail(org.hibernate.NonUniqueObjectException) {
                session.update(ac1)
            }
        }

And the names find/get sound so innocent ...  Imagine running a diff, comparing what changed from last stable source code version to figure out what caused the problem: and finding only extra finder methods!

Again, one sane way to think of this issue is that I am using ac1 associated with session1 on a wrong session (session2) and that is wrong.  But if that is the case WHY does my first example work!  

SIDE NOTE:  In my experience, this is not the only way to get into DuplicateKeyException trouble and I have not figured out all Grails code triggers for it. In most cases, I was able to solve the problem by 'bringing' some domain object into the current session. So the mechanics of the problem seem to be always on some level similar to what I have described. 

How to test for these?
Grails unit test coverage will be useless for finding DuplicateKeyException/NonUniqueObjectException.
Both integration and Functional tests are capable of finding this issue.

Why Grails has done it this way?
From what I know, GORM tries to be a thin Groovy layer around Spring Hibernate Templates. In addition, Hibernate does not expose any public API to query what domain objects have been attached to the session so GORM would have to remeber that.  One solution could be for GORM to store 'owning' session on each domain class created by Grails and use it to provide more meaningful and consistent exception if client code tries to use it in a context of another session.

Refrences:

Summarizing examples shown so far:
In its ORM pursuit, Hibernate has lost something much more fundamental and infinity more important than purist ORM thing can possibly be.  Ability to manage unwanted side effects has been lost and, as we have seen in a couple of examples already.  In code like this:

   BankAccount.findByName('a1') //(1)
   someOtherCode()              //(2)

(1) can change behavior of (2), in most extreme case it can break it. 

As a result, ability to decouple application logic is largely lost if I use Grails/Hibernate stack.  I consider this a major Hibernate design flaw, but because GORM Domain Objects are likely to be used extensively in Grails apps, Grails applications are more impacted by it.  

Also note that problems like this maybe very hard to troubleshoot.  Even if I somehow manage to have a mental image of every finder, every eagerly loaded association, Grails/Hibernate can (and will) put objects in the cache that will surprise anybody. (I may write about it too).

In my next post I will examine the same pattern (adding isolated query breaks Grails code) in a context of Hibernate proxies and talk about another related Grails bug.

Sunday, August 3, 2014

I don't like Hibernate/Grails, part 2, repeatable finder problem: trust in nothing!

I was hoping for some 'inalienable truths' developer can rely on.... Like, that things that should obviously return true never return false.
(A more correct technical term for this is property but I find inalienable truth more fun).

The assert code from last post is one such example:
    assert BankAccount.findAllByBranch(myBranch).every{
    it.branch == myBranch
  }

Repeatable finders:
(I use this term as a reverse analogy to non repeatable reads.)  In Hibernate/GORM the above assertion does not need to be true.  For each finder/getter hibernate stores returned objects in its session and when next finder/getter is called hibernate will use the stored objects whenever it can.  It will not refresh them.   So you have to assume that any finder will return some (or many) of the domain objects found by previous finders.  What if something has happened that hibernate session does not know about between the time of your current finder and the time previous finders ran?

So here is one example showing how to break the above BankBranch assert:

(assume ac1.branch == branch2)

... HTTP request for User1:
... 'componentA' executes:
println 'Tracing something about '+BankAccount.findByName('ac1')

... some other expensive computation executes

... HTTP request for User2:
def acc = BankAccount.findByName('ac1')
acc.branch = branch1
acc.save(...)

... HTTP request for User1 continues:
... 'componentB' executes:
def myBranchAccounts = BankAccount.findAllByBranch(branch1) 

(myBranchAccounts includes ac1 but Hibernate returns old, not refreshed version of it so ac1.branch == branch2 is still true)

... myBrancheAccounts are rendered on a view page

(User1 is presented a list of all accounts from branch1 including ac1 which is jolly displayed showing branch2. User1 is surprised.)

This is not necessarily a strict concurrency problem. You may have code which bypasses Hibernate (maybe uses Groovy.Sql class directly) and get into very similar issues.

It is also interesting to think about compoentA and componentB code from the point of view of the unit test coverage leak problem I described in the 'part 1' post.

Here are 2 other inalienable truths (properties) that are no longer:
Uniquness constraints:
My BankAccount was declared with unique constraint on the name field (database enforced uniqueness on the name column). So if I do this:

  def accounts = BankAccount.findAll()

I will never see the same name repeated, right?  Wrong.  Here is a concurrent usage that shows how that breaks:

...HTTP request for User1:
... 'componentA' executes:
println 'Tracing something about ' +BankAccount.findByName('ac1')

... some other expensive computation executes

...HTTP request for User2:
def ac1 = BankAccount.findByName('ac1')
def ac2 = BankAccount.findByName('ac2')
ac1.name = 'ac1_b'
ac1.save(...)
ac2.name = 'ac1'
ac2.save(...)

... HTTP request for User1 continues:
... 'componentB' executes:
def allAccounts = BankAccount.findAll() 

(allAccounts contain old amount in ac1 with ac1.name == 'ac1'
and ac2 with ac2.name == 'ac1')

... allAccounts are are rendered on a view page 

(User1 is presented a list of all accounts and account 'ac1' shows up twice. User1 is upset)

You may find it unrealistic that User 2 can perform 2 renames concurrently to a short period between 2 finder calls in one HTTP request.  What if there have been 2 users renaming one object each?  In any rate there are other possible domain objects than Bank Account and other fields that may need to have uniqueness constraint.   I think the issue is demonstrated well enough.

Again, you can get into similar problems if you use some direct Groovy.Sql.

Results which look like uncommitted reads:
If I transfer money between accounts inside a transaction I should never ever see the transfer applied to one account and not the other.  Right?  Wrong again:

...HTTP request for User1:
... 'componentA' executes:
println 'Tracing something about ' +BankAccount.findByName('ac1')

... some other expensive computation executes

... HTTP request for User2:
def ac1 = BankAccount.findByName('ac1')
def ac2 = BankAccount.findByName('ac2')
transferMoney(ac1, ac2, 1000) //transfer 1000 dollars

...HTTP request for User1 continues:
def allAccounts = BankAccount.findAll() 

(allAccounts contain old amount in ac1 but new amount in ac2)

... allAccounts are are rendered on a view page 

(User1 is presented a list of all accounts and the list looks inconsistent.  Where did the 1000 go? User1 is software tester, he/she is now furious, spends hours figuring out what got wrong and the problem magically just disappears. )

And again just use direct Groovy.Sql to get into the same issue without concurrency.

Why Hibernate Works Like This?
I imagine that it must have been tempting to use single Java object to represent single record. This is also purist approach to ORM:  node.children.first().parent.is(node). But with hibernate this may have not been just a temptation.  Hibernate designers decided at some point that Hibernate will be saving objects attached to the session automatically. I imagine that it would be very hard to deal with auto saving if you had more than one domain object representing the same record (which one would you save?)
So why not refresh existing object each time it is retrieved?  Maybe because that would be a serious side-effect ;)   If some objects have been changed locally and also concurrently changed in the database:  have Hibernate designers been concerned about throwing ConcurrentModificationException from a finder?
Well I do not see why, because Hibernate finders already save objects and you are likely to get a collection of interesting save errors when calling a finder.  (Talk about aversion to side-effects!)

Can I be just careful?
Be careful not to pollute hibernate session - that may not be so easy.  For example, consider that the BankBranch class has something like featuredAccount association back to BankAccount.  If that gets eagerly loaded at the time branch1 is retrieved hibernate session is already polluted with one BankAccount at the onset of the first example (without any artificial println statements).

More complex applications may want to do some HTTP filter before and after logic which uses hibernate objects.  Applications may have layers of complexity which share the same hibernate session.  Controlling what is and what is not in that session is unrealistic.

Why is this bad?
Developers experienced with relational databases are likely to expect certain behaviors from the code that with GORM/Hibernate are just gone.  The unexpected behavior may be very intermittent and impossible to troubleshoot.  In my example I have 'broke' the code by inserting a println statement printing something about one account. What if finder 'polluting' hibernate session was executed only under some (rare) conditions?
I think developers tend to think of finders and getters as safe methods to call.   Almost always getters do not mutate anything.  With Hibernate getter/finder have side-effects, one of them is mutated Hibernate session and this is easy to forget when you design and code your application.

I believe that any nontrivial Grails app most likely has issues/bugs related to repeatable finder.
In addition, Hibernate offers NO public API to query what is stored on the session.  So if you think of some programmatic ways to solve this issue think some serious hacking.

How to test for this:
Repeated finder problems arising from the use of direct SQL or use of several hibernate sessions within single HTTP request can be discovered by both integration and functional tests.
The issue is not unit testable even if you think of Unit Tests as 'atomic tests' which are implemented as Grails integration tests.
Unfortunately, in many cases the issue will be triggered by concurrent access to your application. 
Testing concurrency issues is always not trivial.  So I do not really know how to answer this question. 
Ideally, the tools we use will be better designed to handle concurrency (have you read Simon Peyton Jones 'Beautiful Concurrency' http://research.microsoft.com/en-us/um/people/simonpj/papers/stm/?). I am afraid Hibernate maybe never be one of these tools.  

New Session a Solution?
There is one very tempting partial solution to this.
If you really need consistent results from a finder keep it in isolated Hibernate session.(LazyInitalizationException alert flashing, OK I promise not to talk about LazyInitalization. :)
The idea is that if this line (from first example):

def myBrachAccounts = BankAccount.findAllByBranch(branch1) 

runs in a new (and therefore unpolluted) hibernate session the domain objects returned from the finder will all have the most recent values and none of the surprising things will ever happen.
Or ...
Enter next Hibernate gotchas:  DuplicateKeyException (and friends) the subject of my next post.

My current thinking is that using new session on certain 'crucial' selects seems to be a very good option to reduce the impact of repeatable finder problem. This technique could work well, especially if the finder we want to protect is the very last hibernate call in the HTTP request. I will not solve the problem but may reduce its impact. I may return to this thought later.

I believe there is currently no full solution to repeatable finder problem described in this post.

Added 2014/09/12:  Thinking more about impacts:
I find it helpful to think about application code from the point of view of properties (logical consistency rules that need to be true).  Application may rely on such properties explicitly (for example your logic will exception or return incorrect results if name property is not unique), implicitly (it will be embarrassing to show user list of items with seemingly broken uniqueness), and can actually be enforcing such properties (for example Grails code is used to enforce some more complex uniquness rules).

There are many, many possible properties related to domain objects (we have seen only 3 in this post).  Some are derived from software requirements but many have a much lower level nature. For example, each 'where' or 'join' criteria in underlying SQL statements is likely to imply a property (as shown by BankAccount.findByBranch() example). 
Some properties are meant to be enforced by application code, some by DB, some by the framework.
Repeatable finder is likely to affect/invalidate large portion of such properties in your application!
And you cannot rely on the fact that even DB enforced property will hold.
The impact seems very big.  

(EDITED 2014/09/12) References:  
I have posted a question about this issue here: Stack Overflow Question
I have now posted one (terrible) solution to how one can identify and fix such problems as answer to the Stack Overflow Question.
Here is Grails JIRA ticket for it: http://jira.grails.org/browse/GRAILS-11645
another relevant JIRA: http://jira.grails.org/browse/GRAILS-11644
Forum:  http://groups.google.com/forum/#!topic/grails-dev-discuss/wzekMGC0ibE

Saturday, August 2, 2014

I don't like Hibernate (and Grails), PART 1

My goal here is to write about Hibernate and GORM functionality that you could call 'It is not a bug, it is a feature' and which yield very negative and often non trivial consequences.
These topics are (I believe) documented nowhere and surprise every developer I know.
I also want to comment on what are the best ways to test for these.

Testing Grails Unit vs. Integration vs. Functional:
Is there some sort of a 'pyramid' consensus that you need a lot of unit tests, many in integration test,
and maybe just some (if any) functional tests?
Grails creates appearance of a test friendly framework with big focus on unit tests.
Out of the box, Grails will put test/unit and test/integration folder in your project
(You really should have test/functional there as well, but you need to work a bit harder to get it).
For each domain class you add to your project Grails will, by default, create an empty Unit Test for it (not integration, only unit).

Grails Unit Tests do not interact with database and anything related to hibernate/GORM has to be mocked. This makes Unit Tests a wrong choice for uncovering problems related to how hibernate/GORM is used/misused in your project.
I find this a bit ironic.  Here are two well know Hibernate gotchas: automatic saving of domains objects and LazyInitializationException.   Talking about these 2 feels like beating a dead hoarse so I will not.
I just want to point out that your domain objects maybe saved when they should not be (do you have non-transactional validation logic outside of your domain class?) or your views may throw LazyInitializationException (did you forgot to hydrate your model and transaction has rolled back?) but all your Grails unit test will pass, your Grails integration tests will also pass.  Did you write Functional Tests?

You maybe tempted to think about Unit Tests as simply tests that exercise atomic blocks of your code in isolation.  If this is your thinking you maybe putting your 'Unit Tests' into tests/integration folder and still think of them as 'unit'.  I admit to thinking this way.

There is an issue with this too.  Side effects do not work well with unit tests and, well, Grails is very much a side-effect framework.  Consider this high level hypothetical: you have programmed 'component A' and have a full logical unit coverage for it (if that is even possible) then you coded 'component B' and wrote full coverage for it as well.   You pat yourself on the back for having fully covered everything.
But wait... , if executing A creates side-effects which impact logic in B, then, well, you coverage has leaked on you.  You may be confident that you wrote a well tested app, but you really did not.  I think this is not well understood because, if it was, I would expect much more interest in FP.

My take on testing for possible Hibernate gotchas is to reverse the pyramid:  Focus on Functional Tests,  write more integration than unit tests.  But I will probably address it in more details in the future.

Example:
(I may reuse this simple Domain Class in future posts):
 class BankAccount {
    String name
    BankBranch branch
    Float amount
  
    static constraints = {
        name unique: true
    }
 }

Here is my 'component B' logic: (Assume properly defined equals/hasCode methods are in place  - not shown.)

 assert BankAccount.findAllByBranch(myBranch).every{
    it.branch == myBranch
 }

Would you think the above statement is guaranteed to be true?
If you are a sane person you will answer yes, this assertion has to hold no matter what.
If you have worked with Hibernate and/or GORM long enough then you (have lost your sanity by now and) can figure out what code to add in front of the above code block to have it break.

I call it 'repeatable finder problem' and I will write more about it next time.

Wednesday, October 9, 2013

Functional Groovy: Y-Combinator. Learning Groovy from Haskell ;)

Intro

I guess that, Y-Combinator (maybe together with Monad) is now on the Geek-Should-Know concept list.  Also YCombinator is a name of a venture capital firm so even managers may know about it!

I intend this to be a very simple intro to Y-Combinator using Groovy Closures.  There are other blogs about Y-Combinator in Groovy:

My goal is to explain/derive Y-combinator, not to just show how to use it. 
An excellent post with code examples in Scheme can be found here:
 http://mvanier.livejournal.com/2897.html

Combinators and Things: The History

"Computer Scientists do not know their history"  Eric Meijer
My background (education) is mathematics, not CS.  Within math I was never much into logic.  I am learning this stuff now.  I believe CS (maybe this is changing) education does not typically include topics like Combinatory Logic,  Lambda Calculus or Category/Functor Theory.
Universities produce an army of excellent engineers which think in only imperative terms.
(Correct me if I am wrong, I am sure this is very much school dependent.)

SKI Calculus and combinators (1920-ties, Moses Schönfinkel, Haskell Curry) are older than Turing Machines (1930-ties).  Lambda Calculus (1930-ties Alonzo Church) is concurrent to Turing work. Lots of this is very cool stuff and includes work of many other mathematicians, most notably, David Hilbert.

SKI calculus motivation was logic, not computing, but, in fact, Church's work was largely motivated by the idea of a computer (same as Turing work was).  You can think that lambda calculus was `software oriented` thinking, why Turning machines have been `hardware oriented`.  
That last statement can be questioned, after all, just two base combinators S, K (I can be derived) can express complete mathematical logic. Same way S, K and I could be a CPU instruction set. That idea has been worked on a lot (however hard it is to search for it on google:  "SKI hardware", "SKI instructions set"  ;).
The is some learning involved.  Here are some 'quick' guides I have recently found:

Enough of this into. (I am not sure it even belongs in this post).

Recursion as Fix-point

Functions are often defined using recursion. In this post, I will use Fibonacci sequence with its (super-slow) recursive definition which may look like this in Groovy:

 Closure fib
 fib = {int n -> n < 2 ? n: fib(n-1) + fib(n-2)}
but everything discussed here will generalize to other recursive function definitions.

Here is a different (and interesting) way to look at this.  We can create a functional FIB (a function that takes functions are argument and returns a new function) like so:

Closure FIB = {Closure f ->
  {int n -> n<2 ? n: f(n-1) + f(n-2)} //returns new closure
}
With Fpiglet you could alternatively define FIB like this (implicit currying):

 Closure FIB = FpigBase.f {Closure f, int n -> 
    n<2 ? n: f(n-1) + f(n-2)
 }

Our FIB functional maps functions into functions. The fix-point (if exists and here it does) is a function f0 such that:

 FIB(f0) == f0

It should be obvious that this f0 fix-point must be the same as the fib function defined above and this type of argument can be easily generalized to other recursive definitions.

So the name of the game is to find a way to 'solve' functionals like the FIB functional defined above for fix-points.

Fix function:

Would it be cool if we had a 'magic' function called fix which could calculate f0 like so:

 f0 = fix FIB

Obviously, Haskell has one ;) and it is defined like this:

fix :: (a -> a) -> a
fix f = f (fix f)

This is exactly what we need, if you blindly plugin 'FIB' for 'f' and 'f0' for 'fix FIB' the function implementation line becomes:

 f0 = FIB f0

so if fix returns anything it will he a fix-point!

With the above we can define Fibonacci sequence like this (note fib' is the same as FIB functional, we just do not want to use upper case function names in Haskell):

fib' :: (Integer -> Integer) -> Integer -> Integer
fib' f 0 = 0
fib' f 1 = 1
fib' f n = f (n-1) + f (n-2)

f0 :: Integer -> Integer
f0 = fix fib'

numbs = map f0 [0..6] --numbs equals to [0,1,1,2,3,5,8] 

Let us do that in Groovy!  There is a little glitch. Haskell is lazy language so the above implementation of fix does not have to stack overflow,  this simple Groovy equivalent will always stack overflow:

def fix
fix = {Closure f ->
    {x -> (f << fix(f)) x} 
}

but not if you do it like this:

def fix
fix = {Closure f ->
    f({x -> (fix(f))(x)}) 
}

you can run this to convince yourself that it works:

def f0 = fix(FIB)
assert [0,1,1,2,3,5,8] == (0..6).collect(f0)
assert 8 == f0(6)


The trick is that fix function passes closure {x -> (fix(f))(x)} to the next iteration of FIB and that closure does not need to be invoked if n < 2. 
This is cool but the definition of `fix` itself is recursive.  So we are swapping one recursion for another.  I would be better if we could do recursion without ANY recursion! 

Y-Combinator

Y combinator is, well is a combinator (whatever that means) invented, I believe, by Haskell Curry.
It it is typically represented by the following lambda expression (whatever that means):

  λf.(λx.f (x x)) (λx.f (x x))

Instead of explaining its meaning, let me try to derive it (using Groovy) and the meaning may become clear. Forget about fix function. I would really like to do something like this:

FIB(FIB)(6) //or FIB(FIB, 6)

This will not work with current definition of FIB but will work if you define it like so:

Closure FIB = {Closure f, int n ->
    n < 2 ? n: f(f, n-1) + f(f, n-2)
}

FIB(FIB, 6) //returns 8

or using curried versions:

Closure FIB = {Closure f ->
  {int n -> n < 2 ? n: f(f)(n-1) + f(f)(n-2) }
}

assert 8 == FIB(FIB)(6)

This is it, this is the main trick behind Y-Combinator.  The rest is simply bunch of refactoring steps.

This trick may look like cheating and to some extent it is.  Instead of being able to call function from itself (which would be recursion), we pass a copy of that function to itself so it can call it.  On one end this seems like recursion in disguise,  on the other end it is function composition.  On some deep level, composing function with itself and recursion are similar concepts.

I like to think that function composition is a stronger concept than recursion.   Y verifies that claim.

I can refactor my code so it looks like this:

Closure c2 = {f ->
   return {x -> f(f)(x)}
}

Closure FIB= {Closure f2 ->
  return {int n -> n < 2 ? n: f2(n-1) + f2(n-2)}
}

c2({Closure f ->
  return FIB(c2(f))
}) (6)  //returns 8

and that becomes this:

Closure c2 = {f ->
   {x -> f(f)(x)}
}

Closure FIB = {Closure f ->
  {int n -> n < 2 ? n: f(n-1) + f(n-2) }
}

def Y = {Closure fx -> c2({Closure f ->
   fx(c2(f))
})}  //fx represents functional, f function

def f0 = Y(FIB)
assert [0,1,1,2,3,5,8] == (0..6).collect(f0)

and this is it, we have Y Combinator in Groovy.  And if you look long enough you will see the original lamba expression!
Actually what we got is this (Groovy is strict/eager evaluation language):

 λf.(λx.f (λv.((x x) v))) (λx.f (λv.((x x) v)))

Formal Mathematical Proof

It is actually not very hard.   This PDF: http://www.inf.fu-berlin.de/lehre/WS03/alpi/lambda.pdf
has a good introduction to lambda caculus, but the proof that Y acts as fix-point finder requires more explanation:

Y g = (λf . (λx . f (x x)) (λx . f (x x))) g   //1. definition
= (λx . g (x x)) (λx . g (x x))                //2. β-reduction of λf: applied to g
= g ((λx . g (x x)) (λx . g (x x)))            //3. β-reduction of λx done on x x
= g (Y g)                                      //4. definition again

The only thing that needs explanation is how do we get from 2 to 3.  We basically are substituting 
 (λx . g (x x)) into x x expression on the left!

Useless or useful?

If you visited Hamlet D'Arcy's post on Y-combinator you may have noted that he has classified Y-combinator as cool but useless.
I think, the main applicability of Y-combinator is that it lets you keep your closures as anonymous functions.   You can define recursive closure without giving it a name like this:

Y {Closure f ->
    { ... }
}

I think that could make some of your code more expressive.  Also if pointless programming is important to you, then you should love Y!  Now, pointless and useless are not exactly the same ;).

There is also the thing about recursion being logically dangerous.  You can lookup 'Curry Paradox' and read on that.  But hopefully compilers will catch all of that for us, and I do not think this is related to Groovy programming anyway.

So is this just cool theoretical CS?  My take on this is that, for working programmers, it maybe more important to know what combinators are in general and try to learn to 'think in combinators'. In that Y-combinator probably will not help much.  'Thinking in combinators' will involve higher level combinators like function sets defining Monad, Arrows, Lenses, etc.

Saturday, August 10, 2013

Monadic Comprehensions in Groovy and Fpiglet

This post is continuing my last 2 posts about monads.   In this post we will look at Groovy DSL syntax (as implemented in Fpiglet) for monadic comprehensions.

Legend:
  f([1,2,3]) - converts list [1,2,3] to functional list with the same elements
  f(myClosure) - converts closure to curried function
  << - is functions composition and passing param to function in Groovy,
         it also has been DSL-ed to do other things in comprehensions.

What are comprehensions:
If monads are about putting things in a 'box', then monadic comprehensions are magically removing the box. Only there is no magic.  Comprehension is a polymorphic (works across all monads) syntax which lets you operate on monads using 'unboxed' values.
Here is a simple example using Fpiglet Maybe type:
  
  def res = comprehend {
    x << from{ just(3) }
    y << from{ just(2) }
    output{ just(x+y) }
  }

  assert just(5) == res

One thing that Maybe and Either are famous for is that they are a great alternative to handling exceptions and nulls.  Let us see a bit of that in action:

 Closure maybeDivide = f { Maybe num, Maybe den ->
     comprehend {
       x << from { num }
       y << from { den }
       restrict { y != 0}
       output{ just(x/y) }
     }
 }

 def sixDividedBy = maybeDivide(just(6))
 assert just(2) == sixDividedBy(just(3))
 assert nothing() == sixDividedBy(just(0))

I think this is very cool and much more elegant than using if statements or throwing exceptions!
Fpiglet supports another syntax (somewhat LINQ inspired) which is also quite readable:

 Closure maybeDivide = f { Maybe num, Maybe den ->
   select{just(x/y)}.from {
      x << { num }
      y << { den }
      where { y != 0}
   }    
 }

Both versions are equivalent.  I will use select syntax moving forward.   

I hope you have enjoyed my Learn You a Haskell book inspired knight examples in previous posts. 
Here is a definition of knightMove function expressed using comprehension (please recall that positions are modeled as 2D Groovy lists [x,y] where x,y are in 0..7 range, but the list of all outcomes is a functional list):

 def knightMove = {pos ->
   def (x,y) = pos 
   select{ f([newPos]) }.from{ 
     newPos << {f([[x+2,y+1],[x+2,y-1],
                   [x-2,y+1],[x-2,y-1], 
                   [x+1,y+2],[x+1,y-2], 
                   [x-1,y+2],[x-1,y-2]]) }
     where {def (newX, newY) = newPos; newX in 0..7 && newY in 0..7}  
   }
 }

 assert [[2, 1], [1, 2]] == funlistOut << knightMove([0,0])

I am very much interested in code readability, so let me read the above code:

The result of knight move is select list (select {f[..]}) of new positions (newPos) from (.froma list of spelled out positions ({f[...]}), where (where) the new position coordinates are in 0..7 range (newX in 0..7).

One of the reasons for having comprehensions is that they read very nice!

And here are all possible positions after 3 moves:

 def in3Moves = select { f([third]) }.from{
   first  << { knightMove([0,0]) }
   second << { knightMove(first) }
   third  << { knightMove(second) }
 }

I think it reads even better!

Summary:
In previous posts we did go beyond simple 3 time knightMove to solving problems like the shortest number of moves needed to move from A to B.  Stuff like this still requires folds.    
However comprehensions are a very powerful and very readable syntax.  

Please check out Fpiglet website in a few weeks for more information about Fpiglet monadic comprehensions.