Friday, July 5, 2013

How to fold Infinite List in Groovy - Functional Groovy with Fpiglet

Folding on infinite list. (To be exact, right-folding infinite list.)  That does sound impossible and is a frequently discussed puzzle that Haskell programmers enjoy.  How can you start at infinity and iterate backwards to the result?  Can it be done in Groovy as well?

We will use Fpiglet. The solution I am going to explain here is not as elegant as Haskell's but I think should be still quite interesting.  First some intro for all readers who are not familiar with functional terminology.

Introduction:  Fold and reduce functions are functional programming equivalent of 'iterator'.  Functional programs do not want to mutate anything so the fold functions provide a way to 'accumulate' new data from the list without mutating any state.  There are 2 ways to iterate over a list:  
  • scan it from the beginning to the end (left fold/left reduce)
  • scan if from the end to the beginning (right fold/right reduce)
Folding accepts initial value of accumulator, reducing takes first element of the list as accumulator. 
 foldL   (foldClosure, initAcc, list) -> finalAcc
 reduceL (foldClosure, list) -> finalAcc

where
  foldClosure(acc, element) -> newAcc

So the folding functions scan the list and for each element execute foldClosure passing it current accumulator and current element. The job of foldClosure is to create new version of accumulator so nothing is mutated.  I am sure you recognize similarity to Groovy Collection.inject(Object, Closure) and Collection.inject(Closure) because these are OO equivalents of foldL and reduceL.

Here is an example from Fpiglet, we use foldL to calculate smallest element in the list:
  assert 2 == reduceL({acc,b -> (b<acc)?b:acc}) << [4,10,2,7]

this does not look much better than Groovy code would so here is more of Fpiglet style:
 assert 2 == reduceL(MIN)<<[4,10,2,7]

This will scan the list (4,) 10, 2, 7.  If we wanted to scan the list (7,) 2,10,4 we would write this:
 assert 2 == reduceR(MIN)<<[4,10,2,7]

One difference is that with right-folds the folding function has parameters flipped:
  foldClosure(element, acc) -> newAcc

Infinity (and beyond?):   Fpiglet implements functional programming version of list which is lazy. That means that what is on the tail of that list is not evaluated until needed. This allows for creation of recursive lists (also called streams) which never end like all natural numbers:
  def naturalNums = naturalNumbers() 

Fipglet has special version of right-folds which work with infinite lists: foldXR and reduceXR.  

So lets go starting at infinity all the way back to 1 ;),  FRIST is one of predefined funcitons returning first argument :
  assert 1 == reduceXR(FRIST) << naturalNums

Lets find first natural number larger than 50 which can be divided by 7: 
 def foldClosure= f {b, acc -> (b>50 && b%7==0)? b: acc()}
 assert 56 == reduceXR(foldClosure) << naturalNums

It works!  But wait, I was not writing about curried functions for nothing in last 3 posts, this is better code example:
 def finder = f {predicate, b, acc -> 
         predicate(b) ? b: acc()
 }
 def res= reduceXR(finder{it%7==0 && it>50}) << naturalNums
 assert 56 == res

We have folded natural numbers all the way from infinity back into 56!

How it is done:
Functional lists are immutable and are created by prepending elements one in front of the other.
So a list like [4,10,2,7] can be written in Fpiglet like so:
  e(4) << e(10) << e(2) << e(7) << empty()

It is even more intuitive to see it using Haskell notation:
   4 : 10 : 2 : 7 : []  

or, to be more explicit
    4 : (10 : (2 : (7 : [])))
now I want to right-fold this list using folding closure, let us take something simple, we tried MIN, MODULO, let us try PLUS (+).   
Right folding the above list with + is equivalent to this:
      4 + (10 + (2 + (7 + [])))

Did you notice that I just replaced ': 'with '+' ? 
So for generic folding closure fc,  if I could only write  
     x 'fc' y  //instead of fc(x,y) 

we would get this:
    4 'fc' (10 'fc' (2 'fc' (7 'fc' [])))

Now imagine, what happens if for some values of x in fc(x,y), the result does not depend on y.  Like in our situation where predicate(b)is satisfied!  Then we get this:
    4 'fc' (10 'fc' (2 'fc' (...who cares...)))

The only issue is how do I know that fc does not care about second parameter?  reduceXR and foldXR pass second parameter (acc) as a closure!  To use accumulator, the closure needs to resolve it first.  It returns the new accumulated value of course, not a closure.  The logical signature for folding closure would look like this

  fc :: el -> (_->acc) -> acc //el -list element, acc -accumulator

Case solved!

It is not so good:
FoldXR and reduceXR implementation are prone to stack overflows.  This is true even for Haskell!   And it follows from the fact that we really are creating a long chain of composed functions!
There are other functional methods which work well on infinite lists, like map or filter

Here is a simple chain of Fpiglet code, which is stack overflow safe:
 def res= filter{it%7==0} << filter(LARGER(50)) << naturalNums
 assert [56] == funlistOutTake(1) << res

I hope you enjoyed this post and I invite you to check out Figlet.






No comments:

Post a Comment