Software has two ingredients: opinions and logic (=programming). The second ingredient is rare and is typically replaced by the first.
I blog about code correctness, maintainability, testability, and functional programming.
This blog does not represent views or opinions of my employer.

Thursday, October 18, 2012

Curlies 10: recursion, why language matters

This is 10th installment about Curly braces: my arguing that curlies are the reason for all the evil in programming ;)  And my team has introduced mustache to my current project :D.

I wrote about for loops and how they compare to functional alternatives before. Today's topic is a bit different but is still relevant to the imperative for loops.  The point is that recursion can be a very expressive alternative to the for loop.

Lets look at some code examples.  Groovy collections have a method called collect which applies a given closure to each collection element like so:

assert [10,20,30] == [1,2,3].collect { it * 10 }

If Groovy did not have collect method and we wanted to write our own, how would we do that?
Probably with a for loop!  In fact this is how the underlying implementation of Groovy method looks like:

   public static Collection collect(Object self, Collection collection, 
                            Closure closure) {
       for (Iterator iter = InvokerHelper.asIterator(self); iter.hasNext();) {
            collection.add(closure.call(iter.next()));
       }
       return collection;
   }

Let's compare this with functional language equivalent (let's do that in Haskell!)  So if Haskel did not have a built in Map function, here is how we would write one:

    map :: (a->b) -> [a] -> [b]
    map _ [] = []
    map f (head : tail) = f head : map f tail

We are done!  I do not want to say we 'implemented' or 'coded' map function.  This would imply that we told the computer what to do and we did not!  We told the computer what map is.
Haskell to English:  (1st line) is declaration,  in Haskell all functions are curried so to define function c = f(a, b) you write f :: a -> b -> c.   So, this declaration says that map accepts both a function: a->b and a list [a]as parameters and returns a list [b].  
(2nd line) Says that map of empty list is an empty list no matter what the function is.  
(3rd line)  Says that to map a non-empty list you apply the function on the first element of that list and apply the map with this function to the remaining elements of the list (recursive induction).  
Side Note: a very cool pattern matching is happening.  If list [1,2,3] is defined as x:xs, that means that:  x==1 and xs==[2,3].  The parentheses (head : tail) are here only to disambiguate the order of things.  In Haskell to call function f on x you just type: f x
I find Haskell version much much readable!

Next lets look at something a bit more complex. Haskell alternative to Groovy's collection findAll method which works like so:

 assert [3] == [1,2,3].findAll { it > 2 }

Can you imagine how it is implemented?  Obviously with a for loop! I will spare you the details.  Here is how you could do that in Haskell (Haskell calls the same operation  filter):

 filter :: (a->Bool) -> [a] -> [b]
 filter _ [] = []
 filter p (head : tail)
     | p head      = tail : filter p tail
     | otherwise   = filter p tail

Haskell to English:  This bar ( | ) things are called guards and are somewhat reminiscent of our old switch statement.
(1st line) is declaration.
(2nd line) filter applied to empty list results in empty list no matter what the predicate function is.
(3rd line and on) for predicate function p and list with first element called head and the remaining elements called tail, if the predicate is satisfied for the head element then the head element is included otherwise it is not.
That is all there is to it!

You can see the expressiveness of Haskell!  Recursion allows us to almost translate the definition from English directly to the program!  Readability!   Imperative programs tell the computer what to do,  declarative programs tell the computer what things are.

So why languages matter?  Many programmers (I was one of them) tend to think that the multiplicity of new languages today comes from some form of developer vanity.  Hey why not create one more language?   Why do we assume that all advancements in computer science can be expressed in, say, Java?   Obviously they can't!  New work in type systems, in compilers, ... the point is clear but we still do look down on all these new languages!

The recursion is another little reason why new languages are needed.  You could try to use recursion in Java but with big collections prepare yourself for stack overflow errors. You need some serious language help for that!
Now I am NOT saying that there are no stack overflow problems in Haskell or SCALA.  In fact, knowledge of how to avoid stack overflow errors is part of language learning.  However, functional language compilers are armed with bunch of tricks such as tail-call optimization, trampoline...  These tools allow language compiler to resolve recursion into a for loop!
Added Note:  As I have learned recently Groovy is catching up and you can trampoline your closures (that is not a compiler function and requires explicit coding).   Also there is some tail-call optimization support for methods (not closures) in both Groovy and Java.  I may write more about it in the future.

Final little point  (and I tried to make this point before):  I hope I have shown some very expressive code examples.  However the terseness and expressiveness was not the goal, rather is was a consequence. Declarative programming yields terse and expressive code, but not all terse and expressive code is declarative or functional.  So what is more important expressive or declarative, expressive or functional?  But this maybe a good topic for another post some day, maybe.




Wednesday, October 17, 2012

Semantic Web Presentation

This post is dedicated to all who took part in my recent Semantic Web presentation.

There was no time to cover everything I wanted.  I will use this post page to touch on some things I should have included but did not. 

Here is the link to my presentation:

I have mentioned that there are no good books to read, or at least none that I could find. I ended up reading Developer's guide to semantic web.  I cannot say I recommend it,  it is better than the other one I have tried: Semantic Web Programming.  I have also briefly looked at o'reilly book but it seemed it provides less comprehensive coverage of the subject.

Large topic I decided I will have no time to talk about during my presentation is: microformats.   I appears that microformats are liked by the REST community and semantic/linked data community is not crazy about them.  To me microformat is 'hardcoding' semantic data,  RDF is not.  I do see the flexibility of RDF as a big advantage.   If micoformats are new to you, the idea is to simply add class attributes to your HTML and put the semantic information by using certain agreed upon by community class names which represent semantic info.  The catch is that these names have to be what community has agreed to. That agreeing takes time and often is not easy (maybe even not possible).
Quick intro to this idea can be found here: http://microformats.org/get-started

Interestingly (and relevant) is the species microformat: http://microformats.org/wiki/species
It has been proposed in 2006 and is still in a straw-man status.  There are several good reasons why it is hard to agree on microformat for species.  Consider this simple question:  how would you identify a species, would you use, say, scientific name or common name?  Both are likely to change and do change... If only microformats had owl:sameAs... But if they did they would not be microformats, would they?
Now add more complex questions of how to handle several different taxonomies, etc.
Species microformat is a good example why the RDF direction maybe a better choice for creation of semantic information about biology and wildlife.

During the presentation, I have decided to focus on the aspect of publishing semantic data.  Another big and important topic is how to take advantage of semantic data published internally or by other entities.  How do you consume semantic info?  That would open a conversation about programming libraries like Jena, Pellet, Sesame..., tools like Protégé, Neon....  
The classic idea in this direction would be:  find all publications which talk about Bald Eagle needing Water as resource.  Now the publication may be tagged under Bald Eagle and River, or Bald Eagle and Lake, or Raptors and Fish.  Not your everyday DB query is it?

And finally there is the 'semantic first' and 'semantic last' discussion.  It may seem like a topic for later, but is not.  The point is that we would be designing software differently if part of the thought process involved semantic considerations.  It is hard to design for something you do not understand.
One aspect of such consideration is: is there a relevant ontology out there (example: Dublin Core for published works).  If so, should I design my software in a similar way (even if I do not intend to publish RDFs near term)?    The broader question in this area is:  how do we design the software so that adding semantic presence will not be hard in the future...

Lastly,  lots of people think of semantic web/linked data as one big database.  Let me reiterate that: whole web is one big database!   Will this be a game changer, will we see 'Macro' computing (if that even is a valid term)?  Looking at things from a very large scale point of view could be big (so what is relationship between number of cancer cases and geographical latitude? no problem, let me run a query...).
Such uniformity (one database!) will not be achieved with SOAP web services.  How about with microformats?  Except for few markups community agrees on, unlikely!   REST is an idea which may provide needed uniformity (but I doubt it will - see my previous posts on why).   With RDF and ontologies: we may get uniformity without uniformity (we do not need to agree on using the same terms as long as the relevant terms are connected). As long as owl.sameAs, rdfs.seeAlso, rdfs.isDefinedBy, foaf.primaryTopicOf, etc are used there still may be uniformity within all the diversity.