Software has two ingredients: opinions and logic (=programming). The second ingredient is rare and is typically replaced by the first.
I blog about code correctness, maintainability, testability, and functional programming.
This blog does not represent views or opinions of my employer.

Monday, April 23, 2012

Imperative curlies 6: removing ifs


In last few posts I tried to argue that there is little or no code reuse around for loops.  There is one notable exception, however, the code reuse by adding lots of additional curlies called if statements.   This post contains my thoughts on how to change imperative ifs used in this way.

Often the developer is faced with two obvious options, repeat very similar (but not identical) imperative logic in several places or code the logic in one place and include lots of if statements within this ‘generalized’ logic to handle the differences.  The generalized logic is often a private implementation method with several Boolean parameters.  It is invoked from various public methods which will set the booleans to trigger the needed side-effects within the private method.  Obviously many programming techniques have been developed to avoid such code (Template Method Design Pattern, or even the concept of polymorphism itself),  but still the if statements are often easier to use.

Here is a piece of code taken directly from the open source Ext JS 4.0.7 (fragment of Ext.form.Basic):
01: getValues: function(asString, dirtyOnly, includeEmptyText, useDataValues) {
02:  var values = {};
03: 
04:   this.getFields().each(function(field) {
05:      if (!dirtyOnly || field.isDirty()) {
06:         var data = field[useDataValues?'getModelData':'getSubmitData'](includeEmptyText);
07:         if (Ext.isObject(data)) {
08:             Ext.iterate(data, function(name, val) {
09:                 if (includeEmptyText && val === '') {
10:                     val = field.emptyText || '';
11:                 }
12:                 if (name in values) {
13:                     var bucket = values[name],
14:                         isArray = Ext.isArray;
15:                     if (!isArray(bucket)) {
16:                         bucket = values[name] = [bucket];
18:                     }
19:                     if (isArray(val)) {
20:                         values[name] = bucket.concat(val);
21:                     } else {
22:                         bucket.push(val);
23:                     }
24:                 } else {
25:                     values[name] = val;
26:                 }
27:             });
28:         }
29:     }
30:   });
31:  
32:   if (asString) {
33:      values = Ext.Object.toQueryString(values); 
34:   }
35:   return values;
36: }      
Ext.form.Basic is an Ext class defining base/reusable behavior of Ext Forms.  Ext JS uses the above method internally:
01: getFieldValues: function(dirtyOnly) {
02:    return this.getValues(false, dirtyOnly, false, true);
03: },      
Also,  the last argument (useDataValues)  is not documented, making getValues behave as an old style form submit value retrieval.   Instead of repeating the same logic and providing separate implementations to retrieve only dirty data from the form vs. all the data,  retrieve data using new model semantics vs. using old form submit,  etc, etc,  the Ext library codes the logic once and provides if conditionals within the logic to handle different cases.

Looking at the above snapshot, can you easily see what the code is doing?

The idea or re-implementing the code by the removing the curlies/shortening distance between curlies is somewhat unfair to the reimplementer.  After all the method signature has all these Booleans (or JavaScript truthies),  and I think the API designers had an imperative implementation in mind when defining the signature of this method.  Declarative/functional thinking vs imperative thinking will impact the APIs.

Lets do it anyway!   I am dropping the asString parameter and I will type it as a separate function.  I personally dislike function result set type changing based on the parameters.   Standard function composition can be defined this way:
f: X -> Y
g: Y -> Z
g Compose f: X -> Z

(g Compose f(x)) = g(f(x))
The more I code with JavaScript the more I find that it is more convenient to do this type of composition (compose with function which is aware of the original argument set):
f: X -> Y
g: Y, X ->Z
g ComposePlus f: X -> Z

(g ComposePlus f) (x) = g(f(x), x)
Assume that we have bunch of utilities coded for things like flexible array concatenation, and that the ‘getFields()’ method returns a rich JavaScript collection object loaded with nice methods like map, fold, filter.  My imaginary filterIf(condition, fn) method will return original collection if condition is not met and filter otherwise.  So here is the new version of the code:
01: getValues: function(dirtyOnly, includeEmptyText, useDataValues) {
02:   var dirtyOnlyAdjustF = function(field) { return field.isDirty(); };
03:    
04:   var fieldRetrievalF = useDataValues ? 
05:                           function(field) {return field.getModelData(includeEmptyText);} :
06:                           function(field) {return field.getSubmitData(includeEmptyText);};
07:
08:   
09:   var emptyFieldAdjustF = function(data, field) {
10:      Ext.iterate(data, function(name, val) { 
11:         data[name] = (val === '') ? field.emptyText : '';
12:  }                                                                          
13:      return data;
14:   };
15:
16:   if(includeEmptyText) {
17:      fieldRetrievalF = FunctionUtil.composePlus(emptyFieldAdjustF, fieldRetrievalF);
18:   }
19:
20:   var foldingF = function(data, aggregator) {
21:      Ext.iterate(data, function(name, val) {
22:        var bucket = aggregator[name];
23:        aggregator[name] = bucket ? ArrayUtil.concatenate(bucket, val) : val;  
24:     }
25:   };
26:     
27:   var values = this.getFields()
28:                      .filterIf(dirtyOnly, dirtyOnlyAdjustF)
29:                        .map(fieldRetrievalF)
30:                          .fold({}, foldingF);   
31: },
32:
33: getValuesAsString: FunctionUtil.compose(Ext.Object.toQueryString, this.getValues);
Lots of developers will say that the second code is better because it is fluent.  I think the fluency is more of a side-effect resulting from changing from imperative to declarative programming style.

Looking at the above code,  how easy is to figure out what the code is doing?   Actually the code says what it does, if statements never do that.

Many developers do not like the ternary operator.  I have used it here to emphasize the declarative aspect.   If the distance between the curlies is short, then we are close to being declarative.  I like when the language lets me drop the curlies to emphasize that the code has been sufficiently refactored from the imperative.   In this example JavaScript ternary operator does just that!

I hope the code speaks for itself so I will not say anything more about it.

Friday, April 13, 2012

Imperative curlies 5: good conditionals

Continuation of my previous bashing of curlies.

In the previous posts I have proposed 2 rules: eliminated the curiels for better code and shorten the distance between curlies for better code. This post adds the following rule (3): go an extra mile in eliminating the curlies and cool stuff will happen.

If and switch statements are examples of curlies. I still have hard time wrapping my mind around conditional statements in programming. When are they bad and when they are OK?

Side Note:  Conditional statements in imperative and functional worlds are very different beasts.  For example, Java if-else statement is all about conditional side-effects.  In pure functional programming, if-else statement cannot have side-effects and it is really a function returning values. Clearly, functional ifs are more likable.

If statements have been considered villains for a very, very long time. For example, the Visitor Design Pattern has been advocated as an OO alternative to if/switch statements. The worst offenders are the pieces of code with nested if statements which split imperative logic into logical branches and smaller branches and smaller branches… Code like this is where bugs like to thrive, good test coverage is hard or impossible to achieve, the interaction between different conditions is hard to analyze. Code like this typically implies bad design. We used to call such design procedural and lacking Object Orientation. I believe it is this type of code that gave if/switch statements bad name. But are conditional statements evil PERIOD, or is it how we use them that makes them the bad guys?

Many conditional statements are declarative in nature. The concept of pattern matching is fundamental to functional programming. Similarly to for loops they are sometimes simply unavoidable. A good example is specifying boundary conditions: think of differential equations in math, or a typical recursive function definition. Here is an example, Fibonacci sequence is de-facto the hello-world of recursive functions:

fib(0) = 0
fib(1) = 1
fib(n) = fib(n-1) + fib(n-2) for n=2,3,…

If you wanted to define this sequence in Java you would write something like this (Java):
1: int fib(int n){
2:      switch(n) {
3:          case 0: 
4:          case 1: return n;
5:          default: return fib(n-1) + fib(n-2);
6:      }
7: }     
Who cares about Fibonacci numbers? Will I use it on my next programming project? I doubt that I will. The point is, however, that problems in real life have boundary conditions, is handling nulls a boundary condition? Fibonacci numbers are just a good example of boundary condition and recursion all in one.

Instead of looking at memorization or numeric optimization aspects, I will approach the fib sequence from the point of view of curlies: I will try to remove them.

So how can this be done? Is there a way to define fibs without the boundary conditions? One way is to try to replace the recursive definition with a formula (a closed form expression) (if there is one). In the case of fibs there is the Binet formula and more. So if you go this route you will rub your elbows with things like golden ratio end up studying Da Vinci’s work, ancient philosophy and look at flowers. All very cool stuff! Replacing declarative definition of fib with Binet formula (maybe not super computer friendly, but still) gets rid of the curlies and introduces some cool albeit old science!

I came across another way of approaching fib sequence (or similar recursive) problems. Imagine the impossible: computer program understanding the concept of an infinite sequence. Let’s make it even better, in this utopia world you can define infinite sequences recursively! You may want to define fib sequence by saying that it is a sequence which starts with 0 and it is followed by 1 and it is followed by sequence composed by adding corresponding elements of shifted fib sequence (recursive itself) to the left by 2 positions to shifted fib sequence (recursive itself) to the left by one position. So before the impossible dream evaporates in a puff of smoke let me write the SCALA code (example found in Wampler/Payne Programming Scala book):
1: val fib: Stream[Int] = 
2:     Stream.cons(0, 
3:        Stream.cons(1, 
4:           fib.zip(fib.tail).map(pair => pair._1 + pair._2)))
5:
6: fib.take(7).print  //prints: 0, 1, 1, 2, 3, 5, 8, empty
Notice that the conditional statements and curlies have disappeared and we have introduced some deep programming concepts at the same time. It seems like cool stuff happens when curlies disappear!

Short explanation of what this code does: lazy evaluated Streams are infinite sequences in SCALA. The infinity does not come into play because the sequence is not evaluated until needed (take() call). Sequences are like linked lists so they have head and tail. Stream.cons(a, b) returns a stream starting at a and b being a tail. Zip is a functional concept of combining 2 sequences into one sequence of pairs (2 first elements, 2 second elements, etc). SCALA can be even less verbose (from SCALADoc):
1: val fib: Stream[Int] = 0 #:: 1 #:: 
2:        fib.zip(fib.tail).map(pair => pair._1 + pair._2)
There is also a quick ‘Am I an imperative programmer’ self-test here: 
We have defined fib(n: Int) without using ‘n’. Test question: Do you have a problem with this?

The point of all of this nonsense if that conditional statements are part of life and have to be a part of my programs. This does not mean I will not try to avoid them. Only, there is no universal solution, no magic pill. Binet’s formula works for Fibonacci numbers but will not help me in writing null pointer exception free code without if statements. For this I may end up using other tricks like SCALAs Options concept, JavaScript && or Groovy ? or maybe even decide to learn what a monad is. Going extra mile in removing the curlies will force a deeper understating of how to write beautiful programs into me.