Functional Programming, Continued

Original Functional Programming, Continued Editable
version 1 of 1



A long time ago as a college sophomore, I read Slava Akhmechet's Functional Programming for the Rest of Us. This was probably the single biggest influence in my programming education. It made a massive and immediate difference on my style and productivity. (The second most influential was a piece about Lisp, primarily extolling the REPL. I've long since lost that bookmark.)

Slava gives a good overview of the history and fundamental of FP. To summarize his article, here are the practical points:

  • use pure functions
  • avoid side effects

There is of course a lot more presented in that 8000-odd word essay about why you do this and what you can then achieve. But as a student, those items were what made a tremendous difference in my own OOP and imperative programming. For a brief time, every program felt trivial and easily within reach. Some were. Many of course were not.

He states that many of the traditional design patterns are completely meaningless in the FP domain. While this is the case, FP has its own set of common traps to catch the unwary, and these traps have patterns to work around them. What follows are simple guidelines for writing simpler pure functions.

Keep the function hierarchy flat

After spending an hour in the REPL you've hacked out a complex process. Going over your notes and session history, the hour's work is condensed to a huge hundred line function. You notice a few clean points to break it into several small steps. These steps were previously called in order as a single operation, so you write something like

# sample 1
def find_conf(...):
    ...
    return read_conf(...)

def read_conf(...):
    ...
    return parse_conf(...)

def parse_conf(...):
    ...
    return verify_conf(...)

Superficially this may look fine. Every function is pure and there are no side effect. It even reads quite nicely and you might be slightly tickled by writing code that appears tail-call friendly. But it is impossible to debug. If there is a bug in read_conf(), the error is passed on through parse and verify before you get to see it. Maybe the bug is really in parse or verify. Who can tell? There are no intermediate values to assist.

So avoid deeply nested hierarchies. Let each function stand on its own without calling the next step. Write some convenience wrapper glue so the new flattened hierarchy does not require manually calling each step. But make sure there is no overlap between the functions that do and the functions that glue.

# sample 2
def load_conf_glue():
    return verify_conf(parse_conf(read_conf(find_conf(...))))

Notice that load_conf_glue() does not actually do anything. It just makes sure that the functions doing the heavy lifting are called in the correct order. If the multitude of small functions feels too cluttered, hide the lower levels inside a namespace. Or module or object or class. Whatever your language provides.

Keep argument lists short

Nothing is quite as depressing as writing a pure function that requires five or even ten arguments. A single function call might take up 80 columns on its own. It is hard to test, hard to use and hard to document.

Usually argument lists start growing when you need to add custom flags to functions deep in the hierarchy. Of course we already know that deep hierarchies are bad, but they are somewhat unavoidable as a program grows.

The following examples will use elephant as the troublesome argument. In your programs it will take many forms. Maybe it is the amount of precision used for calculations. Or the character encoding for processing strings. Or the bit depth of internal images. In my experience, the troublesome arguments are meta-flags that configure how a function goes about calculation. The small functions deep in the call tree are the only places the flag might be used, but the flags are user selected and thus come from the highest UI levels. Somehow these flags need to propagate through all the function calls, from where they are created to where they are used.

# sample 3
elephant = ...  # global not in the arg list

def a():
    return b()

def b():
    return c()

def c():
    return d()

def d(): 
    if elephant:  # impure
        return ...
    return ...

Putting elephant into the arg list makes everything a good and pure little function.

# sample 4
def a(elephant):
    return b(elephant)

def b(elephant):
    return c(elephant)

def c(elephant):
    return d(elephant)

def d(elephant):
    if elephant:
        return ...
    return ...

This really bites in complex trees of functions. In a bad case, function a() might have a dozen such flags not directly used but instead passed onward. There are three ways out of this mess. First, you can flatten the hierarchy as previously described.

# sample 5
def abcd(elephant):
    return d(c(b(a())), elephant)

If there are a bunch of these flags, only abcd() remains an ugly mess.

Not everything can be flattened, a work around is needed in those cases. The natural inclination is to make a dictionary or object bucket to hold multiple flags. This makes the argument lists short, but now every function in the chain has to deal with a large configuration object. Verifying this object is correct/complete is annoying, as is having to mock up a bunch of settings just to test one function. The function calls will be shorter, but using buckets is just as unwieldy as the original long argument list. Don't hide arguments in buckets, it is ultimately a wash. The one exception would be GUI toolkits where every call could potentially take dozens of arguments. Internally these put a catchall **kwargs into every function declaration and then pass that bucket of assorted keywords onward to every single function call. These are a pain to debug. If you are lucky, the keyword is unique and greppable.

For completeness, here is what bucket-o-state abuse looks like in Python. Other languages have analogous structures. Try to avoid it.

# sample 6
def a(**kwargs):
    return b(**kwargs)

def b(**kwargs):
    return c(**kwargs)

def c(**kwargs):
    return d(**kwargs)

def d(elephant):
    if elephant:
        return ...
    return ...

a(elephant=True)

 

The third method uses a combination of lazy evaluation and partial application. When I first heard of partial application, I was pretty dismissive of it. Partial eval sounded like the exact same thing as currying. Both were lumped together as cute bits of mathematical handwaving, de-sugaring multiple arguments into something simpler for the compiler to deal with. Not a feature for humans to use.

This is not lazy eval.


 

Turns out the technique is quite useful for cleaning up messy arguments. Deeply nested flags can be cleanly handled with lazy partial evaluation. First, a simple example of how lazy evaluation works for this situation. This expands on Sample 4.

# sample 7
def a(elephant):
    return b()(elephant)

def b():
    return c()

def c():
    return d  # not evaluated yet

def d(elephant):
    if elephant:
        return ...
    return ...

Yes, that actually works. Returning functions up the stack means arguments don't have to be passed down the stack. If three more flags needed to be added to d(), the intermediate b() and c() functions remain unchanged.

Partial evaluation comes into play when there are a mixture of normal arguments and flags.

# sample 8
from functools import partial

def a(elephant):
    return b()(elephant)

def b():
    return c()

def c():
    d1 = ...
    return partial(d, d1=d1)

def d(d1, elephant):
    if elephant:
        return ...
    return ...

You can avoid passing arguments through the tree of functions this way. Partial evaluation lets you bring the function to the data. It is a very powerful technique and useful when you feel painted into a corner.

Returning functions up that many levels smells though. And it gets really complicated when applied to multiple functions in the call. You get messes like a()(arg1)(arg2)(arg3). Or was it a()(arg3)(arg2)(arg1)?

In these cases it is easier to apply the partial evaluation at the high level, setting the configuration values known at the time. Then pass the custom function down to where it will be needed. Once again, expanding on Sample 4.

# sample 9
def c(..., d_fn):
    return d_fn()

def abcd(**kwargs):
    d_custom = partial(d, elephant=kwargs['elephant'])
    return c(b(a()), d_custom)

Slava touches on this method in the Higher Order Functions paragraphs, but does not do them enough credit. Passing functions in is most useful when the d() call is in the middle of c() and not easy to refactor out. Practically speaking, it is almost as bad as a bucket-o-state object. The partial function still needs to be passed around and it is even harder to mock/verify.

Ultimately, the best solution involves blending several of these patterns. Blindly adhering to any one of them will result in much useless pain.