Search This Blog

Monday, October 8, 2012

Functional Programming – the lost booty

Lisp was conceived in 1958 and already implemented by the early 60s.  One of its strange features was something called 'garbage-collection' … which took 35 years to enter the mainstream in Java.

Which is to say that for 35 years:
  • CS researchers did whatever they were doing for their tenure, (sorry) publications
  • Programming teachers righteously beat their students on their knuckles for getting pointer-errors/core-dumps/segfaults etc…  all the bad stuff, and inculcated malloc-morality and pointer-honour: all the good stuff that gives one right-of-passage to being a real he-man1 programmer
  • IT industry wasted billions of dollars on 'stoopid programmers' who had presumably not been beaten enough by their teachers to become honorable.
And then Java comes along and shows that garbage-collection is a good idea, Microsoft renames the unclean 'garbage' to a nice new adjective – managed – and everyone is happy.

How many features like garbage collection have been lurking about in 'advanced' functional languages for decades and are waiting to be (re)discovered by the mainstream?

In searching for this lost booty of useful features, its important to make a distinction – the hype axis and the useful axis. Now hype is good; no business could happen without advertising which is after all only organized hype.
So referential transparency, higher-order functions, lambda calculus, rank-2 types, monads etc etc is all good stuff but one has to convert to a certain religion to attain the promised salvation.

Are there things available from functional programming for the unconverted?

Here is a list that programmers across the board (not just FPers) could benefit from2
Data Structures vs Storage Structures
In a non-gc language programmers think of storage structures. In a gc language they think of data structures. The first is an engineering outlook, the second a math one – a fundamental difference.
Value vs Reference Types
Among mainstream languages C# is foremost in providing a fundamental language-level choice between value type and reference type
The reference type is traditional OO, the value-type is inspired by the FP desideratum of immutability
Concrete syntax for lists/tuples
In a gc language without a syntax for lists (eg Java), the thinking is definitely freer than with say C but more encumbered than with languages with lists (and which is not just classical FPLs but almost all modern 'scripting' languages -- python, ruby...)
More generally concrete types
One step ahead of the last: but an enormous leap for conceptualization -- everything is a value…
Rich world of values
… Even code!  Functions and data sit at the same table.
Pattern matching
Patterns tell stories transparently where raw code does it opaquely – 'x:xs' is the non-empty list whose head is (called) x and tail, xs. Dates from as far back as 1960s (SNOBOL and COMIT)*
See Crossroad
Currying
Syntactically currying is just a neat trick – instead of writing f(x,y) we write (f x) y which can also be shortened to f x y. More importantly, the y may be dropped.
Semantically it is a deep concept and refers to the fact that the 2 binding-times of classic programming:
  • function (program) f composed at 'compile-time' ie programming-time
  • Called/executed as f(inputs) at run-time
can in fact be refined into infinitely many binding-times, an idea that has profound implications.  Eg an interpreter ℐ for a language L is something that takes a program in L and its input and gives the output:
ℐ : Program × Input → Output
Its curried type is Program → (Input → Output)
which is a compiler. So a compiler is a curried interpreter!
List comprehensions
SQL inside the programming language!
Data orientation
Switch statements?? Nah! We have dictionaries!
In general the idea that patterns of code can go into data-structures with mini-interpreters 
Non Determinacy
Do we prefer
z = 2*x + 3*y
Or
t1 = 2*x
t2 = 3*y
z = t1 + t2
Why is the first better? Not because its shorter but rather because it does not lay down an order – x first? Or y? – that does not need to be spelt out.
In general FP commits to being minimally determined, compared to imperative programming that is usually over-determined.

Whose next logical step is…
Multiple assignment
[Or unpacking assignment in python-speak]
For swapping two variables, do we prefer x,y = y,x or t=x;x=y;y=t
Once again if we think the first better because its shorter or even because it does not introduce a new temporary, that misses the point.  It's better because it leaves unspecified how exactly the swap should occur — ie  a question really of relevation.  In short multiple assignment is an important mid-point between pure FP and low level imperative programming (thanks to Dijkstra).
Expression oriented thinking
Started with APL and Squiggol but can be used anywhere. eg C:

int fact(int n) {return n==0 ? 1 : n*fact(n-1);}

That's functional programming in C – note the use of ?: rather than if-else.  Brings me to the next…
Recursion
Peter Deutsch said: To iterate is human, to recurse divine! Cute but unfortunate.  Here's a spot test. Consider the incantation:
     x = x + 1
If like most mathematicians, you recognize it as a nonsensical statement, you should not find recursion hard.  If however you think its 'perfectly ok' then you surely will!   For a mathematician, even a noob-mathematician, the equations
x= 1
xn+1 = x × xn
do not cause any problems. Why then is recursion so difficult for programmers? Because of imperative programming baggage… Recursion is intrinsically hard only because of the assumption that variables must be mutable and code is glued together with statements.  If variables were 'fix-bound' (ie single-assignment) and code was built up of a pure expression world, recursion would not seem that hard.  IOW the word variable means very different things to a mathematician and to an (imperative) programmer.  Recursion is hard only for those stuck with the second meaning.
For the importance and pervasiveness of recursion in CS see recursion 

That said, it also needs to be said that one of the most pervasive misunderstandings about FP amongst non-FPers is that FP is about using recursion for any and everything — A humorous take on that.  In case you get lost on that the preferred one is the last one by ‘tenured professor’.

So if recursion is central to FP but does not appear much in well-written functional programs, where does it go?
Enter…
Higher Order Programming
Most ¼ hour introductions to FP talk of map and filter as examples.
This is both right and wrong.
It is right because map and filter are good examples of use of FP techniques.
It is wrong because it misses the point, viz that map and filter are examples that when higher-order constructs are available how smooth it is to come up with abstractions of so-called ‘design patterns’
Flat guards instead of nested ifs
Comes from Dijkstra – hardly a functional programmer
Booleans as first-class
Write return p instead of if p return True else return False
Mathematical vs procedural thinking
The above collected into a coherent philosophy
Type inference (vs declaration)
Getting the hang of type inferencing like any other algebraic process, ie generically
Polymorphism
The OO aficionados, by swiping this word, have caused a lot of pickled brains. Fortunately the tide is turning and some of the biggest stalwarts in the OO camp are beginning to see the problem with this.
Idea of a universal data structure
The sexps of 1960s called XML in 20003
REPL: Getting serious about playing-around
Compiled languages tend to evolve philosophies of design, documentation, reviews and all such other serious, adult stuff – ie artificially organized creation.  A language with a read-eval-print-loop (REPL) encourages casualness, doing by playing-around – ie organic growth of s/w.
Introspection
A language with introspection and an REPL allows one to discover the language almost like one gets to know a person by talking and asking him/her questions. Learning by reading manuals etc seems by contrast, indirect and stilted.
Emacs
No I'm not talking of editing wizardry but of comint and derivatives, ie blurring the distinction between 'serious' code development (in files) and 'playful' experimentation (at the interpreter)
Lazy evaluation
Python programmers sometimes fret about what to call their generators. With a Haskell (like) background the problem vanishes – its just a lazy list. And spreadsheets have had lazy evaluation from the beginning*
Overloading (type) Classes
Many people think that Haskell's typeclasses, and the most famous of them all – Monads – are the best thing since sliced bread. While I dont belong to that camp, I still consider that:
  1. In their original avatar as 'overloading classes' they provide for a more structured and disciplined approach to overloading than the mess for which C++ is famous
  2. Likewise noting that the word 'class' coincides with the core credo of OOP, we may also notice that Haskell's typeclasses correspond more to Java's interfaces than classes.  Combine this with the fact that in OOP-land programming-against-an-interface is increasingly being preferred to class inheritance, Haskell's classes may actually be good for OOP pedagogy
And therefore like all the above, learning Haskell typeclasses can make you a better programmer even if you dont program in Haskell
All the above require to be expanded.
Also can the list be grown?

Footnotes



1. If this looks a bit politically incorrect, Ive seen an Introduction to C book start with He men smoke Marlboro cigarettes; he-programmers program in C
2. Twenty years ago garbage-collection would have been in the hype list. Now with Java, C#, Python etc its gone mainstream. So likewise this list may change with time.
3. Wadler has a spicy comment about the relation of XML and Sexp which I wont repeat here <wink>
* Suggested/reminded by Doug McIlroy. THE Doug McIlroy ! Thanks Doug for the kind word!


Updates

The above was written in 2012.  Subsequently…

Marko Rauhamma on the python list pointed out one which I had forgotten:

DSLs
Lisp showed how trivial it is to build – and more important, conceptualize – DSLs once one has a homoiconic base


In 2013 ACM published its new curriculum with FP in the absolute basic CS-core, perhaps as a resultant to the points noted above.

1 comment: