## Tuesday, August 27, 2013

### Apply-ing SI on SICP

Abelson and Sussman wrote a legendary book: SICP. The book has a famous wizard cover. Unfortunately the cover misses some key content of the book.  What is it?

If we remove the other wizardly stuff, three main artifacts stand out on that cover:  eval and apply on the crystal ball and a magical λ.  Lets put these into a table

 apply eval lambda

The fourth empty square seems to stand out, doesn't it?  Lets dig a little into this square.

What is the apply-eval axis? Its what defines the interpreter. SICP outlines it thus:
1. To evaluate a combination (a compound expression other than a special form), evaluate the subexpressions and then apply the value of the operator subexpression to the values of the operand subexpressions.
2. To apply a compound procedure to a set of arguments, evaluate the body of the procedure in a new environment. To construct this environment, extend the environment part of the procedure object by a frame in which the formal parameters of the procedure are bound to the arguments to which the procedure is applied.
And expressed in the mystical diagram that expands on the cover.

So much for the Eval-Apply duality.

We now turn our attention to

### The Apply-Lambda axis

The relation in scheme is very simple and pretty.  For any suitable form f :
(apply (lambda () f )'())   ≡  f
Likewise
(lambda () (apply f '()))  ≡ f

In time-terms, λ delays evaluation of f, which apply forces. Thus:

### apply and lambda are inverse operations

So now we understand a little better what we need to find to fill the 4th sub-square of our square – it needs to be the inverse of eval.

 apply eval lambda quote
And this as every lisper (including of course Abelson and Sussman!) understand is so important that it is short-formed to a single character – the single-quote mark. Which gives us the pattern.

And once we have this table we can ask meanings on the patterns it suggests.  We've seen that apply-eval is a dual pair as are apply-lambda and eval-quote.

The operational view is quite useful in this case, so we start with that.
If we look at a typical implementation of λ-calculus like the SECD-machine, we would find that the behavior (semantics) of quote and lambda are quite similar:
• quote just returns its argument, ie it prevents eval from proceeding further
• lambda does almost the same except that it first captures the environment
So the default behavior of eval – recursively entering the expression has two exceptions – lambda and quote. IOW lambda and quote are kinds of delays.[1]

At the paradigm level, there are two modes of higher-order-ness for programmers – the FP mode using β-reduction in λ-calculus. And the 'meta-mode' used by most other non higher-order languages:

### Iterate the use-mention distinction on a universal data-type

Some examples
1. In a vanilla language (like C) the 'universal data type' is string.  It is universal in that any program is written as a text file which is nothing but a persistent form of string.  Now a higher-order intention – parsing – is realized with a tool like yacc program-generating over that universal data-type.
By contrast a higher order language like haskell can do it much more elegantly using the λ-calculus machinery of combinators eg parsec.
2. Or consider the classic von Neumann machine.  The memory – array of bytes – is the universal data-structure that holds the data and the code.  This is the trick that makes computers tick!  An idea that von Neumann probably picked up from…
3. Gödel
• The original proof of Gödel's theorem used integers as the universal data-type
• Subsequently (after computers became normal) it became clear that string is more amenable for this purpose[2].
• And lisp s-expressions more amenable still
lambda
Every language, starting with Fortran and C have functions. Functional languages are the only ones that provide a specific designated syntax for functions – λ.  Once that's done, the function apply, spelling out the operational semantics of the λ, just follows naturally.
quote
Likewise every Turing-complete language by definition allows for the writing of an evaluator.  What makes Lisp special is not eval but its homoiconicity viz. that the primitive data-structure of the language – S-exp – subsumes the language's syntax[3] — a key aspect of which is quote.
And thence we see why Scheme is such a stupendous language. It has the full higher-order capability of functional programming – lambda-apply.  At the same time it also has an elegant use-mention higher-order-ness – eval-quote – a double-barreled power which seems to have been missed even by its creators[4].

In summary
 Computation (Time Domain) Value Domain Semanticize⇑ Syntaxify   ⇓ forcer Apply Eval Structure Domain delay Lambda Quote

Now with Scheme, its only the art-work of the cover that misses on conveying its power.  With other…

### Modern functional languages

like Haskell, ML, F# etc, the lacuna is more deeply embedded in the (lack of) the languages' semantics and syntax.

The semantic lack is clear – no quote-eval.  This is of course a deliberate choice because eval can cause more problems than it solves. With this choice I have no quarrel.

But there is also a syntactic lack that is logically unsound.

In Church's λ-calculus, the whole world is constructed out of two operations – application and abstraction.  Of these two, abstraction was explicitly demarcated with a λ but function application had no indicator – it was just implied by juxtaposition.

And thereafter every modern FPL has imitated Church including this improper omission. IOW every modern FPL (and increasingly even non FPLs[5]) sport a lambda syntax. However none of them has an apply syntax — a pedagogically unfortunate choice since a factor contributing to the proverbial hardness of functional programming is that the key semantic concept is syntax-absent.[6]

The exceptions are Scheme, Dijkstra and Pugofer.  Apply's central place in  Scheme I've already alluded to above.

### Dijkstra

came to this from a math/theory angle and started the practice making the apply explicit with a '.' in his math. ie where a normal mathematician would write f(x), Dijkstra[7] started writing f.x . His insight lay in juxtaposing two principles – one philosophical, one mathematical – that are rarely seen together.
Philosophical principle: Leibniz Identity of Indiscernibles
Entities x and y are identical if every predicate possessed by x is also possessed by y and vice versa
Mathematical definition: Extensional equality of functions
Two functions are considered to be extensionally equal if they both yield the same results on all inputs.
Dijkstra's observations (given in his characteristic laconic style):
1. The two seemingly disparate principles can be visibly seen to be dual when put into symbolic form:
Leibniz
x = y  ⇒  (f  |  f.x = f.y)
Extensionality
f = g    (x  |  f.x = g.x)

The above exhibits a beautiful duality of equality and application but needs a symbol for both to become apparent
2. In mathematics, it was Roberte Recorde who introduced the '=' symbol for equality and thereby advanced the medieval practice of mathematics to the modern calculational form.
3. In philosophy it was Leibniz who first promulgated the ideology of doing philosophy calculationally, an attitude furthered by Boole who lifted the calculational style of math into logic.
4. The dot is needed to complete the circle of integrating philosophy, logic mathematics and computer science.
In short, Dijkstra was a true blue-blooded functional programmer which inspired me to make…

### Pugofer

which was built on the gofer foundation created by Mark Jones and made Dijkstra's apply-dot (along with some other syntax changes[8]) accessible to a generation of programmers. Is this is your cup of tea???  Check out the gofer prelude with application made explicit.

### Acknowledgements

• Aditya Godbole, CTO Vertis, for asking me to teach gofer to his office folks and thus reminding me of my 20-year old cogitations
• Jayant Kirtane for consistently and insistently claiming that the dot is a wonder
• hvs for teaching me lisp (and too much else to list)

[1] In a D-flip-flop the D can stand for 'data' or 'delay'
[2] The proverbial difficulty of Gödel's theorem is not so much in the idea of the theorem as the technology.  Think of a complex data structure in a computer's memory covering thousands of bytes which we grok in terms of structs/arrays/pointers etc. Now instead of these how would it be to understand it as one humonguous number spanning 1000s of bytes?? Encoding theorems as integers using the Chinese remainder theorem is a software engineering nightmare.
[3] Turing-complete macros make for an interesting question: are they in the first or the second category?  The mechanism is the second, the usage patterns are akin to the first
[4] At any rate the artist who drew SICP's cover!
[5] Javascript... C++ is the latest
[6] Ive taken the liberty of explicating Dijkstra's motivations for the apply-dot
Thought Dialogue with EWD and more incompletely Doting on the dot-1 and Doting on the dot-2
[7] See Notational Conventions and Predicate Calculus and Gries&Schneider
[8] Notes on Notation