
Tuesday, September 17, 2013

Haskell: From unicode friendly to unicode embracing

Doesn't λ x ⦁ x : α → α look better, and communicate more clearly, than \a -> a :: a -> a?

What are the problems with the second (current Haskell) form?
  1. The a in the value world is the same as the a in the type world. This is a minor nuisance and avoidable, since one can use different names.
  2. \ is merely an ASCII imitation of λ.
  3. The purely syntactic -> that separates a lambda-variable from its body is the same token that denotes a deep semantic concept: the function space constructor.
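Some of this is already possible today. As a minimal sketch, assuming GHC with the UnicodeSyntax extension (which accepts symbols such as ∷ and → in place of :: and ->), the identity function can be written both ways. The names idAscii and idUnicode are illustrative. Note that the lambda must still be written \, since GHC offers no λ syntax, and the → in the lambda is the same token as the → in the type, so problem 3 remains.

```haskell
{-# LANGUAGE UnicodeSyntax #-}
-- Sketch: the same identity function in ASCII and in GHC's UnicodeSyntax.
module Main (main) where

-- Current ASCII form: the letter 'a' names both the value and the type.
idAscii :: a -> a
idAscii = \a -> a

-- UnicodeSyntax form: ∷ and → replace :: and ->; α is an ordinary
-- lowercase identifier, so it can serve as a type variable.
idUnicode ∷ α → α
idUnicode = \x → x   -- still '\', not 'λ'

main ∷ IO ()
main = do
  print (idAscii (42 :: Int))          -- prints 42
  putStrLn (idUnicode "unicode-friendly")
```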
APL was one of the oldest programming languages and is still one of the most visually striking. It did not succeed for various reasons, the most notable being that its heyday came long before Unicode.

While APL was the first to use mathematical notation in programming, Squiggol, Bananas and Agda are more recent precedents in this direction.

In short, it's time for programming languages to move from Unicode-friendly to Unicode-embracing.

Some stray thoughts on incorporating these ideas into Haskell.
The lambda should look like λ rather than \
Likewise → looks better than ->
Lexical cleanups
In 1990, when the first Haskell report was published, the designers adopted a quick and cute solution for making certain essential distinctions. The Haskell 98 report explains that there are six kinds of names in Haskell: variables, constructors, type variables, type constructors, type classes and module names. However, there are only two lexical classes: identifiers starting with an uppercase letter and identifiers starting with a lowercase letter.

Today, over 20 years on:
  • In a more i18n-ed world, we know that the uppercase/lowercase distinction is hardly universal: many writing systems have no letter case at all.
  • Experience of teaching Haskell suggests that the visual and habitual difference between foo and Foo is far too small for the weight that Haskell's semantics puts on it. Too often, beginners are tripped up by this: misspelling frobnicate as frobnicte gives a more helpful error message than misspelling foo as Foo.
    This is the reverse of what it should be.
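To make this concrete, here is a small sketch (all names are illustrative) that touches each of the six kinds of names, all split only by initial letter case, followed by the kind of slip described above. In a pattern, an uppercase name must be a constructor, while a lowercase name is a fresh variable that matches anything. So lowercasing a constructor by mistake still compiles (GHC may warn about the unreachable equation) but silently changes the meaning.

```haskell
-- Sketch: Haskell's six kinds of names, distinguished only by initial case.
module Main (main) where          -- module name: uppercase

-- Type class (uppercase) whose method is a variable (lowercase).
class Describe t where
  describe :: t -> String

-- Type constructor Colour and data constructors Red, Green: all uppercase.
data Colour = Red | Green
  deriving (Eq, Show)

-- Type variable a (lowercase); Box names both a type constructor and a
-- data constructor, which live in separate namespaces.
newtype Box a = Box a

instance Describe Colour where
  describe = show

-- Intended: match only the constructor Red.
isRed :: Colour -> Bool
isRed Red = True
isRed _   = False

-- Slip: 'red' (lowercase) is a fresh variable, so the first equation
-- matches every Colour and the second is never reached.
isRed' :: Colour -> Bool
isRed' red = True
isRed' _   = False

unbox :: Box a -> a
unbox (Box x) = x

main :: IO ()
main = do
  print (isRed Green)                    -- prints False, as intended
  print (isRed' Green)                   -- prints True: the typo changed the meaning
  putStrLn (describe (unbox (Box Red)))  -- prints Red
```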
Given that Unicode is orders of magnitude richer than ASCII, I suggest making a visually less ambiguous choice to lexically distinguish fundamentally different classes of identifiers.

For example…
Type variables
As with classic Hindley-Milner, use α, β for type variables.
[Anyone who's taught FP will know how hard it is for beginning students to not mix up the value-world and the type-world. A little lexical support would go a long way towards avoiding confusion]
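GHC already accepts Greek letters in identifiers, so the α, β convention can be adopted today. The catch, which is exactly why lexical support matters, is that α is just another lowercase identifier: nothing stops it from also naming a value. A minimal sketch, with illustrative names:

```haskell
-- Sketch: Greek type variables work in GHC, but only by convention.
module Main (main) where

-- α and β are ordinary lowercase identifiers, usable as type variables.
swapPair :: (α, β) -> (β, α)
swapPair (x, y) = (y, x)

-- Nothing enforces the convention: α can just as well name a value.
α :: Int
α = 3

main :: IO ()
main = print (swapPair (α, "three"))   -- prints ("three",3)
```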
Pun cleanups
  • The -> in the type a -> a and in the lambda-expression \a -> a is an unnecessary and unfortunate pun.
  • The completely useless and confusing pun between the singleton list [1] and the list type [Int] causes a lot of student (and teacher!) grief. Likewise for tuples. Tuples are admittedly messier, because (α × β) × γ, α × (β × γ) and α × β × γ are different types. For lists, however, following set theory, which writes ℘ ℤ for the power set of ℤ (the set of all sets of integers), I would prefer L Int, or better still ℒ Int, for lists of Int.
  • Many other such lexeme clean-ups are possible once Unicode-embracing is accepted in principle. Isn't it about time C's legacy && and || were replaced by ∧ and ∨?
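Some of these clean-ups can be approximated in current Haskell, though only as library-level workarounds rather than the lexical changes proposed here. The sketch assumes only GHC's Unicode support: ℒ (U+2112) is an uppercase letter, so it can name a type synonym for lists, and ∧ and ∨ are Unicode math symbols, so they can be defined as operators with the same fixities as && and ||. The names ℒ, (∧), (∨) and the sample bindings are hypothetical and not part of the Prelude. The tuple bindings show why tuples are messier: the three groupings are three distinct types.

```haskell
-- Sketch: approximating some pun and lexeme clean-ups in current GHC.
module Main (main) where

-- Hypothetical list-type synonym: ℒ Int reads as "lists of Int",
-- leaving [..] to denote only list values such as [1].
type ℒ = []

singleton :: ℒ Int
singleton = [1]

-- Tuples: three different types, not one.
nestedLeft :: ((Int, Int), Int)
nestedLeft = ((1, 2), 3)

nestedRight :: (Int, (Int, Int))
nestedRight = (1, (2, 3))

flat :: (Int, Int, Int)
flat = (1, 2, 3)

-- Hypothetical logical operators with the fixities of && and ||.
infixr 3 ∧
infixr 2 ∨

(∧) :: Bool -> Bool -> Bool
(∧) = (&&)

(∨) :: Bool -> Bool -> Bool
(∨) = (||)

main :: IO ()
main = do
  print singleton
  print (nestedLeft, nestedRight, flat)
  print (True ∧ False ∨ True)   -- parses as (True ∧ False) ∨ True
```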
Use more of the newly available bracket characters for collection types.
Disambiguate the module-member-referencing dot from the function-composition dot.
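The dot clash is easy to reproduce. In the sketch below (function names are illustrative), Data.Char.toUpper is a qualified name, while the spaced . between two functions is composition. The comment notes the classic trap where a missing space turns a composition into a qualified name.

```haskell
-- Sketch: one character, '.', doing two unrelated jobs.
module Main (main) where

import qualified Data.Char

-- 'Data.Char.isAlpha' uses '.' for module member access;
-- the spaced '.' between the two stages is function composition.
lettersUpper :: String -> String
lettersUpper = map Data.Char.toUpper . filter Data.Char.isAlpha

-- The trap: 'Just . show' composes two functions, but 'Just.show'
-- (no spaces) lexes as 'show' qualified by a module named Just,
-- which is a scope error rather than a composition.
wrapShown :: Int -> Maybe String
wrapShown = Just . show

main :: IO ()
main = do
  putStrLn (lettersUpper "a-b c")   -- prints ABC
  print (wrapShown 7)               -- prints Just "7"
```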

And (yes, this is controversial), following Dijkstra, use a dot different from the standard ASCII '.' for function application (see apply in SICP).
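Haskell writes application as juxtaposition, with $ as an explicit low-precedence alternative. To get a feel for a Dijkstra-style explicit application operator, one can define a hypothetical ∙ (U+2219 BULLET OPERATOR, chosen here only because it is a Unicode math symbol distinct from ASCII '.'). It is made left-associative so that f ∙ x ∙ y means (f x) y, matching curried application.

```haskell
-- Sketch: a hypothetical explicit application operator in the spirit of
-- Dijkstra's f.x, using ∙ (U+2219) so it cannot be confused with '.'.
module Main (main) where

infixl 9 ∙

-- Apply a function to an argument; left-associative like juxtaposition.
(∙) :: (a -> b) -> a -> b
f ∙ x = f x

main :: IO ()
main = do
  print (negate ∙ (4 :: Int))    -- prints -4
  print (max ∙ 3 ∙ (5 :: Int))   -- (max ∙ 3) ∙ 5, prints 5
```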
