Saturday, April 19, 2014

Unicode in Python

1 Introduction

Python has been making long strides in embracing unicode. With python 3 we are at a stage where python programs can support unicode well; however, python program-source is still almost completely drawn from the ASCII subset of unicode.
Well… Actually with python 3 (not 2) non-ASCII identifiers are already possible:
from math import sqrt

def solvequadratic(a,b,c):
    # Δ is the discriminant; α and β are the two roots
    Δ = b*b - 4*a*c
    α = (-b + sqrt(Δ))/(2*a)
    β = (-b - sqrt(Δ))/(2*a)
    return (α, β)

>>> solvequadratic(1,-5,6)
(3.0, 2.0)
Now to move ahead!

Why do we have to write x!=y then argue about the status of x<>y when we can simply write x≠y?
Or take a random example from the tutor list :
import math
print(math.pi)
print(math.floor( 31.58889 ))
print(math.ceil( 31.58889 ))
as compared to
print(π)
print(⌊31.58889⌋)
print(⌈31.58889⌉)
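A quick way to see how far python 3 actually goes today is to hand small snippets to the built-in compile(). The following is only a sketch; the helper name compiles is made up for illustration.

```python
def compiles(source):
    """Return True if Python accepts `source` as a syntactically valid program."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

# Non-ASCII letters are accepted as identifiers...
print(compiles("Δ = b*b - 4*a*c"))
print(compiles("π = 3.14159"))

# ...but mathematical symbols are not accepted as operators or brackets.
print(compiles("x ≠ y"))
print(compiles("y = ⌊31.58889⌋"))
```

The first two snippets are accepted and the last two are rejected, which is the 'half-way' state described next.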

So we could say python is half-way towards becoming a full unicode language. To move in this direction can mean at least two things:
  1. Make python 'native' to other natural human languages
  2. Embrace the universal (ie mathematical) side of unicode more fully
1 is all about internationalization and localization. The writeup addresses only 2. It is given in the form of tables showing how current Ascii syntax could transform into a Unicode-embracing one.

The ideas came from a number of people on the python list – see references below.

2 Legend

Since most of the following is in the form of tables with current (Ascii) syntax juxtaposed with the more unicode-d one, it turns out that many of the comments on these pairs are similar and repetitious. To keep these tables neat, the repeating comments are spelt out first as under:

2.1 Math Space Advantage – MSA

One of the less noticed benefits of math (like) operators is that a math-op like + in program text is lexically unambiguous, ie '+x' is two tokens + and x and not a single token composed of + and x. This is unlike alphanumerics, where the fact that the following are all lexically different
for x in line:
for x inline:
forx in line:

makes spaces mandatory.
We will see that moving to a more pervasively unicode form makes many spaces that are currently inevitable unnecessary.
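The lexical point can be checked with the standard tokenize module. This is a small sketch; the helper lex is a name invented here, and it simply drops the bookkeeping tokens (newlines, end marker).

```python
import io
import tokenize

def lex(source):
    """Return the token strings of a one-line snippet, ignoring layout tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}
    readline = io.StringIO(source + "\n").readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]

print(lex("+x"))               # operator and name split without any space
print(lex("for x in line:"))   # five tokens
print(lex("for x inline:"))    # 'inline' is now a single name
print(lex("forx in line:"))    # 'forx' is now a single name
```

Only the spaces keep for, x, in and line apart; the + needs no such help.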
Below I will point such cases out with an 'MSA'. In some cases it's technically required to have spaces, in others it's just more aesthetic to have them, eg.
x in lst
cannot be written as
xinlst
whereas
1 in [1,2,3]
can be written
1in[1,2,3]
However that's completely unreadable.
There's no such problem with
1∈[1,2,3]
So replacing in by ∈ has a math-space-advantage (MSA). It also has the advantage of

2.2 Disambiguation – Dis

The in in for loops and in predicates have very different semantics conceptually; the latter is purely declarative, the former creates a binding. So having two unmixupable in⁠s is good for reducing confusions eg. for x ⬅ [1,2,3]:
and if x ∈ [1,2,3]:
IOW due to extreme scarcity of characters in Ascii, many characters have for generations been overloaded willy-nilly. As that scarcity becomes a thing of the past maybe we should avoid useless overloading? These cases are marked by Dis.

2.3 Name Space burden reduction – NS

A (perfectly normal English) word like floor or ceiling cannot be put into the global (builtin) namespace because a programmer may want to use that name for the usual or a related connotation of floor/ceiling. For symbols like ⌊,⌈ no such issue arises. Such cases are marked NS

2.4 Unicode Choice – UC

In many (all?) cases unicode offers so much new variety that it's not clear which choice to make. Such choices are indicated with UC

2.5 Font Issue – FI

When things are not looking exactly proper/pretty on my end and it seems to be a font issue, I'll mark a FI

3 Basic math

Ascii              Unicode    Notes
2*pi*r             2×π×r      FI
x!=y               x≠y
x<=y               x≤y
x>=y               x≥y
q,r=divmod(a,b)    q,r=a÷b    1
float('inf')       ∞          NS
pow(2,4)           2⇑4        2
2**4               2⇑4        2
math.floor(3.5)    ⌊3.5       NS
math.ceil(3.5)     ⌈3.5       NS
1. Python already has a large bunch of division related operators and functions: /, //, %, divmod. Given that quotient-together-with-remainder is a common integer arithmetic pattern, my preference is for ÷ to stand in for divmod. Other choices with their justifications are of course possible.
2. Are pow and ** the same?
FI: Do x and × look the same? If yes, this is a problem and maybe * is just preferable?
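For reference, here is how the Ascii column behaves in current python 3. It is a small sketch with arbitrary values. On the pow question: with two arguments pow and ** agree, and pow additionally has a three-argument modular form that ** lacks.

```python
import math

a, b = 17, 5

# divmod bundles the quotient (//) and the remainder (%) in one call
q, r = divmod(a, b)
same_as_operators = (q, r) == (a // b, a % b)

# two-argument pow agrees with **; the third argument is a modulus
power_builtin = pow(2, 4)
power_operator = 2 ** 4
power_modular = pow(2, 4, 5)      # (2 ** 4) % 5

# floor and ceil live in math, not in the builtin namespace
low, high = math.floor(3.5), math.ceil(3.5)

# infinity needs a string argument (or math.inf)
infinity = float("inf")

print(q, r, same_as_operators)
print(power_builtin, power_operator, power_modular)
print(low, high, infinity == math.inf)
```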

4 Other basic Syntax

4.1 Assignment ←

Ascii Unicode
x = 1 x ← 1
x,y = y,x x,y ← y,x
x += y x +← y
If only one could count the grief caused by thinking that = is math-equality – not just among noobs but among experienced C programmers who mistakenly put a = when they meant ==
The ← is not looking very nice out here (in different fonts): either too scrawny or too stubby. So...
While in an earlier version of this post I had used it for examples, I am (for now) reverting to good ol' =

4.2 Attribute access →

Ascii Unicode
sys.argv[1] sys→argv[1]
(5).to_bytes(4,"little") 5→to_bytes(4,"little") Dis, MSA
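The parentheses in the Ascii form are there because the dot is overloaded: right after a digit it is read as a decimal point. A small sketch of the current behaviour follows; the helper name parses is invented for illustration.

```python
def parses(expression):
    """Return True if Python accepts `expression` as a valid expression."""
    try:
        compile(expression, "<expr>", "eval")
        return True
    except SyntaxError:
        return False

# "5." is lexed as a float, so the attribute access never gets started
print(parses('5.to_bytes(4, "little")'))

# parentheses (or a space) separate the number from the dot
print(parses('(5).to_bytes(4, "little")'))
print(parses('5 .to_bytes(4, "little")'))

packed = (5).to_bytes(4, "little")
print(packed)
```

A separate attribute-access character would not collide with the decimal point, which is the Dis in the table row.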

4.3 in (predicate)

Ascii Unicode
1 in [1,2,3] 1 ∈ [1,2,3] MSA,FI
Most of the fonts I've checked make the ∈ a little too large
I guess this should be treated as a transient problem – a fixable bug

4.4 in (for)

Ascii Unicode
for x in [1,2,3,4]: for x⬅[1,2,3,4]: MSA,UC
The sign could be any one of ⬅ ⇐ ⇦ ?
The two ins now disambiguated to ⬅ and ∈ should be a help to noobs

4.5 lambda λ

Ascii Unicode
lambda x: x+3 λx: x+3 MSA

5 Logic

Ascii Unicode
not x ¬x MSA
x and y x∧y MSA
x or y x∨y MSA

6 Collections

Sets, Bags and Lists (numpy arrays??) form a series. Having literals for all makes some succinct expressions possible

6.1 Lists

Ascii Unicode
[1,2]+[3,4] [1,2]⤚[3,4] Dis
List append is not symmetric (commutative).
The operator should reflect that fact.

6.2 Set theory

The most natural character for set literals is '{}'. However given that
  • that is already taken by dicts
  • and dicts are more fundamental to programming than sets
⦃ ⦄ should be a good enough approx to conventional usage
Common set theory operators that mathematicians use: ∈ ∉ ⊂ ⊃ ⊆ ⊇ ⊈ ⊉ ∪ ∩ ∅
Now unicode makes these available without any markup
Ascii – OO forms      Ascii – functional forms   Unicode        Notes
set([])               set([])                    ∅
s = set([1,2,3])                                 s=⦃1,2,3⦄      MSA
t = set([2,3,4,5])                               t=⦃2,3,4,5⦄
x in s                                           x∈s            MSA
x not in s                                       x∉s            MSA
s.issubset(t)         s<=t                       s⊆t
???                   s<t                        s⊂t
not s.issubset(t)     not (s <= t)               s⊈t            1,2,3
                      set([1]) <= set([2,1])     ⦃1⦄ ⊆ ⦃2,1⦄    3
s.issuperset(t)       s>=t                       s⊇t
s.union(t)            s|t                        s∪t
s.intersection(t)     s&t                        s∩t
s.difference(t)       s-t                        s∖t            FI,UC
                      s^t                        s∆t
s.update(t)           s|=t                       s∪=t
                      s&=t                       s∩=t
1. For numbers, not (x <= y) ⇔ x > y. This is not the case for sets. In (somewhat incorrect!) math jargon, <= is a total order whereas ⊆ is a partial order. Therefore ⊈ is more needed than a negated <=
2. The low precedence of not makes parentheses unnecessary but I find it confusing
3. While in general the OO form (column 1) is the most verbose, in these cases it is more readable than column 2
FI: Are s\t and s∖t distinguishable? They don't look so to me…
Unicode gives one of the names of ∆ as "symmetric difference".
UC: Don't know of any natural/standard sign for difference (other than '-' '\' '/'). There are zillions of other symbols of course
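The Ascii operator forms can be tried as they stand in python 3. The sketch below uses the same s and t as the table, written with today's {} set literals, and also shows why 'not a subset' cannot be replaced by 'strict superset': the two sets are simply incomparable.

```python
s = {1, 2, 3}
t = {2, 3, 4, 5}

union = s | t                  # s∪t
intersection = s & t           # s∩t
difference = s - t             # s∖t
symmetric = s ^ t              # s∆t

# Partial order: s is not a subset of t, yet it is not a superset either
not_subset = not (s <= t)
strict_superset = s > t

print(union, intersection, difference, symmetric)
print(not_subset, strict_superset)
```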

6.3 Counter (bag/multiset)

Ascii                        Unicode
c = Counter(a=3, b=4)        c = ⦉'a':3, 'b':4⦊      NS
d = Counter(a=1, b=2)        d = ⦉'a':1, 'b':2⦊
c + d                        c ⊕ d
Counter({'b': 6, 'a': 4})    ⦉'a':4, 'b':6⦊
c & d                        c ∩ d
Counter({'a': 1, 'b': 2})    ⦉'a':1, 'b':2⦊
c | d                        c ∪ d
Counter({'a': 3, 'b': 4})    ⦉'a':3, 'b':4⦊
NS: Counter can only be used after from collections import Counter
Having to do this is an avoidable headache.
Not having to do this (in the current dispensation) would entail polluting the global namespace.
It's another matter that Counter is an unfortunate name choice, given that
  • Bag/Multiset already exist and are well known
  • Counter already has many other established meanings in CS
Counter literals could be any out of ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
Note that list 'addition' (append) is not symmetric, hence the asymmetric ⤚
Bag 'addition' is symmetric. The operator should reflect that.
UC: Which symbol to use? ⊕ or ⋄ ?
FI: The ⊕ (in code) looks worse than the plain ⋄ out here
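The three Counter operations, and the symmetry remark, can be checked directly. This sketch uses the same c and d as the table: + adds counts, & takes the minimum of each count and | the maximum.

```python
from collections import Counter

c = Counter(a=3, b=4)
d = Counter(a=1, b=2)

bag_sum = c + d      # counts are added
bag_min = c & d      # per-key minimum
bag_max = c | d      # per-key maximum

# Bag addition is symmetric; list concatenation is not
bags_commute = (c + d) == (d + c)
lists_commute = ([1, 2] + [3, 4]) == ([3, 4] + [1, 2])

print(bag_sum, bag_min, bag_max)
print(bags_commute, lists_commute)
```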

6.4 Casting

Python already has 'natural' casting (at the type level). Given
l = [1,2,3]
we can do
s = set(l)
c = Counter(l)
Literals even allow for use of the most 'natural' operators
Type Operation
Set ∪, ∩
with the general rule that the upper-row operators pull lower data upwards, eg
[2,1,2]∪[2,3,4,5] → ⦃1,2,3,4,5⦄
ie order and repetition vanish
[2,1,2]⊕[2,3,4,5] → ⦉1:1, 2:3, 3:1, 4:1, 5:1⦊
ie order vanishes, repetition is maintained
Disambiguated literals make natural casting possible:
x∪y expects x, y to be sets. What if they are not?? Simple – they are cast to sets
Likewise x⊕y expects x, y to be Counters. Else they are cast to counters
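In today's python this casting rule would have to be spelt out by hand. The following is only a sketch of the idea: union and bag_sum are hypothetical helper names standing in for ∪ and ⊕, not anything python provides.

```python
from collections import Counter

def union(x, y):
    """Hypothetical ∪: cast both operands to sets, then unite them."""
    return set(x) | set(y)

def bag_sum(x, y):
    """Hypothetical ⊕: cast both operands to Counters, then add the counts."""
    return Counter(x) + Counter(y)

as_set = union([2, 1, 2], [2, 3, 4, 5])      # order and repetition vanish
as_bag = bag_sum([2, 1, 2], [2, 3, 4, 5])    # order vanishes, repetition stays

print(as_set)
print(as_bag)
```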
Presence of literals makes other things possible and natural, eg…

6.5 Comprehensions

Once we have literals for sets and bags we can have comprehensions for them:
Natural Comprehensions
Natural because the input and output collections are of the same kind
We can also have
Casting Comprehensions
ie the intention of the list-to-set cast is that order and repetition are discarded
Note: Many noob misunderstandings re comprehensions come from the clever pun – for in loops and in comprehensions. This removes that problem
UC: The │ (∣) is not the usual | (codepoint 9474 vs 124). It could be some other character – in addition to the ascii | there are │∣┃ ¦ │ (and probably more!!)
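For comparison, this is roughly what current python offers. Sets have their own comprehension syntax; Counter has none, so the nearest equivalent (an assumption about what a bag comprehension would mean) is to feed a generator expression to Counter. The sample data is arbitrary.

```python
from collections import Counter

words = ["b", "a", "b", "c", "a", "b"]

# 'Natural' comprehension: set in, set out
squares = {x * x for x in {1, 2, 3}}

# 'Casting' comprehensions: list in, set or bag out
as_set = {w.upper() for w in words}           # order and repetition discarded
as_bag = Counter(w.upper() for w in words)    # order discarded, repetition kept

print(squares)
print(as_set)
print(as_bag)
```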

6.6 N-ary Operators

6.6.1 Examples

In mathematics there are a number of constructs like ∑, ∀ etc. They can be subsumed under the general concept of n-ary operators – aka generalized products.

6.6.2 Types

N-ary operators are complementary to comprehensions. If t is some type and C is one of set, Counter or list
Comprehensions
can be thought to have type t → C(t)
N-ary operators
can be thought to have type C(t) → t

6.6.3 Correlations

N-ary operators are like reduce in that they generalize a binary to a collection.
N-ary operators are like lambda/comprehensions in that they imply a local binding
However there are issues. Consider for some arbitrary term t(x)
(∑ x∈⦃1,2,3⦄ : t(x))
= t(1) + t(2) + t(3)
However there is a catch: ⦃1,2,3⦄ == ⦃1,2,3,1,2⦄ [In standard python syntax set([1,2,3,1,2]) == set([1,2,3]) ]
That is, since sets contain elements whose repetition count is unspecified, the sum above is also t(1)+t(2)+t(3)+t(1)+t(2) or anything else!!
So clearly the appropriate collection for a ∑ is Counter, not set or list
In general, we see that for the n-ary operators we also have a natural collection over which they operate
Operator   n-ary   Natural Collection
+          ∑       Counter
×          ∏       Counter
In general the principle is that for operators that are commutative and associative we use Counter. For operators that are idempotent as well we use Set.
Note that if an operator is not commutative and associative it has no meaningful n-ary. If it is, then list is over-specific; which is why we only find set and counter above.
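The argument can be made concrete with reduce. In this sketch nary is a hypothetical helper (not a python builtin) that folds a binary operator over a bag, repeating each element as often as it occurs; the term t(x) = x*x is an arbitrary choice.

```python
from collections import Counter
from functools import reduce
import operator

def nary(op, unit, term, collection):
    """Fold binary `op` over term(x) for each x, honouring repetition counts."""
    bag = Counter(collection)
    return reduce(op, (term(x) for x in bag.elements()), unit)

def t(x):
    return x * x

# + is not idempotent: repetition changes the sum, so a set loses information
sum_over_bag = nary(operator.add, 0, t, [1, 2, 3, 1, 2])
sum_over_set = nary(operator.add, 0, t, {1, 2, 3, 1, 2})

# max is idempotent: repetition is irrelevant, so a set is enough
max_over_bag = nary(max, float("-inf"), t, [1, 2, 3, 1, 2])
max_over_set = nary(max, float("-inf"), t, {1, 2, 3})

print(sum_over_bag, sum_over_set)
print(max_over_bag, max_over_set)
```

The two sums differ while the two maxima agree, which is the Counter-versus-Set distinction made above.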

7 Strings/Quoteds

Python has a menagerie of quoteds and unicode has a corresponding one of quote-like characters. How to match them I'm not really sure... Here's a start
Ascii Unicode
"Tom said \"Mary said \"Yoohoo!\"\"" «Tom said «Mary said «Yoohoo!»»»
r"a\nb" ‹a\nb›
u"हरि ॐ" ⟪हरि ॐ⟫
Note that whether » is one character or two is similar to the problem we have with quotes. Is '' a single double-quote or a double(d) single quote? Depending on the font this may be obvious or not
The above – so-called 'French-quotes' – seem to be widely used in languages other than French. German quotes however have some inconsistency problems.
Maybe code literals (compile, parser etc) ⟦ ⟧ following denotational semantics
code = compile('a + 5',...) code = compile(⟦a + 5⟧, ...)

[Seems neat in the context of Lisp or denotational semantics, not sure of python]

8 Long·Identifiers

There is also some evidence (?) suggesting that a-long-identifier is more readable than a_long_identifier, which in turn is more readable than aLongIdentifier
The hyphenated option suffers from a severe ambiguity because hyphen and minus are the same character…
… in Ascii only!
No More! Now we can write a·long·identifier
Well, Lisp and Cobol are exceptions (they allow hyphens in identifiers) but they incur their own heavy cost – math expressions can't be written naturally
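As it happens, the middle dot (U+00B7) is already allowed inside python 3 identifiers, though not as the first character. A small check follows; the identifier and the value 42 are arbitrary, and the dot is written as an escape so that there is no doubt about which character is meant.

```python
dotted = "a\u00b7long\u00b7identifier"    # a·long·identifier
hyphenated = "a-long-identifier"

print(dotted.isidentifier())       # the middle dot may continue an identifier
print(hyphenated.isidentifier())   # the hyphen is the minus operator

# The dotted name can really be assigned to
namespace = {}
exec(dotted + " = 42", namespace)
print(namespace[dotted])
```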

9 is

Ascii Unicode
a is b a ≣ b
Or ≡ ?
The difficulties/noob-confusions of python's is should significantly reduce with this!
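The confusion in question is between equal values and the same object. A minimal sketch with arbitrary lists:

```python
a = [1, 2, 3]
b = [1, 2, 3]    # equal contents, but a separate list object
c = a            # a second name for the same object

same_value = a == b
same_object = a is b
alias = a is c

print(same_value, same_object, alias)
```

Here == compares contents while is compares identity, so the first and third are True and the second is False.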

10 APL/Numpy integration

Ideas in numpy are largely lifted from APL. Unicode makes it possible to carry over (some of!) APL's lexemes as well. The trick is not to go overboard in this and repeat APL's mistakes!
Ascii Unicode
array([2,3,4]) ⟨2,3,4⟩
range(10) ⍳10
a.shape ⍴a
a.reshape(2,3) a⍴(2,3)
take(a,2) a↑2
drop(a,2) a↓2
Numpy-array comprehensions
Advanced stuff – probably with inspiration from Alpha-Polyhedra

11 Questionable below

12 Keywords and Special Constants

Following Antoon's wish for a bold def, we could have bold (SMP) versions of the following keywords
and del for is raise
assert elif from lambda return
break else global not try
class except if or while
continue exec import pass yield
def finally in print
I personally consider it more important to have SMP symbols (eg a double-struck 𝕋) for True and False
I wonder about this
  1. Really mixing up fonts with characters seems like a bad idea (for programming). Why not colors? Sizes?…
  2. More generally most of the SMP seems like nonsense (to me) 
  3. Finally this does not seem to be working! So even if SMP is a good idea it's probably not ready for general use (Trying numeric 𝕋 120139 dec or 𝕋 ie hex 1D54B)
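The code point quoted above can be inspected from python 3 itself. This small sketch also shows one reason SMP letters are awkward: they need two UTF-16 code units, and compatibility normalization (NFKC, which python applies to identifiers) folds the character back to a plain T.

```python
import unicodedata

double_struck_t = chr(0x1D54B)

code_point = ord(double_struck_t)
name = unicodedata.name(double_struck_t)
utf16_units = len(double_struck_t.encode("utf-16-le")) // 2
folded = unicodedata.normalize("NFKC", double_struck_t)

print(code_point, name)
print(utf16_units, folded)
```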

13 Root

Ascii Unicode
sqrt(x) √x
Looks like poor over-specific syntax (to me) (But what do i know?!)

14 Operators

Large swathes of unicode's math-space could be available in operator which users (aka programmers) can choose to bind at will.
Given the experience of readability of APL this may be ill-advised… Maybe not – C++ devotees like the possibilities of overloading basic arithmetic operators.

15 References

15.1 Steven D'Aprano

  1. π (some other math symbols?) [Steven ?]
  2. (Problems with) ∑ for sum Steven 1
  3. Steven 2 example: was towards showing that something like this is undesirable:
    import ⌺
    ⌚ = ⌺.╩░
    ⑥ = 5*⌺.⋨⋩
    ❹ = ⑥ - 1
    ♅⚕⚛ = [⌺.✱✳**⌺.❇*❹{⠪|⌚.∣} forin ⌺.⣚]
    Somebody else pointed out that this is actually valid. Can't remember who, and I certainly can't make this (as is) work.
  4. That mathematicians use sets does not make sets as fundamental in programming as dicts – [Steven ?] (so {} for dicts and something else for sets is ok)

15.2 Antoon Pardon

  1. · for ident separator (instead of '_') [Antoon ?]
  2. × for multiplication Antoon 2
  3. ⇑ for exponentiation [Antoon ?]
  4. → for attribute access Antoon 3
  5. ⤚ for list append Antoon 3
  6. bold (SMP) letters in identifiers [Antoon ?]

15.3 Mark Harris

  1. ∈ ∉ ∀ Δ Mark 1 Mark 2
  2. √ for sqrt Mark ?

Tuesday, September 17, 2013

Haskell: From unicode friendly to unicode embracing

Doesn't λ x ⦁ x  :  α → α look better and communicate more clearly than \ a -> a :: a -> a  ?

What are the problems with the second (current Haskell) form?
  1. The a in the value world is the same as the a in the type world -- a minor nuisance and avoidable -- one can use different names
  2. λ looks like \
  3. The purely syntactic -> that separates a lambda-variable and its body is the same token that denotes a deep semantic concept -- the function space constructor
APL was one of the oldest programming languages and is still one of the most visually striking.  It did not succeed for various reasons, the most notable being that its heyday came too long before unicode.

While APL is the first in using mathematical notation in programming, Squiggol, Bananas and Agda are more recent precedents in this direction.

In short, it's time for programming languages to move from unicode-friendly to unicode-embracing

Some stray thoughts incorporating these ideas into Haskell.

Tuesday, September 10, 2013

Computer Science: Technology or Philosophy?

A computer is like a violin. You can imagine a novice trying first a phonograph and then a violin. The latter, he says, sounds terrible. That is the argument we have heard from our humanists and most of our computer scientists. Computer programs are good, they say, for particular purposes, but they aren't flexible. Neither is a violin, or a typewriter, until you learn how to use it.
Marvin Minsky – Programming clarifies poorly-understood and sloppily-formulated Ideas

Computer science is not a science and it has little to do with computers. Its a revolution in the way we think and in the way we express what we think. The essence of this change is procedural epistemology — the study of the structure of knowledge from an imperative point of view, as opposed to the declarative point of view taken by math.
Mathematics provides a framework for dealing precisely with notions of «what is»
Computation provides a framework for dealing precisely with notions of «how to»

Abelson and Sussman — Structure and Interpretation of Computer Programs

Computer Science is no more about computers than astronomy is about telescopes, biology is about microscopes or chemistry is about beakers and test tubes.
There is an essential unity of mathematics and computer science.

Michael Fellows — usually attributed to Dijkstra

The above three quotes are interesting as much in their agreement – the irrelevance of computers to computer-science – as in the difference of emphasis: Minsky sees CS from the intelligence/learning pov, Fellows/Dijkstra as math, Abelson/Sussman as something contrasting to math…

So what actually is CS about??

Following is an article I wrote for a newspaper in 1995 on the wider-than-mere-technology significance of CS — reposting here for historical interest.

Monday, September 9, 2013

The Poorest Computer Users are Programmers

In the old days programmers programmed computers. Period.

Nowadays when everything is a computer, and the traditional computer is about a decade and a half behind the curve, describing a programmer as someone who programs computers is narrow and inaccurate. Instead we should think of programmers as working at effecting and improving the human-X interface, where X may be 'computer'. But it could also be IT, or technology, or the network and, through that last, interaction with other humans.

Now the classic 'nerdy' programmer was (by stereotype) always poor at 'soft' questions like that:  Interaction? Synergy?! What's all that manager-PR talk to do with programming?

And so today…

Programmers are inept as users of computers

Some examples:

Tuesday, August 27, 2013

Apply-ing SI on SICP

Abelson and Sussman wrote a legendary book: SICP. The book has a famous wizard cover. Unfortunately the cover misses some key content of the book.  What is it?

If we remove the other wizardly stuff, three main artifacts stand out on that cover:  eval and apply on the crystal ball and a magical λ.  Let's put these into a table

apply   eval
λ

The fourth empty square seems to stand out, doesn't it?  Lets dig a little into this square.

Sunday, June 23, 2013

Functional Programming invades the Mainstream

Kewl-kids in love with their favorite language will often bring up how wonderful is some non-trivial app written in their language.

Kewl, Kewt, Ardent… And the producer of yawns…

So sometimes it is good to invert the perspective and ask about cross-fertilization:  What ideas/features of these fashionable languages are becoming compelling enough to enter the mainstream?

This post is about how the boring mainstream is giving in – feature-by-feature – to Functional Programming
  • Almost every modern language supports garbage collection. Origin Lisp
  • From that followed the fact that any value not just scalars can be first-class.
  • As widely disparate systems as Python, R, Groovy, VBA, Mathematica share a common idea – using the interpreter interactively as an exploratory tool. Started with Lisp's REPL.

Wednesday, May 1, 2013

Friday, April 26, 2013

Functional Programming Scratchbook

Concepts of FP – Mindmap

Please note this is a scratchbook, ie Work-in-progress
Lambda MindMap
A mind map of how to approach the concepts of FP

Saturday, February 2, 2013

C in Education and Software Engineering – Retrospective

It's more than 20 years ago that I wrote C in Edu and SE[1].  I had mostly forgotten about it until I saw Mahesh's review. So thanks Mahesh for your kind words.  The trouble is I don't exactly agree with myself from 22 years ago ;-)  You see, even in 1991 what I was saying was that C is a stupid language to teach programming with – education – unlike say C++ which is a stupid language – period.

IOW what I was trying to say back then was that if learning to program is the goal, then a path that goes through C-land is going through bad-lands.  In short an argument not against C but against an ill-conceived learning-curve.

What has changed now?

Thursday, November 1, 2012

Imperative Programming: Lessons not learnt

We like to believe that Computer Science (or Information Technology) has advanced and keeps on advancing.

But has it?
What was called programming 60 years ago would today be called Imperative programming.  And it remains the mainstream (but see 7. below).

In short our field has a definite resistance to learning from our past.

A few examples will illustrate:

Thursday, October 18, 2012

Layout Imperative in Functional Programming

How long should program lines be?

But wait! Is this question even meaningful without specifying which programming language?

Monday, October 8, 2012

Functional Programming – the lost booty

Lisp was conceived in 1958 and already implemented by the early 60s.  One of its strange features was something called 'garbage-collection' … which took 35 years to enter the mainstream in Java.

Which is to say that for 35 years:
  • CS researchers did whatever they were doing for their tenure, (sorry) publications
  • Programming teachers righteously beat their students on their knuckles for getting pointer-errors/core-dumps/segfaults etc… 

Saturday, August 4, 2012

Functional Programming – Philosophical Difficulties

All the currently competing programming paradigms have serious philosophical problems.
FP probably less than the others but it too has its little share, which I deal with here.


Haskell programs look beautiful. You just write equations and everything magically just works. What could be a prettier dance between declaration (equations) and imperation (works)?

However the equations of Haskell hide a fundamental problem – equality is undecidable in general. Lets look at this from different angles.

Saturday, July 28, 2012

We dont need no Ooooo-Orientation – 4

The Grandeur of The Absolute

From the time – probably millennia ago – when humans first learnt to think ahead of their animal neighbours, we've been able to make certain statements that (presumably) animals can never conceive – abstract generalities.

So for example, a baby calf can recognize its mother cow with a greater unerring precision than a human baby's, yet when the human baby grows up, it can make distinctions out of the reach of our bovine brethren: eg
  • my mother vs motherhood
  • motherhood vs love
  • cheap love poetry vs hi-class love poetry
  • etc
In short, humans are very comfortable

dealing with abstractions as though they were concrete.

Now I have a conjecture, viz. that grand generalities have some hormonal trigger for making us feel elated (a grande-generality-pheromone maybe?) so that statements like
  1. Nothing in the universe can go faster than the speed of light
  2. Every pair of bodies in the universe attract each other according to a trivial-to-state mathematical law irrespective of their distance or relative size
  3. Anything that can be computed by any computer whatever (invented or yet to be invented) can be computed by a Turing machine
create a certain tickling feel-good that a 'normal' (non-general) statement like say: My tea has less sugar does not produce.

Friday, July 27, 2012

We dont need no Ooooo-Orientation – 3

In my earlier posts I've discussed some context around why OO has been one of the more dismal failures in the history of IT/CS.
Here I talk of the error in thinking 'inheritance'.
And this gives the philosophical separation between those drawn to OOP and those not.

Before I come to the meat of the matter – why OO sucks – it would be good in all fairness, to deal with the

Very few successes of OOP