1 Introduction
Python has been making long strides in embracing unicode.
With python 3 we are at a stage where python programs can support unicode well; however, python program-source is still completely drawn from the ASCII subset of unicode.
Well… Actually with python 3 (not 2) this is already possible

    from math import sqrt

    def solvequadratic(a,b,c):
        Δ = b*b - 4*a*c
        α = (-b + sqrt(Δ))/(2*a)
        β = (-b - sqrt(Δ))/(2*a)
        return (α, β)

    >>> solvequadratic(1,-5,6)
    (3.0, 2.0)

Now to move ahead!
Why do we have to write x!=y, then argue about the status of x<>y, when we can simply write x≠y?
Or take a random example from the tutor list:

    import math
    print math.pi
    print math.floor( 31.58889 )
    print math.ceil( 31.58889 )

as compared to

    print π
    print ⌊31.58889
    print ⌈31.58889

So we could say python is half-way towards becoming a full unicode language. To move in this direction can mean at least two things:
- Make python 'native' to other natural human languages
- Embrace the universal (ie mathematical) side of unicode more fully
The ideas came from a number of people on the python list – see references below.
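To make the 'half-way' claim concrete, here is a small check of which symbols used in this post python 3 already accepts as names. It is only a sketch: the list of candidates is my own pick from the examples in this post, and str.isidentifier() is the standard string method that applies the language's identifier rules.

```python
# Which of the symbols used in this post does Python 3 already accept as names?
# str.isidentifier() applies the language's own identifier rules.
candidates = ["Δ", "α", "π", "λ", "≠", "∑", "∞", "⌊"]

accepted = [s for s in candidates if s.isidentifier()]
rejected = [s for s in candidates if not s.isidentifier()]

print("usable as names today:", accepted)
print("would need new syntax:", rejected)
```

Letters (Greek or otherwise) pass; the mathematical operator and bracket symbols do not, which is the half that is still missing.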
2 Legend
Since most of the following is in the form of tables with current (Ascii) syntax juxtaposed with the more unicode-d one, many of the comments on these pairs turn out to be similar and repetitious. To keep the tables neat, the repeating comments are spelt out first, as under:
2.1 Math Space Advantage – MSA
One of the less noticed benefits of math(-like) operators is that a math-op like '+' in program text is lexically unambiguous, ie '+x' is two tokens, '+' and 'x', and not a single token composed of '+' and 'x'. This is unlike alphanumerics, where all the following being lexically different

    for x in line: …
    for x inline: …
    forx in line: …

makes spaces mandatory.
We will see that moving to a more pervasively unicode form makes many spaces that are currently inevitable become unnecessary.
Below I will point such cases out with an 'MSA'. In some cases it's technically required to have spaces, in others it's just more aesthetic to have them. eg. 'x in lst' cannot be written as 'xinlst', whereas '1 in [1,2,3]' can be written '1in[1,2,3]'. However that's completely unreadable. There's no such problem with '1∈[1,2,3]'. So replacing 'in' by '∈' has a math-space-advantage (MSA).
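The two-tokens-versus-one point can be checked with the standard tokenize module. This is a minimal sketch; names_and_ops is a helper name of my own, not anything from the library.

```python
import io
import tokenize

def names_and_ops(source):
    """Return (token type, text) pairs for the NAME and OP tokens in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.NAME, tokenize.OP)
    ]

# An operator needs no surrounding space: '+x' is still two tokens.
print(names_and_ops("+x"))
# Alphanumerics merge when the space is dropped: 'forx' is one NAME token.
print(names_and_ops("forx in line"))
```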
It also has the advantage of disambiguation, described next.
2.2 Disambiguation – Dis
The 'in' in for loops and the 'in' in predicates have very different semantics conceptually; the latter is purely declarative, the former creates a binding. So having two unmixupable 'in's is good for reducing confusion, eg.

    for x ⬅ [1,2,3]: …

and

    if x ∈ [1,2,3]: …

IOW due to extreme scarcity of characters in Ascii, many characters have for generations been overloaded willy-nilly. As that scarcity becomes a thing of the past maybe we should avoid useless overloading? These cases are marked by Dis.
2.3 Name Space burden reduction – NS
A (perfectly normal English) word like floor or ceiling cannot be put into the global (builtin) namespace because a programmer may want to use that name for the usual or a related connotation of floor/ceiling. For symbols like ⌊, ⌈ no such issue arises. Such cases are marked NS.
2.4 Unicode Choice – UC
In many (all?) cases unicode offers so much new variety that it's not clear which choice to make.
Such choices are indicated with UC
2.5 Font Issue – FI
When things are not looking exactly proper/pretty on my end and it seems to be a font issue, I'll mark it FI
3 Basic math
Ascii | Unicode | Notes |
---|---|---|
2*pi*r | 2×π×r | FI |
x!=y | x≠y | |
x<=y | x≤y | |
x>=y | x≥y | |
q,r=divmod(a,b) | q,r=a÷b | 1 |
float('inf') | ∞ | NS |
pow(2,4) | 2⇑4 | 2 |
2**4 | 2⇑4 | 2 |
math.floor(3.5) | ⌊3.5 | NS |
math.ceil(3.5) | ⌈3.5 | NS |
- 1: Python already has a large bunch of division related operators and functions: /, //, %, divmod. Given that quotient together with remainder is a common integer arithmetic pattern, and structured return values are much easier in python than in classic imperative languages like C, my preference is for ÷ to stand in for divmod. Other choices with their justifications are of course possible.
- 2: Are pow and ** the same?
- FI: Do x and × look the same? If yes, this is a problem and maybe * is just preferable?
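For reference, this is how the Ascii column behaves in current python 3. It is just a sketch with arbitrary sample numbers; it also touches note 2: with two arguments pow and ** give the same result, while pow additionally accepts an optional third (modulus) argument.

```python
import math

# divmod returns quotient and remainder together; 'q,r = a÷b' in the
# table is a proposed spelling of exactly this call.
q, r = divmod(17, 5)
print(q, r)

# Note 2: for two arguments, pow and ** compute the same thing.
# pow additionally accepts an optional third (modulus) argument.
print(pow(2, 4) == 2 ** 4)
print(pow(2, 4, 5))

# float('inf') needs the string argument, and the module function is
# math.ceil (there is no math.ceiling).
print(float("inf") > 10 ** 100)
print(math.floor(3.5), math.ceil(3.5))
```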
4 Other basic Syntax
4.1 Assignment ←
Ascii | Unicode |
---|---|
x = 1 | x ← 1 |
x,y = y,x | x,y ← y,x |
x += y | x +← y |

- Dis: If one could count the grief caused by thinking that = is math-equality – not just noobs but experienced C programmers who mistakenly put a = when they meant == …
- FI: The ← is not looking very nice out here (in different fonts): either too scrawny or too stubby. So... While in an earlier version of this post I had used that for examples, I am (for now) reverting to good ol' =
4.2 Attribute access →
Ascii | Unicode | Notes |
---|---|---|
sys.argv[1] | sys→argv[1] | |
(5).to_bytes(4,"little") | 5→to_bytes(4,"little") | Dis, MSA |
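Why the parentheses around 5 are needed today can be seen in a few lines. A sketch only; the '<example>' filename passed to compile is arbitrary.

```python
# The parenthesised form from the table works today.
print((5).to_bytes(4, "little"))

# Without parentheses the lexer reads '5.' as the start of a float
# literal, so the attribute access is rejected at compile time.
try:
    compile('5.to_bytes(4, "little")', "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```

The dot is doing double duty (decimal point and attribute access), which is the overloading that a separate → would remove.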
4.3 in (predicate)
Ascii | Unicode | Notes |
---|---|---|
1 in [1,2,3] | 1 ∈ [1,2,3] | MSA, FI |

- FI: Most of the fonts I've checked make the ∈ a little too large. I guess this should be treated as a transient problem – a fixable bug
4.4 in (for)
Ascii | Unicode | Notes |
---|---|---|
for x in [1,2,3,4]: … | for x⬅[1,2,3,4]: … | MSA, UC |

- UC: The sign could be any one of ⬅ ⇐ ⇦ ?
- Dis: The two ins now disambiguated to ⬅ and ∈ should be a help to noobs
4.5 lambda λ
Ascii | Unicode | Notes |
---|---|---|
lambda x: x+3 | λx: x+3 | MSA |
5 Logic
Ascii | Unicode | Notes |
---|---|---|
not x | ¬x | MSA |
x and y | x∧y | MSA |
x or y | x∨y | MSA |
6 Collections
Sets, Bags and Lists (numpy arrays??) form a series.
Having literals for all makes some succinct expressions possible
6.1 Lists
Ascii | Unicode | Notes |
---|---|---|
[1,2]+[3,4] | [1,2]⤚[3,4] | Dis |

List append is not symmetric (commutative). The operator should reflect that fact.
6.2 Set theory
The most natural character for set literals is '{}'. However, given that
- that is already taken by dicts
- and dicts are more fundamental to programming than sets

the tables below use ⦃ ⦄ for set literals instead.
Common set theory operators that mathematicians use: ∈ ∉ ⊂ ⊃ ⊆ ⊇ ⊈ ⊉ ∪ ∩ ∅
Now unicode makes these available without any markuping
Ascii – OO forms | Ascii – functional forms | Unicode | Notes |
---|---|---|---|
set([]) | set([]) | ∅ | |
s = set([1,2,3]) | s = set([1,2,3]) | s=⦃1,2,3⦄ | MSA |
t = set([2,3,4,5]) | t = set([2,3,4,5]) | t=⦃2,3,4,5⦄ | |
x in s | x in s | x∈s | MSA |
x not in s | x not in s | x∉s | MSA |
s.issubset(t) | s<=t | s⊆t | |
??? | s<t | s⊂t | |
not s.issubset(t) | not (s <= t) | s⊈t | 1,2,3 |
set([1]). | set([1]) <= set([2,1]) | ⦃1⦄ ⊆ ⦃2,1⦄ | 3 |
s.issuperset(t) | s>=t | s⊇t | |
s.union(t) | s\|t | s∪t | |
s.intersection(t) | s&t | s∩t | |
s.difference(t) | s-t | s∖t | FI,UC |
s. | s^t | s∆t | |
s.update(t) | s\|=t | s∪=t | |
s. | s&=t | s∩=t | |
- 1: For numbers, not (x <= y) ⇒ x>y. This is not the case for sets. In somewhat incorrect! math jargon, <= is a total order whereas ⊆ is a partial order. Therefore ⊈ is more needed than a negated <=
- 2: The low precedence of not makes parentheses unnecessary but I find it confusing
- 3: While in general the OO form (column 1) is the most verbose, in these cases it is more readable than column 2
- FI,UC: Are s\t and s∖t distinguishable? They don't look so to me… Unicode gives one of the names of ∆ as "symmetric difference". Don't know of any natural/standard sign for difference (other than '-' '\' '/'). There are zillions of other symbols of course.
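Notes 1 and 2 are easy to try out in current python. A small sketch; the two sets are sample values of my own, chosen to overlap without either containing the other.

```python
s = {1, 2}
t = {2, 3}

# Note 1: for numbers, not (x <= y) implies x > y. For sets it does not:
# s and t overlap, yet neither contains the other.
print(not (s <= t), s > t)

# Note 2: 'not' binds more loosely than '<=', so the parentheses in
# column 2 are optional.
print((not s <= t) == (not (s <= t)))

# The operator and method spellings agree.
print((s | t) == s.union(t), (s & t) == s.intersection(t), (s - t) == s.difference(t))
```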
6.3 Counter (bag/multiset)
Ascii | Unicode | Notes |
---|---|---|
c = Counter(a=3, b=4) | c = ⟅'a':3, 'b':4⟆ | NS |
d = Counter(a=1, b=2) | d = ⟅'a':1, 'b':2⟆ | |
c + d | c ⊕ d | |
Counter({'b': 6, 'a': 4}) | ⟅'a':4, 'b':6⟆ | |
c & d | c ∩ d | |
Counter({'b': 2, 'a': 1}) | ⟅'a':1, 'b':2⟆ | |
c \| d | c ∪ d | |
Counter({'b': 4, 'a': 3}) | ⟅'a':3, 'b':4⟆ | |
- NS: Counter can only be used after 'from collections import Counter'. Having to do this is an avoidable headache. Not having to do this (in the current dispensation) entails a pollution of the global namespace. It's another matter that Counter is an unfortunate name choice, given that
  - Bag/Multiset already exist and are well known
  - Counter already has many other established meanings in CS

Bag 'addition' is symmetric. The operator should reflect that.
- UC: Which symbol to use? ⊕ or ⋄ ?
- FI: The ⋄ (in code) looks worse than the plain ⋄ out here
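The three Counter operations can be tried directly with the same c and d. As a reminder of the standard semantics: + adds the counts per key, & takes the per-key minimum and | the per-key maximum.

```python
from collections import Counter

c = Counter(a=3, b=4)
d = Counter(a=1, b=2)

print(c + d)   # counts are added per key
print(c & d)   # per-key minimum
print(c | d)   # per-key maximum
```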
6.4 Casting
Python already has 'natural' casting (at the type level). Given

    l = [1,2,3]

we can do

    s = set(l)
    c = Counter(l)

Literals even allow for use of the most 'natural' operators

Type | Operation |
---|---|
Set | ∪, ∩ |
Counter | ⊕ |
List | ⤚ |

with the general rule that the upper-row operators pull lower data upwards, eg

    [1,2,3]∪[2,3,4,5]  ⟹  ⦃1,2,3,4,5⦄

ie order and repetition vanish

    [2,1,2]⊕[2,3,4,5]  ⟹  ⟅1:1, 2:3, 3:1, 4:1, 5:1⟆

ie order vanishes, repetition maintained.
Disambiguated literals make natural casting possible: x∪y expects x, y to be sets. What if they are not?? Simple – they are cast to sets. Likewise x⊕y expects x, y to be Counters. Else they are cast to counters.
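The casting rule can be imitated in today's python with two small helper functions. This is only a sketch of the idea: union and bag_sum are hypothetical names standing in for the proposed ∪ and ⊕, not existing operators or library functions.

```python
from collections import Counter

def union(x, y):
    """Hypothetical stand-in for x∪y: cast both operands to sets first."""
    return set(x) | set(y)

def bag_sum(x, y):
    """Hypothetical stand-in for x⊕y: cast both operands to Counters first."""
    return Counter(x) + Counter(y)

print(union([1, 2, 3], [2, 3, 4, 5]))
print(bag_sum([2, 1, 2], [2, 3, 4, 5]))
```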
Presence of literals makes other things possible and natural, eg…
6.5 Comprehensions
Once we have literals for sets and bags we can have comprehensions for them:

Natural Comprehensions

    ⦃x*x∣x⬅⦃1,2,3⦄⦄  ⟹  ⦃1,4,9⦄

Natural because both input and output collection are the same.
We can also have

Casting Comprehensions

    ⦃x*x∣x⬅[1,2,1,3,1]⦄  ⟹  ⦃1,4,9⦄

ie the intention of the list-to-set cast is that order and repetition are discarded.
Currently 'for' is used both in loops and in comprehensions. This removes that problem.
UC: The │ (∣) is not the usual | (codepoint 9474 vs 124). It could be some other character – in addition to the ascii | there are │∣┃ ¦ │ (and probably more!!)
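For comparison, here are the nearest spellings in current python. Set comprehensions exist already; for bags there is no comprehension syntax, so the last line (a generator fed to Counter) is just my closest approximation.

```python
from collections import Counter

# Natural comprehension: set in, set out.
print({x * x for x in {1, 2, 3}})

# Casting comprehension: list in, set out; order and repetition are dropped.
print({x * x for x in [1, 2, 1, 3, 1]})

# Python has no bag comprehension; feeding a generator to Counter is the
# nearest current spelling, and it keeps the repetition.
print(Counter(x * x for x in [1, 2, 1, 3, 1]))
```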
6.6 N-ary Operators
6.6.1 Examples
In mathematics there are a number of constructs like ∑, ∀ etc. They can be subsumed under the general concept of n-ary operators – aka generalized products.
6.6.2 Types
N-ary operators are complementary to comprehensions. If t is some type and C is one of set, Counter or list
- Comprehensions: can be thought to have type t → C(t)
- N-ary operators: can be thought to have type C(t) → t
6.6.3 Correlations
- Reduce: N-ary operators are like reduce in that they generalize a binary operator to a collection.
- Lambda: N-ary operators are like lambda/comprehensions in that they imply a local binding, such as that of x in t(x) here:

    (∑ x∈⦃1,2,3⦄ : t(x))  =  t(1) + t(2) + t(3)

However there is a catch:

    ⦃1,2,3⦄ == ⦃1,2,3,1,2⦄

[In standard python syntax set([1,2,3,1,2]) == set([1,2,3])]
That is, since sets contain elements whose repetition count is unspecified, the sum above is also

    t(1)+t(2)+t(3)+t(1)+t(2)

or anything else!!
So clearly the appropriate collection for a ∑ is Counter, not set or list.
In general, we see that for the n-ary operators we also have a natural collection over which they operate
Operator | n-ary | Natural Collection |
---|---|---|
+ | ∑ | Counter |
× | ∏ | Counter |
∧ | ∀ | Set |
∨ | ∃ | Set |
⊕ | ⨁ | Counter |
∪ | ⋃ | Set |
∩ | ⋂ | Set |
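In general the principle is that for operators that are commutative and associative we use Counter. For operators that are idempotent as well we use Set.
Note that if an operator is not commutative and associative it has no meaningful n-ary. If it is, then list is over-specific; which is why we only find set and counter above.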
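The catch with ∑ over a set can be shown with the builtins sum and all. A sketch only: t is a hypothetical term (here squaring) and data is sample input of my own.

```python
from collections import Counter

def t(x):
    # Hypothetical term being summed; any function of x would do.
    return x * x

data = [1, 2, 3, 1, 2]

# Summing over a set silently drops the repeated elements...
print(sum(t(x) for x in set(data)))
# ...while a Counter keeps each element's multiplicity.
print(sum(t(x) * n for x, n in Counter(data).items()))

# The idempotent operators are unaffected by repetition, so a set suffices.
print(all(x > 0 for x in data) == all(x > 0 for x in set(data)))
```

The two sums differ, which is the reason + pairs with Counter, while ∧ (all) gives the same answer either way and so pairs with Set.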
7 Strings/Quoteds
Python has a menagerie of quoteds and unicode has a corresponding one of quote-like characters. How to match them I'm not really sure... Here's a start

Ascii | Unicode |
---|---|
"Tom said \"Mary said \"Yoohoo!\"\"" | «Tom said «Mary said «Yoohoo!»»» |
r"a\nb" | ‹a\nb› |
u"हरि ॐ" | ⟪हरि ॐ⟫ |

Note that whether » is one character or two is similar to the problem we have with quotes. Is '' a single double-quote or a double(d) single quote? Depending on the font this may be obvious or not.
The above – so-called 'French-quotes' – seem to be widely used in languages other than French. German quotes however have some inconsistency problems.
Maybe code literals (compile, parser etc) ⟦ ⟧ following denotational semantics?

Ascii | Unicode |
---|---|
code = compile('a + 5',...) | code = compile(⟦a + 5⟧, ...) |

[Seems neat in the context of Lisp or denotational semantics, not sure of python]
8 Long·Identifiers
There is also some evidence (?) suggesting that a-long-identifier is more readable than a_long_identifier, which is more readable than aLongIdentifier.
The hyphenated option suffers from a severe ambiguity because hyphen and minus are the same letter…
… in Ascii only!
No More! Now we can write

    a·long·identifier

Well lisp and Cobol are exceptions but they incur their own heavy cost – math expressions can't be written naturally
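How the four spellings fare under python 3's current identifier rules can be checked with str.isidentifier(). A small sketch; as far as I can tell the middle dot (U+00B7) is already allowed inside a name, though not as its first character.

```python
# str.isidentifier() reports whether a string is a legal Python 3 name.
names = ["a_long_identifier", "aLongIdentifier",
         "a-long-identifier", "a·long·identifier"]

for name in names:
    print(name, name.isidentifier())
```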
9 is
Ascii | Unicode |
---|---|
a is b | a ≣ b |

Or ≡ ?
The difficulties/noob-confusions of python's 'is' should significantly reduce with this!
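The usual source of the confusion is mixing up identity with equality. A minimal illustration, with sample lists of my own:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)   # equal contents
print(a is b)   # but two distinct objects
print(a is c)   # c is another name for the same object
```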
10 APL/Numpy integration
Ideas in numpy are largely lifted from APL. Unicode makes it possible to carry over (some of!) APL's lexemes as well, taking care not to go overboard and repeat APL's mistakes!
Ascii | Unicode |
---|---|
array([2,3,4]) | ⟨2,3,4⟩ |
range(10) | ⍳10 |
a.shape | ⍴a |
a.reshape(2,3) | a⍴(2,3) |
take(a,2) | a↑2 |
drop(a,2) | a↓2 |

Numpy-array comprehensions
Advanced stuff – probably with inspiration from Alpha-Polyhedra
Looks like poor over-specific syntax (to me) (But what do I know?!)
11 Questionable below
12 Keywords and Special Constants
Following Antoon's wish for def we could have
𝗮𝗯𝗰𝗱𝗲𝗳𝗴𝗵𝗶𝗷𝗸𝗹𝗺𝗻𝗼𝗽𝗾𝗿𝘀𝘁𝘂𝘃𝘄𝘅𝘆𝘇 versions of the following keywords

and | del | for | is | raise |
assert | elif | from | lambda | return |
break | else | global | not | try |
class | except | if | or | while |
continue | exec | import | pass | yield |
def | finally | in |

I personally consider it more important to have 𝐍𝐨𝐧𝐞 (𝗡𝗼𝗻𝗲?), and 𝕋, 𝔽 for True and False

I wonder about this
- Really, mixing up fonts with characters seems like a bad idea (for programming). Why not colors? Sizes?…
- More generally most of the SMP seems like nonsense (to me)
- Finally this does not seem to be working! So even if SMP is a good idea it's probably not ready for general use (Trying numeric 핋 120139 dec or 핋 ie hex 1D54B )
13 Root
Ascii | Unicode |
---|---|
sqrt(x) | √x |
14 Operators
Large swathes of unicode's math-space could be made available in the operator module, which users (aka programmers) can choose to bind at will.
Given the experience of readability of APL this may be ill-advised… Maybe not – C++ devotees like the possibilities of overloading basic arithmetic operators.
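Short of new syntax, the 'bind at will' idea can be roughly imitated with a lookup table over functions that already exist in the operator module. Everything named here (SYMBOLS, apply, the choice of glyphs) is a hypothetical sketch of mine, not an existing or proposed API.

```python
import operator

# Hypothetical symbol table: the keys are glyphs proposed in this post,
# the values are functions that already exist in the operator module.
SYMBOLS = {
    "≠": operator.ne,
    "≤": operator.le,
    "≥": operator.ge,
    "×": operator.mul,
    # operator.contains takes the collection first, so swap the arguments.
    "∈": lambda x, coll: operator.contains(coll, x),
}

def apply(symbol, left, right):
    """Look up a glyph and apply the function bound to it."""
    return SYMBOLS[symbol](left, right)

print(apply("×", 6, 7))
print(apply("∈", 1, [1, 2, 3]))
print(apply("≠", 2, 2))
```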
15 References
15.1 Steven D'Aprano
- π (some other math symbols?) [Steven ?]
- (Problems with) ∑ for sum: Steven 1
- Steven 2 example: was towards showing that something like this is undesirable:

      import ⌺
      ⌚ = ⌺.╩░
      ⑥ = 5*⌺.⋨⋩
      ❹ = ⑥ - 1
      ♅⚕⚛ = [⌺.✱✳**⌺.❇*❹{⠪|⌚.∣} for ⠪ in ⌺.⣚]
      ⌺.˘˜¨´՛՜(♅⚕⚛)

- That mathematicians used sets does not make sets as fundamental in programming as dicts – [Steven ?] (so {} for dicts and something else for sets is ok)