Unicode starts out with the realization that ASCII is ridiculously restrictive, or the world is larger than the two sides of the Atlantic¹. This gives rise to all the blocks from Arabic to Zhuang.
However the greatest promise of unicode lies not in catering to this tower of babel but rather in those areas that are more universal. Yeah I know technically this distinction between universal and international will not stand up to scrutiny.
Different people will want to classify any given character as babel or universal differently. Nevertheless the distinction is important and the world can be a better place for the making of it.
Below is a first stab at my choice of the universal side of unicode.
It was prompted by Dave Angel's neat list of historical examples, which shows that accepting poverty (in this case, of 7 bits) as a natural way of life is not necessarily a positive attitude.
(Although) I'm a native English speaker, 7 bits is not nearly enough. Even if I didn't currently care, I have some history:
No. CDC display code is enough. Who needs lowercase?
No. Baudot code is enough.
No, EBCDIC is good enough. Who cares about other companies.
No, the "golf-ball" only holds this many characters. If we need more,
we can just get the operator to switch balls in the middle of printing.
No. 2 digit years is enough. This world won't last till the millennium
anyway.
No. 2k is all the EPROM you can have. Your code HAS to fit in it, and
only 1.5k RAM.
No. 640k is more than anyone could need.
No, you cannot use a punch card made on a model 26 keypunch in the same
deck as one made on a model 29. Too bad, many of the codes are
different. (This one cost me travel back and forth between two
different locations with different model keypunches)
No. 8 bits is as much as we could ever use for characters. Who could
possibly need names or locations outside of this region? Or from
multiple places within it?
2 Math
2.1 Basic
2.1.1 set theory
x ∈ A, A ⊆ B, A ∩ B, A ∪ B, ∅
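For comparison, here is roughly how a language restricted to ASCII has to spell the same five notations. This is only an illustrative sketch using Python's built-in set type; the sets A and B and the element x are made-up values, not anything from the notation above.

```python
# ASCII spellings of the set notation above, using Python's built-in set type.
# A, B and x are arbitrary illustrative values.
A = {1, 2}
B = {1, 2, 3}
x = 1

is_member = x in A        # x ∈ A
is_subset = A <= B        # A ⊆ B
intersection = A & B      # A ∩ B
union = A | B             # A ∪ B
empty = set()             # ∅  ({} would be an empty dict, not an empty set)
```

Note how comparison and bitwise operators get overloaded to stand in for the real symbols.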
2.1.2 Logic
∧ ∨ ¬ ∃ ∀
2.1.3 Standard Sets
ℕ ℤ ℚ ℝ ℂ
2.1.4 n-arys
∑ ⋂ ⋃
2.1.5 Various
∞ ±
2.1.6 APL, Z
APL and Z Notation are two notable languages (APL a programming language, Z a specification language) that did not tie themselves down to a restricted charset even in the days when ASCII ruled.
Yeah I know many people think that APL 'failed' because it was 'too mathematical.' Maybe they should reflect on which of
23 + 45 = 68
and
twenty-three plus forty-five is sixty-eight
is more perspicuous.
Or more simply, on why Cobol is not more popular.
2.2 Arrows
← → ↑ ↓ ⇒ ⇄ and zillions more
2.3 Brackets
Those who started with Fortran may remember how much trouble was caused both to programmers and language implementers because arrays and functions were indistinguishable – all because at that time there was only '()', no [] or {}.
However the acute worldwide shortage of brackets has not ended with Fortran. ASCII provides nothing more than '[{(' and their r-counterparts, which means that every language has to invent its own ad hoc notation for its collection data-structures.
Now we have ⟦ ⟧ ⟨ ⟩ ⟪ ⟫ ⟮ ⟯ ⟬ ⟭ ⌈ ⌉ ⌊ ⌋ ⦇ ⦈ ⦉ ⦊ ⟅ ⟆ and many more
2.4 Sub/superscripts
x¹ y² z³ a₁ b₂ c₃
2.5 Greek and Math-Greek
It's hard to imagine doing math without the familiar
α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ τ υ φ χ ψ ω
See also math-greek blocks
3 Typography
3.1 IPA phonetics
such as ɐ ə ɘ, see
3.2 Quotes
Some European languages use «»‹› for quote marks, and not consistently: eg ‹ may sometimes open a quote and sometimes close one. IOW technically quotes belong in the babel and not the universal part of unicode.
However experience with programming language design shows that two quote-marks are way too impoverished. eg python needs single, double, triple-single, triple-double, raw, unicode and all sorts of combinations of these.
More examples here
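To make the python point concrete, here is a small sketch of the quoting variants it currently juggles. The string contents are made up for illustration; only the quoting forms themselves are the point.

```python
# The same five characters written with different quoting conventions.
single = 'hello'
double = "hello"
triple_single = '''hello'''
triple_double = """hello"""
unicode_prefixed = u'hello'   # u prefix: needed in Python 2, accepted again from Python 3.3 on

# The variants only start to matter once the text itself contains
# quotes or backslashes.
with_quotes = '''He said "it's fine"'''
raw = r'C:\new\table'          # raw: backslashes are kept literally
cooked = 'C:\new\table'        # not raw: \n and \t become newline and tab
```

All of the first five are the same string; the extra forms exist only to work around having two quote characters to play with.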
3.3 Typography
- Space ␢
- Para ¶
- Section §
- Return ⏎
- Distinguish hyphen ‐, dashes (‒, –, —, ―) and minus −
  Rather than overloading all onto the one ASCII -
- And finally the 'replacement-char' (unicode-goofup) �
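That these really are different characters, and not just different fonts, can be checked with Python's standard unicodedata module. A minimal sketch, assuming I have read the characters in the list above correctly; they are written as escapes here so they cannot be confused with one another.

```python
import unicodedata

# The ASCII character plus the look-alikes listed above, written as escapes
# so they cannot be confused with one another in the source.
dashes = ["\u002d", "\u2010", "\u2012", "\u2013", "\u2014", "\u2015", "\u2212"]

# Map each code point (as "U+XXXX") to its official Unicode character name.
names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in dashes}
for code_point, name in names.items():
    print(code_point, name)
```

The ASCII one is officially named HYPHEN-MINUS, which says it all about the overloading.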
5 Iconic
The below is much more iffy. Some will think them nonsense. Some will base their life on them! Personally, my feeling is that things that are random icons but are not language-ish in their own right should not be in a standard like unicode.
However they are! So let's use them!
5.1 Whimsical
When going from the original 2-byte unicode (around version 3?) to the one having supplementary planes, the unicode consortium added blocks such as
To me – a unicode-layman – it looks unprofessional… Billions of computing devices world over, each having billions of storage words having their storage wasted on blocks such as these?? Seems whimsical (if you ask me).
5.2 Astrology
Planets ☿ ♀ ♁ ♂ ♃ ♄ ♅ ♆ ♇
Zodiac ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓
5.3 Cards and Chess
eg ♠ ♥ ♦ ♣ ♔ ♕
block
5.4 Traffic and maps
eg ⚠ ☡ ✈ ✆
see
5.5 Emoji
Blogger is currently barfing on these 😁 😞 😠 – as with all things SMP. But we know what these are: Emoji
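What makes SMP characters special can be seen in a few lines of Python. This is only a sketch of the general issue: the three faces above have code points beyond U+FFFF, so they need two 16-bit units in UTF-16 and four bytes in UTF-8. Whether that is what actually trips up Blogger is my guess, not something I have verified.

```python
emoji = ["\U0001F601", "\U0001F61E", "\U0001F620"]   # the three faces above

for ch in emoji:
    code_point = ord(ch)
    utf16_units = len(ch.encode("utf-16-le")) // 2   # 16-bit code units needed
    utf8_bytes = len(ch.encode("utf-8"))             # bytes needed in UTF-8
    print(f"U+{code_point:X}", code_point > 0xFFFF, utf16_units, utf8_bytes)

# True when every character lies outside the Basic Multilingual Plane.
outside_bmp = all(ord(ch) > 0xFFFF for ch in emoji)
```

Software that quietly assumes one character fits in 16 bits is the kind that tends to barf here.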
5.6 Music
As far as I can see putting music ♩ ♪ ♫ ♬ into unicode is one of
- iconic
- crazy
- I don't get it
5.7 Cultural, Religious, Ecological
✡ ☪
☯ ♥
✟ ॐ
☭ 卐
♲ ☢
¹ Neglecting the years between the creation of ASCII and Unicode; also called CodePage hell
Acknowledgments
Drawn largely from Xah Lee's excellent Unicode pages.