Arabic speakers are fortunate to have an alphabet of only 28 letters (compare Khmer with its 74 letters, or Chinese with its intricate character system). Every letter fits on its own key, and some keys are even left over. But written Arabic has unique challenges of its own, ones that Chinese does not face.
- Positional letter forms. Arabic is written in cursive, so the same letter changes its form depending on its position in a word. Put simply, at the end of a word the letter has a little “tail”, in the middle it has a “connection”, and so on.
For example, the letter ? looks like ?? at the end of a word, ?? at the beginning, and ??? in the middle.
The “core” of the letter is a c-shaped «horseshoe», to the end of which either a downward stroke or a connecting stroke forming a small loop can be added.
- Connected writing (ligatures). The Latin script has two standard ligatures, ? and ?, plus several «optional» ones (e.g. fi, ff, ft). Arabic typographic tradition, by contrast, has literally a hundred ligatures, ranging from «traditional» ones (writing the letters separately is considered a mistake) to «calligraphic» ones (the letters are joined only in extremely elegant fonts).
For example, the combination ??? offends the Arabic eye: the symbols should curl into a loop, forming the ligature ??.
Another standard ligature is ???. Screen fonts do not implement it, but in print the vertical line is always joined to the loop: ?
The Arabic word for “meat” is an example of an exotic ligature: the letters of the word ??? are sometimes not set in a line but stacked vertically, one after another. It is reminiscent of ideographic writing, don't you think?
Unicode even provides a single character for the entire phrase «may Allah bless him and grant him peace»: ?. This phrase (here in a rather hard-to-read form) is supposed to follow any mention of a prophet's name.
- Diacritical marks. Ordinary texts do not show vowels. But when the exact pronunciation really matters (in dictionaries, in textbooks, or for an unfamiliar or ambiguous word), vowels are indicated by special marks above or below the letters. There are up to ten such marks.
- Writing direction. Mixing Arabic text with text in a Latin-script language produces something that looks like a total mess, because the writing direction keeps changing. For example, ???? ??? a link ??? ??? looks like three separate pieces, yet all of them are highlighted when the mouse cursor hovers over any one of them, which confirms that it is one link and not three. (Try selecting it with the mouse: the selection will be torn into three pieces as well!)
- No upper case. On the other hand, unlike European alphabets, Arabic has no upper-case letters, so the space that upper case would otherwise occupy can be used for other purposes.
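The first three points can be inspected with Python's standard `unicodedata` module. The following is only a minimal sketch: the helper names `positional_forms` and `strip_diacritics` are made up for this illustration, and the letters used (ALEF, AIN, LAM) are chosen as examples and are not necessarily the ones shown in the text above.

```python
import unicodedata

def positional_forms(letter):
    """Map form name -> presentation glyph for one base Arabic letter.

    Scans the Unicode "Arabic Presentation Forms-B" block (U+FE70..U+FEFF)
    for glyphs whose compatibility decomposition is exactly that letter.
    """
    wanted = f"{ord(letter):04X}"
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # A positional form decomposes to e.g. "<final> 0639".
        if len(parts) == 2 and parts[1] == wanted:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms

def strip_diacritics(text):
    """Drop combining marks (vowel signs and similar) from a string."""
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)

AIN = "\u0639"   # joins on both sides: four forms
ALEF = "\u0627"  # never joins to the following letter: two forms

ain_forms = positional_forms(AIN)
alef_forms = positional_forms(ALEF)
print(sorted(ain_forms))   # form names found for AIN
print(sorted(alef_forms))  # form names found for ALEF

# A "traditional" ligature: LAM + ALEF has its own presentation glyph,
# and compatibility normalization splits it back into two letters.
lam_alef = "\uFEFB"
print(unicodedata.name(lam_alef))
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKC", lam_alef)])

# The whole-phrase ligature is a single code point, U+FDFA.
print(unicodedata.name("\uFDFA"))

# A word written with vowel marks, then without them.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
print(len(vocalized), len(strip_diacritics(vocalized)))
```

Modern software stores only the base letters and lets the font pick the positional form; the presentation-form code points exist mainly for compatibility with older encodings.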
Under the 1906 standard, an Arabic typeface was supposed to consist of 470 ligatures. In 1945 a new standard was approved that cut the number to 72: a ligature now corresponds not to a whole letter but to a graphic element, e.g. a separate «horseshoe» or «tail». The 28 letters share only a few different tail forms, which is what makes the reduction possible. Moreover, it was decided to drop the diacritics and the majority of the ligatures. Importantly, the new standard was «backwards compatible»: the new ligatures could be obtained from the old ones simply by cutting them in two. There was no need to create new fonts; the existing ones could just be upgraded. Where necessary, diacritics were added to the text by hand.
The reduced standard became the basis for the Arabic typewriter font. It had to be adapted, because a “tail” was normally printed under the letter, whereas a typewriter prints letters one after another along a line. That was arguably a European idea of typography, a straight line of letters that look almost identical, and it stood in stark contrast to traditional typeset or handwritten Arabic, where the form and position of letters change constantly depending on the context.
The typewriter carriage moved from right to left, which ruled out inserting text in the Latin alphabet. (Numbers were typed from right to left too.) The «cut-down» symbols (letters with tails, numbers and punctuation) filled all four rows of the keyboard, in both shift states.
The numbers are in the upper case of the top row (starting with 0 and 1 on the right and ending with 9 on the left); Tab is to the left of the number row, CapsLock is below it, and Shift is lower still. On the right, the carriage return key (red) sits under Backspace, with Shift underneath. On the majority of keys, the two shift states form a pair: a letter without a tail and the same letter with a tail. Note that the positions of the punctuation symbols do not fully coincide on the two keyboards.
It was only natural, therefore, that the first Arabic word processors took the Arabic typewriter keyboard layout and its character set as their basis. However, while a typewriter can do without the Latin alphabet, a computer hardly can; so the main problem to solve from the very beginning was creating a bilingual Latin-Arabic character encoding.
Note that the DOS code page for Arabic (CP-864) has a corresponding symbol for each ligature of the Arabic typewriter. These symbols fill the upper (non-Latin) half of the code page almost completely, leaving no room for the traditional DOS pseudo-graphics. It is important to understand that under this visual encoding it is not the text itself that is encoded, only its image (projection) on the screen. The symbols were even stored from left to right: the OS was not aware that some of the symbols were special and displayed everything in the same way. Obviously, this was “hell” for text editing software: even searching such a text for a given combination of letters turned out to be non-trivial.
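The difference between a visual code page and a logical one can be checked against the codec tables that ship with Python. This is a rough sketch rather than a precise census: the helper `count_arabic` is a name invented here, and it simply counts how many bytes of the upper half decode to base Arabic characters (U+0600..U+06FF) versus ready-made positional glyphs (U+FE70..U+FEFF). CP-1256 serves as the logical counterpart because Python ships that codec.

```python
def count_arabic(codec):
    """Return (base_chars, presentation_forms) in the upper half of a code page."""
    base = forms = 0
    for byte in range(0x80, 0x100):
        # Bytes left undefined by the code page decode to an empty string.
        ch = bytes([byte]).decode(codec, errors="ignore")
        if not ch:
            continue
        cp = ord(ch)
        if 0x0600 <= cp <= 0x06FF:
            base += 1    # an abstract letter, digit or mark
        elif 0xFE70 <= cp <= 0xFEFF:
            forms += 1   # a specific positional shape or ligature
    return base, forms

base_864, forms_864 = count_arabic("cp864")
base_1256, forms_1256 = count_arabic("cp1256")
print("cp864 :", base_864, "base characters,", forms_864, "positional glyphs")
print("cp1256:", base_1256, "base characters,", forms_1256, "positional glyphs")
```

A search routine working on CP-864 text has to treat every positional glyph of a letter as the same letter, which is part of what made editing such text so awkward.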
A later DOS code page, CP-708, contains only one symbol per Arabic letter, which frees up room for pseudo-graphics and for the additional French letters; this is very useful for users in the countries of the Union du Maghreb, where French is the second official language. The OS still outputs all symbols from left to right, but it is now able to recognize combinations of neighboring Arabic letters and display them correctly joined. Arabic text is stored logically, with every symbol corresponding to a letter, only backwards: from the end of the sentence towards its beginning. This means that every input line had to be “turned around” before it could be shown on the screen.
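The “turning around” step can be sketched in a few lines. This is a deliberately naive illustration, not how DOS did it and not the Unicode Bidirectional Algorithm: it assumes the text is kept in reading order, mirrors the whole line for a left-to-right display, and then restores Latin words and numbers to their normal order. The function name `to_visual` is hypothetical, and the sketch ignores diacritics and multi-word Latin phrases.

```python
import unicodedata

def to_visual(logical):
    """Mirror a right-to-left line for a display that only draws left to right."""
    out, ltr_run = [], []
    for ch in reversed(logical):
        # "L" = left-to-right letter, "EN" = European digit.
        if unicodedata.bidirectional(ch) in ("L", "EN"):
            ltr_run.append(ch)
        else:
            # End of a Latin/number run: un-mirror it before emitting.
            out.extend(reversed(ltr_run))
            ltr_run = []
            out.append(ch)
    out.extend(reversed(ltr_run))
    return "".join(out)

word = "\u0643\u062A\u0627\u0628"        # an Arabic word in reading order
mixed = to_visual(word + " abc")          # Arabic word followed by a Latin word
with_number = to_visual(word + " 2024")   # Arabic word followed by a number
print([f"U+{ord(c):04X}" for c in mixed])
```

In the result the Latin word ends up on the left, still readable, while the Arabic letters are stored in mirrored order so that a left-to-right renderer draws them right to left.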
The Arabic computer keyboard layout grew out of the typewriter layout. Where both shift states of a key held the same letter, the key was kept as it was. Where a key held two different letters, one of them was kept wherever possible. Freed from this unnecessary burden, the upper case of the layout was taken over by diacritics and punctuation. Unsurprisingly, Apple took a different approach and kept the other letter on the «controversial» keys, so that even the order of letters is different on their keyboards, not to mention the punctuation.
Curiously, the Microsoft layout kept the «traditional» ligature ?? (mentioned at the beginning of the text): pressing its key produces the pair of symbols ??+??, as if you had pressed their two keys one after the other.
The Latin layout is the French AZERTY in the countries of the Maghreb and the American QWERTY in the eastern part of the Arab world:
The first photo shows a 101-key layout, the second a 102-key layout.
For Windows, a new Arabic code page, CP-1256, was created, entirely incompatible with the earlier ones, although the keyboard layout stayed the same. As in the earlier code pages, CP-1256 includes the Arabic letters together with the French ones; in addition, it gained the new Windows typographic symbols, such as the long dash, the non-breaking space, etc.