[TYPO3-UG India] Strange problem with Tamil in GMENUs
Rahul Dewan
rahul at srijan.in
Thu May 1 09:38:40 CEST 2008
Here's a response from my colleague Dr. Mohanty, who's unfortunately not
subscribed to the TUG India list. I had forwarded the query to him, and
am pasting his response here.
Thanks,
Rahul
--
http://www.srijan.in
http://blogs.srijan.in
http://www.srijanfoundartion.org
-------------------------------------------------------
> Hi,
> I'm trying to set up a site in Tamil but am having strange problems
> with the graphical menus.
[...]
From what I see from your description, there is nothing wrong with the
rendering engine, and the behaviour is as expected. You will need to
understand a little better how Indian languages work. Also, please be
warned that without access to someone who can read Tamil, setting up
a Tamil site will be very difficult.
> The problem seems to be that letters get swapped around. According to
> my understanding from studying websites about the Tamil alphabet these
> 2 letters actually form a kind of unit together. I can copy the
> combination தே in here but when I try to delete the second of the 2
> letters, it actually deletes the first (probably because the first one
> can't stand on its own).
[...]
Let us back up a little bit:
1. Some essentials: Unicode is a character encoding standard, i.e., it
assigns a fixed position (code point) to every character in most
languages of the world. UTF-8 is one particular byte representation
of those Unicode code points.
Old-style fonts were TrueType, i.e., all that they included
were glyphs to represent characters, with one glyph per
character. OpenType fonts have embedded rules, which handle
various issues like reordering, substitution, etc., besides
a table of glyphs as in TrueType fonts. For any kind of
sensible work with Indian languages, OpenType, Unicode fonts
are a must.
2. You have two characters, the Tamil vowel sign ee, ே (U+0BC7), and
the Tamil letter ta, த (U+0BA4), which is a consonant. When put
together, these form a single unit, with the vowel sign modifying
the pronunciation of the consonant. A renderer should treat this
as a single unit, so deletion, cursor movement, selection, etc.,
should work on the entire unit, i.e., both characters. However,
many renderers are broken, and handle these actions in incorrect
ways. To the best of my knowledge, the free renderers, Pango, ICU,
and the built-in one in Qt, work fine with Tamil, but I am not a
Tamil expert. The most Unicode-compliant editor I have seen, though
with a somewhat clunky interface, is Yudit (http://yudit.org). Yudit
is cross-platform.
3. In a Unicode editor, a dependent vowel is always entered after the
consonant. Some dependent vowels are reordered on display, i.e.,
they will appear to the left of the consonant, though the entry
and storage order are always consonant followed by dependent vowel.
For an OpenType font, this reordering is done by rules embedded
in the font file, and the renderer needs to be able to understand
OpenType, and use its rules for reordering. In your example,
the entry will be U+0BA4 + U+0BC7, which is also the storage order.
However, on display, the order will be switched, as this particular
dependent vowel stands to the left of the consonant.
4. In older TrueType fonts, it was up to the editor to figure out the
proper ordering on entry, i.e., for some dependent vowels, it would
be vowel followed by consonant, and in others it would be consonant
followed by vowel. In your example, you would have to enter
U+0BC7 + U+0BA4, and the storage and rendering order would be identical.
This leads to many problems later on.
5. For Indian languages, I would completely abandon non-Unicode and
non-OpenType fonts. The font that you are using (or at least my
version of it), TAMu_Kadampari.ttf, is actually not entirely
Unicode-compliant, i.e., it has Tamil glyphs in the proper
Unicode-defined positions, but also puts copies into the positions
reserved for English (Latin) characters. You should consider switching
to the free Lohit Tamil font supplied by Red Hat, which complies with
both Unicode and OpenType.
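
To make points 1 to 3 concrete, here is a small Python sketch (standard
library only) that shows the code points, the UTF-8 bytes, and the storage
order for the syllable in question. The helper names describe() and
clusters() are made up for this illustration, and clusters() is only a
rough approximation that attaches every combining mark to the character
before it. It is not the full Unicode text segmentation algorithm that a
real renderer or editor would use.

```python
import unicodedata

TA = "\u0BA4"  # TAMIL LETTER TA (consonant)
EE = "\u0BC7"  # TAMIL VOWEL SIGN EE (dependent vowel)

# Storage order is always consonant first, then the dependent vowel,
# even though the vowel sign is displayed to the left of the consonant.
syllable = TA + EE


def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes as hex) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex())
        for ch in text
    ]


def clusters(text):
    """Roughly group each combining mark with the character before it.

    Hypothetical helper: a simplification for illustration, not a
    complete grapheme cluster implementation.
    """
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch  # attach the mark to the preceding base character
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    for row in describe(syllable):
        print(row)
    print(len(syllable), "code points,", len(clusters(syllable)), "editing unit")
```

The point of the sketch is that the string holds two code points (three
UTF-8 bytes each), with the consonant stored first, while an editor should
treat the pair as one unit for deletion and cursor movement.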
Regards,
Gora