[TYPO3-UG India] Strange problem with Tamil in GMENUs

Rahul Dewan rahul at srijan.in
Thu May 1 09:38:40 CEST 2008


Here's a response from my colleague Dr. Mohanty, who's unfortunately not
subscribed to the TUG India list. I had forwarded the query to him, and
am pasting his response here.

Thanks,
Rahul
--
http://www.srijan.in
http://blogs.srijan.in
http://www.srijanfoundartion.org

-------------------------------------------------------


> Hi,
> I'm trying to set up a site in Tamil but am having strange problems
> with the graphical menus.
[...]

From what I see from your description, there is nothing wrong with the
rendering engine, and the behaviour is as expected. You will need to
understand a little better how Indian languages work. Also, please be
warned that without access to someone who can read Tamil, setting up
a Tamil site will be very difficult.

> The problem seems to be that letters get swapped around. According to
> my understanding from studying websites about the Tamil alphabet these
> 2 letters actually form a kind of unit together. I can copy the
> combination தே in here but when I try to delete the second of the 2
> letters, it actually deletes the first (probably because the first one
> can't stand on its own).
[...]

Let us back up a little bit:
1. Some essentials: Unicode is a character encoding standard, i.e., it
  assigns a position (code point) to every character in most languages
  of the world. UTF-8 is a particular byte representation of those
  code points.
  Old-style fonts were TrueType, i.e., all that they included
  were glyphs to represent characters, with one glyph per
  character. OpenType fonts have embedded rules, which handle
  various issues like reordering, substitution, etc., besides
  a table of glyphs as in TrueType fonts. For any kind of
  sensible work with Indian languages, OpenType, Unicode fonts
  are a must.
2. You have two characters, the Tamil vowel sign ee, ே (U+0BC7), and
  the Tamil letter ta, த (U+0BA4), which is a consonant. When put
  together, these form a single unit, with the vowel sign modifying
  the pronunciation of the consonant. A renderer should treat this
  as a single unit, so deletion, cursor movement, selection, etc.,
  should work on the entire unit, i.e., both characters. However,
  many renderers are broken, and handle these actions in incorrect
  ways. To the best of my knowledge, the free renderers Pango, ICU,
  and the one built into Qt work fine with Tamil, but I am not a
  Tamil expert. The most Unicode-compliant editor I have seen, though
  with a somewhat clunky interface, is Yudit (http://yudit.org). Yudit
  is cross-platform.
3. In a Unicode editor, a dependent vowel is always entered after the
  consonant. Some dependent vowels are reordered on display, i.e.,
  they will appear to the left of the consonant, though the entry
  and storage order are always consonant followed by dependent vowel.
  For an OpenType font, this reordering is done by rules embedded
  in the font file, and the renderer needs to be able to understand
  OpenType, and use its rules for reordering. In your example,
  the entry will be U+0BA4 + U+0BC7, which is also the storage order.
  However, on display, the order will be switched, as this particular
  dependent vowel stands to the left of the consonant.
4. In older TrueType fonts, it was up to the editor to figure out the
  proper ordering on entry, i.e., for some dependent vowels, it would
  be vowel followed by consonant, and in others it would be consonant
  followed by vowel. In your example, you would have to enter
  U+0BC7 + U+0BA4, and the storage and rendering order would be identical.
  This leads to many problems later on.
5. For Indian languages, I would completely abandon non-Unicode and
  non-OpenType fonts. The font that you are using (or at least my
  version of it), TAMu_Kadampari.ttf, is actually not entirely Unicode-
  compliant, i.e., it has Tamil glyphs in the proper Unicode-defined
  positions, but also puts copies into the space for English. You
  should consider switching to the free Lohit Tamil font supplied by
  Red Hat, which complies with both Unicode and OpenType.
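
To make points 2 and 3 concrete, here is a minimal sketch in Python 3
(my choice of language for illustration; it is not part of TYPO3) that
inspects the stored form of தே using only the standard library. The
helper names describe() and group_units() are hypothetical, and the
grouping rule is a deliberate simplification: it only attaches
combining marks to the preceding character, and is not a full
implementation of Unicode text segmentation.

```python
import unicodedata

# Storage (logical) order: consonant first, then the dependent vowel,
# even though the vowel sign is displayed to the left of the consonant.
syllable = "\u0BA4\u0BC7"  # TAMIL LETTER TA + TAMIL VOWEL SIGN EE


def describe(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


def group_units(text):
    """Naively group each combining mark (category M*) with the
    character before it, approximating the 'single unit' an editor
    should use for deletion, cursor movement and selection."""
    units = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


if __name__ == "__main__":
    for row in describe(syllable):
        print(row)
    # Two stored characters, but one editing unit.
    print(len(syllable), len(group_units(syllable)))
    # UTF-8 is just one byte representation of the same code points.
    print(syllable.encode("utf-8").hex(" "))
```

The stored string has two code points, but the naive grouping yields a
single unit, which is why a well-behaved editor deletes both together.
Swapping the two characters in the string, as in point 4, would leave
the vowel sign without a preceding consonant to attach to.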

Regards,
Gora

