[TYPO3] Strange problem with UTF-8 menus

Mike Meir mike at gateseven.co.uk
Mon Apr 28 12:01:17 CEST 2008


Hi

Indic scripts, including Tamil, are based on a syllabic structure.
Syllables are placed in linear order of sound, from left to right.
Within a syllable the elements may not appear in the order in which they
sound. Unicode text is stored in the order in which it (in principle)
sounds. In particular, dependent vowels (those which are attached to a
consonant) may appear above, below, to the left, to the right, or in two
parts surrounding the rest of a syllable. The problem is that for
Unicode text to appear correctly, at least some syllables need to be
reordered, and possibly extensively ligated.
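
To make the reordering concrete, here is a minimal Python sketch using
only the standard unicodedata module. The names toy_visual_order,
LEFT_VOWELS and code_points are my own, purely for illustration, and the
logic only handles Tamil vowel signs which sit to the left of, or
around, a single consonant. A real shaping engine does far more
(ligatures, positioning, other scripts), so treat this as a toy.

```python
import unicodedata

# Tamil dependent vowel signs that are drawn to the LEFT of the consonant
# (illustrative subset: E, EE, AI).
LEFT_VOWELS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def is_tamil_consonant(ch):
    # Tamil consonants live in the range U+0B95..U+0BB9.
    return "\u0B95" <= ch <= "\u0BB9"


def code_points(chars):
    # Helper: show characters as U+XXXX strings.
    return [f"U+{ord(c):04X}" for c in chars]


def toy_visual_order(text):
    """Toy reordering from stored (sound) order to drawing order.

    NOT a real shaping engine: no ligatures, no positioning.
    """
    # NFD splits two-part vowel signs into their halves,
    # e.g. U+0BCA (VOWEL SIGN O) -> U+0BC6 + U+0BBE.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if ch in LEFT_VOWELS and out and is_tamil_consonant(out[-1]):
            # Move the left-side vowel sign in front of its consonant.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return out


stored = "\u0B95\u0BCA"  # KA + VOWEL SIGN O, the syllable "ko"
print(code_points(stored))                    # ['U+0B95', 'U+0BCA']
print(code_points(toy_visual_order(stored)))  # ['U+0BC6', 'U+0B95', 'U+0BBE']
```

The stored text is two code points in sound order, but what has to be
drawn is three elements, with the consonant in the middle. If nothing
on the server does this step, the picture comes out wrongly ordered.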

Fonts which are used to display Indic Unicode text use OpenType
technology. An OpenType font contains glyphs for the Unicode code
points it covers, and glyphs for ligatures and combinations of
characters, together with tables which map sequences of Unicode code
points to those ligatures and describe the positioning of elements
within syllables.

However, the instructions within a font only describe what needs to be
done, and only do so to a limited extent. To get text to display
correctly, a shaping engine is required, which performs the reordering
and ligation, and places elements correctly, before passing them on to
the rendering engine for display. In Windows, the shaping engine is
Uniscribe (usp10.dll).

This processing can in principle occur either on the server, or on the
client. Current versions of Windows support Unicode for (many) Indic
scripts, and are supplied with OpenType fonts, so, assuming the browser
is Unicode aware, Unicode text is reordered and displayed correctly by
the local operating system. If you use a legacy browser you may see the
text represented as Unicode code points, in the correct script, but
wrongly ordered.

However, for GIFBUILDER to work, you need a shaping engine on the
server, and the rendering application needs to be able to deal with the
output of the shaping engine. Probably both steps are missing from your
current set-up.

The reason why "local" encodings work is that they are based on misusing
the standard Windows code page, and the re-ordering is done by the
person or system that enters the text. The problem is that such text
appears to the client to be "European", and end users only see the text
correctly if they have the same fonts on their systems, or if the fonts
are delivered embedded in the pages. However, generating graphical
representations of text in this way is not a problem, since the system
thinks it is processing standard text, and users read the text from the
picture, not the underlying encoding.
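
As a rough illustration of why such text looks "European" to the
client, here is a small Python sketch. The glyph table below is
HYPOTHETICAL: it stands in for an imaginary legacy font which draws
Tamil shapes at Latin code positions, and is not the mapping of any
particular real font. The function names are likewise my own.

```python
import unicodedata

# HYPOTHETICAL glyph table for an imaginary legacy font: each Latin
# letter is drawn with the shape of a Tamil element.
HYPOTHETICAL_FONT = {
    "n": "\u0BC6",  # drawn as TAMIL VOWEL SIGN E (left part)
    "f": "\u0B95",  # drawn as TAMIL LETTER KA
    "h": "\u0BBE",  # drawn as TAMIL VOWEL SIGN AA (right part)
}


def what_the_client_sees(text):
    # The stored code points, which are all that a system without
    # the special font knows about.
    return [unicodedata.name(c) for c in text]


def what_the_font_draws(text):
    # The shapes the legacy font paints, one per stored character,
    # in exactly the stored order (no reordering step needed).
    return [unicodedata.name(HYPOTHETICAL_FONT[c]) for c in text]


# The person entering the text has already typed it in drawing order.
stored = "nfh"

print(stored.encode("cp1252"))       # b'nfh'
print(what_the_client_sees(stored))
print(what_the_font_draws(stored))
```

The stored bytes are ordinary Latin letters in the Windows code page,
so any renderer can lay them out left to right without a shaping
engine, which is why rendering them to a picture works.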

Considerable work has been done in localising and internationalising
Linux, but I would guess that the same may not apply to ImageMagick. Try
googling "Tamil Linux".

Best Wishes



Mike Meir
Director, Gate Seven
www.gateseven.net

More information about the TYPO3-english mailing list