[TYPO3] Strange problem with UTF-8 menus

Christopher Torgalson bedlamhotel at gmail.com
Mon Apr 28 13:56:28 CEST 2008


On Mon, Apr 28, 2008 at 12:01 PM, Mike Meir <mike at gateseven.co.uk> wrote:
> Hi
>
>  Indic scripts, including Tamil, are based on a syllabic structure.
>  Syllables are placed in linear order of sound, from left to right.
>  Within a syllable the elements may not appear in the order in which they
>  sound. Unicode text is stored in the order in which it (in principle)
>  sounds. In particular, dependent vowels (those which are attached to a
>  consonant) may appear above, below, to the left, to the right, or in two
>  parts surrounding the rest of a syllable. The problem is that for
>  Unicode text to appear correctly, at least some syllables need to be
>  reordered, and possibly extensively ligated.
>
>  Fonts which are used to display Indic Unicode text use OpenType
>  technology. An OpenType font contains glyphs for the Unicode code
>  points it covers, and glyphs for ligatures and combinations of
>  characters, together with tables which describe the Unicode equivalents
>  of the ligations and positioning of elements within syllables.
>
>  However, the instructions within a font only describe what needs to be
>  done, and only do so to a limited extent. To get text to display
>  correctly, a shaping engine is required, which performs the reordering
>  and ligation, and places elements correctly, before passing them onto
>  the rendering engine for display. In Windows, the shaping engine is
>  Uniscribe (usp10.dll).
>
>  This processing can in principle occur either on the server, or on the
>  client. Current versions of Windows support Unicode for (many) Indic
>  scripts, and are supplied with OpenType fonts, so, assuming the browser
>  is Unicode aware, Unicode text is reordered and displayed correctly by
>  the local operating system. If you use a legacy browser you may see the
>  text represented as Unicode code points, in the correct script, but
>  wrongly ordered.
>
>  However, for GIFBUILDER to work, you need a shaping engine on the
>  server, and the rendering application needs to be able to deal with the
>  output of the shaping engine. Probably both steps are missing from your
>  current set-up.
>
>  The reason why "local" encodings work is that they are based on misusing
>  the standard Windows code page, and the re-ordering is done by the
>  person or system that enters the text. The problem is that such text
>  appears to the client to be "European", and end users only see the text
>  correctly if they have the same fonts on their systems, or if the fonts
>  are delivered embedded in the pages. However, generating graphical
>  representations of text in this way is not a problem, since the system
>  thinks it is processing standard text, and users read the text from the
>  picture, not the underlying encoding.
>
>  Considerable work has been done in localising and internationalising
>  Linux, but I would guess that the same may not apply to ImageMagick. Try
>  googling Tamil Linux.
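
To make the stored-order point above concrete, here is a minimal Python
sketch using only the standard unicodedata module. The sample syllable
(Tamil "ko") and the helper name describe() are my own choices for
illustration, not anything from TYPO3 or GIFBUILDER. It only lists the
code points; it does not do any shaping.

```python
import unicodedata

# Tamil syllable "ko": consonant KA followed by the dependent vowel sign O.
# In memory the consonant comes first, even though part of the vowel sign
# is drawn to the LEFT of the consonant when the text is shaped.
syllable = "\u0b95\u0bca"


def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


stored = describe(syllable)

# Canonical decomposition (NFD) splits the two-part vowel sign into the
# half drawn before the consonant (E) and the half drawn after it (AA).
decomposed = describe(unicodedata.normalize("NFD", syllable))

for label, rows in (("stored", stored), ("decomposed", decomposed)):
    print(label)
    for code_point, name in rows:
        print("  ", code_point, name)
```

A renderer that draws glyphs strictly in stored order would put the
vowel sign after KA, which is the kind of misordering a shaping engine
is there to correct.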


I don't know if it's exactly relevant, but Korean text (Hangul), which
is alphabetic-syllabic (i.e. alphabetic letters arranged into
syllables which are then placed along the baseline like letters), *does*
work fine with TYPO3/GIFBUILDER. There are few ligatures in Korean
though (some diphthongs are joined, but I don't know if they're
separate characters in the font or not)…
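
One possible explanation, offered as a guess rather than something I
have checked against GIFBUILDER: modern Hangul is normally stored as
precomposed syllables, one Unicode code point per syllable block, so a
renderer can pick one glyph per character without any reordering. A
small Python sketch with the standard unicodedata module shows the
difference between the precomposed form and its component letters
(jamo). The sample word and the variable names are mine.

```python
import unicodedata

# "Hangul" written as two precomposed syllable characters (U+D55C, U+AE00).
word = "\ud55c\uae00"

# Number of code points a simple renderer has to draw: one per syllable.
precomposed_count = len(word)

# NFD breaks each syllable into its alphabetic letters (jamo).
jamo = unicodedata.normalize("NFD", word)
jamo_names = [unicodedata.name(ch) for ch in jamo]

# NFC puts the letters back together into the original syllables.
round_trip = unicodedata.normalize("NFC", jamo)

print(precomposed_count, "precomposed syllables")
print(len(jamo), "jamo after decomposition:")
for name in jamo_names:
    print("  ", name)
print("round trip matches:", round_trip == word)
```

If the text were stored in the decomposed form instead, the renderer
would have to assemble the syllable blocks itself, which is closer to
the Tamil situation described above.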


-- 
Christopher Torgalson
http://www.typo3apprentice.com/