[TYPO3] Strange problem with UTF-8 menus
Christopher Torgalson
bedlamhotel at gmail.com
Mon Apr 28 13:56:28 CEST 2008
On Mon, Apr 28, 2008 at 12:01 PM, Mike Meir <mike at gateseven.co.uk> wrote:
> Hi
>
> Indic scripts, including Tamil, are based on a syllabic structure.
> Syllables are placed in linear order of sound, from left to right.
> Within a syllable the elements may not appear in the order in which they
> sound. Unicode text is stored in the order in which it (in principle)
> sounds. In particular, dependent vowels (those which are attached to a
> consonant) may appear above, below, to the left, to the right, or in two
> parts surrounding the rest of a syllable. The problem is that for
> Unicode text to appear correctly, at least some syllables need to be
> reordered, and possibly extensively ligated.
>
> Fonts which are used to display Indic Unicode text use OpenType
> technology. An OpenType font contains glyphs for all the Unicode code
> points, and glyphs for ligatures and combinations of characters,
> together with tables which describe the Unicode equivalents of the
> ligations and positioning of elements within syllables.
>
> However, the instructions within a font only describe what needs to be
> done, and only do so to a limited extent. To get text to display
> correctly, a shaping engine is required, which performs the reordering
> and ligation, and places elements correctly, before passing them on to
> the rendering engine for display. In Windows, the shaping engine is
> Uniscribe (usp10.dll).
>
> This processing can in principle occur either on the server, or on the
> client. Current versions of Windows support Unicode for (many) Indic
> scripts, and are supplied with OpenType fonts, so, assuming the browser
> is Unicode-aware, Unicode text is reordered and displayed correctly by
> the local operating system. If you use a legacy browser you may see the
> text represented as Unicode code points, in the correct script, but
> wrongly ordered.
>
> However, for GIFBUILDER to work, you need a shaping engine on the
> server, and the rendering application needs to be able to deal with the
> output of the shaping engine. Probably both steps are missing from your
> current set-up.
>
> The reason why "local" encodings work is that they are based on misusing
> the standard Windows code page, and the re-ordering is done by the
> person or system that enters the text. The problem is that such text
> appears to the client to be "European", and end users only see the text
> correctly if they have the same fonts on their systems, or if the fonts
> are delivered embedded in the pages. However, generating graphical
> representations of text in this way is not a problem, since the system
> thinks it is processing standard text, and users read the text from the
> picture, not the underlying encoding.
>
> Considerable work has been done in localising and internationalising
> Linux, but I would guess that the same may not apply to ImageMagick. Try
> googling Tamil Linux.
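
The gap between stored order and displayed order that Mike describes
can be seen with a short sketch using only Python's standard
unicodedata module. The syllable "ko" and the helper name describe()
are my own choices for illustration, not anything from TYPO3 or
GIFBUILDER:

```python
import unicodedata

# Tamil "ko" as stored in Unicode: the consonant first, then the vowel sign,
# i.e. the order in which the syllable sounds.
syllable = "\u0b95\u0bca"


def describe(text):
    """Return 'U+XXXX NAME' for each code point (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


stored = describe(syllable)

# Canonical decomposition splits the two-part vowel sign into its halves.
# On screen, the first half is drawn to the LEFT of the consonant and the
# second half to the right, so display order differs from stored order.
decomposed = describe(unicodedata.normalize("NFD", syllable))

print("stored:    ", stored)
print("decomposed:", decomposed)
```

A renderer that simply draws one glyph per code point from left to
right will put the pieces in the wrong places; moving the left-hand
part in front of the consonant is the job of the shaping engine.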
I don't know if it's exactly relevant, but Korean text (Hangul), which
is alphabetic-syllabic (i.e. alphabetic letters arranged into
syllables which are then placed along the baseline like letters), *does*
work fine with TYPO3/GIFBUILDER. There are few ligatures in Korean,
though (some diphthongs are joined, but I don't know if they're
separate characters in the font or not)…
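
One possible explanation, and only a guess on my part: Unicode has a
precomposed code point for every modern Hangul syllable, so if the
text is stored in that form (NFC), each syllable block is a single
character that a font can map to a single glyph with no shaping step.
A minimal sketch with Python's standard unicodedata module; the
example syllable is arbitrary:

```python
import unicodedata

# Three conjoining jamo (letters): HIEUH + A + final NIEUN.
jamo = "\u1112\u1161\u11ab"

# NFC normalization composes them into one precomposed syllable code point.
syllable = unicodedata.normalize("NFC", jamo)

print(len(jamo), "code points ->", len(syllable), "code point")
print(f"U+{ord(syllable):04X}", unicodedata.name(syllable))
```

Whether the fonts and text used in a given TYPO3 setup actually rely
on precomposed syllables is an assumption here, not something I have
checked.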
--
Christopher Torgalson
http://www.typo3apprentice.com/