[TYPO3-dev] RFC: Unicode with preg_replace

David Bruchmann typo3-dev at bruchmann-web.de
Tue Mar 23 21:08:03 CET 2010


From:       Peter Russ <peter.russ at 4many.net>
Sent:       Tuesday, 23 March 2010 20:45:29

> --- Original Message ---
> From: Martin Kutschker
> Date: 23.03.2010 20:33:
>> David Bruchmann wrote:
>>> And utf-8 is perfect for western languages but never for eastern ones.
>>
>> Because it uses three bytes for each glyph, or for "political" reasons?
>> I have the impression that
>> all the major character sets have been included completely in the
>> Unicode character set.
>>
>> Masi
>
> And it depends on the definition of "western".
> @David: what is your approach for a site that has to support Chinese,
> Japanese, Russian and German? OK, it may work in the BE and FE with
> character set switches. BUT what is the character set for the DB?
> Latin 2 bytes?
>

I have a page with multi-language support, including many different Asian 
languages, and I use only UTF-8 (variable length, one byte by default) 
because I take my content from Microsoft.
However, I have read many documents about charsets and know that such a 
solution may display some characters incorrectly even if the text stays 
readable.
Furthermore, what I think about any particular solution is not important, 
because I can neither speak nor write any Asian language. More important 
is to know that UTF-8 isn't accepted by everyone, and until there is 
perhaps someday a truly global charset we have to live with different ones.
By the way: just to display some African languages you have to 
download extra fonts, where charset and font are nearly the same thing 
because fonts for those languages are rare. I haven't verified how 
characters are defined in those charsets, but it shows again that UTF-8 
can't meet all requirements.
I used the word "western" only as shorthand, in contrast to "eastern". It 
isn't meant to define anything beyond separating languages with 
different charsets and requirements. Concerning Hebrew or Arabic, I have 
never read about any problems, and I think they are accepted in UTF-8.
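
To make the byte lengths mentioned in this thread concrete, here is a 
minimal Python sketch that counts the UTF-8 bytes needed per character. 
The sample strings and the helper name utf8_byte_lengths are only 
illustrative assumptions, not taken from any real site.

```python
# Count how many bytes UTF-8 needs for each character of a string.
# The sample words below are arbitrary illustrations (assumed, not from the thread).

def utf8_byte_lengths(text):
    """Return a list with the UTF-8 byte length of each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]

samples = {
    "German": "Grüße",
    "Russian": "Привет",
    "Chinese": "你好",
    "Japanese": "こんにちは",
}

for language, text in samples.items():
    lengths = utf8_byte_lengths(text)
    print(language, text, lengths, "total bytes:", sum(lengths))
```

ASCII letters take one byte, the German umlauts and Cyrillic letters two, 
and the Chinese and Japanese characters shown here three each, which is 
the "three bytes for each glyph" point raised above.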

David



