[Typo3-dev] help needed: charset tables (Asian languages)
Martin T. Kutschker
Martin.T.Kutschker at blackbox.net
Wed Mar 31 09:44:42 CEST 2004
Kasper Skårhøj wrote:
> Hi Martin.
>
>
> In fact, the way I got hold of the gb2312 charset table was ... by
> requesting a URL at Microsoft 256 times with a PHP script and parsing
> the content... :-)
I see *sigh*
Anyway, I got hold of some tables. I'll use the ones Microsoft
provides if available, otherwise what I've found elsewhere.
The trouble is that a) the Asian standards have all been updated at least
twice (without being renamed) and b) there are no official mappings to
Unicode. The mapping is done independently by the vendors (so different
vendors => different mappings).
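To make the vendor problem concrete, here is a small Python sketch (not
TYPO3 code). It assumes that Python's bundled "shift_jis" and "cp932"
(Microsoft) codecs can stand in for two vendors' tables. The same
double-byte sequence comes out as different Unicode code points:

```python
# Assumption: Python's built-in "shift_jis" and "cp932" codecs stand in
# for two vendors' mapping tables for the same Japanese charset.
raw = b"\x81\x60"  # one double-byte character (a wave dash / tilde)

jis_cp = ord(raw.decode("shift_jis"))  # code point from the JIS-style table
ms_cp = ord(raw.decode("cp932"))       # code point from the Microsoft table

print(f"shift_jis -> U+{jis_cp:04X}")
print(f"cp932     -> U+{ms_cp:04X}")
print("same mapping:", jis_cp == ms_cp)
```

So whichever table is picked, text converted with another vendor's
table may not compare equal afterwards.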
And what's more, it seems that you cannot round-trip these charsets to
Unicode (there is a Microsoft issue article where they mention about 30
characters which do not round-trip by design).
I guess it will work out all right for most cases, except for some
"edge cases".
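A round-trip check for a given table could look roughly like the
following Python sketch. The helper name round_trips is made up for
illustration, and Python's own "gb2312" codec is used in place of the
tables discussed here, so the results say nothing about which
characters fail with the Microsoft mappings:

```python
def round_trips(raw: bytes, charset: str) -> bool:
    """Return True if raw survives charset -> Unicode -> charset unchanged."""
    try:
        return raw.decode(charset).encode(charset) == raw
    except UnicodeError:
        # Undecodable or unencodable input counts as a failed round trip.
        return False


# Hypothetical sample data: two common Chinese characters and one
# byte that is not valid in gb2312 at all.
sample = "中文".encode("gb2312")
print(round_trips(sample, "gb2312"))
print(round_trips(b"\xff", "gb2312"))
```

Running such a check over every code in a table would list the
characters that need special handling.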
BTW, iso-2022-jp (JIS) is no fun. Conversion from JIS to Unicode is more
or less easy, but Unicode to JIS requires some work. I'll postpone it
for now *). Anyone speaking Japanese who objects?
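One likely reason the Unicode to JIS direction is more work (my reading,
not stated above): iso-2022-jp is stateful, so an encoder has to track
the active character set and emit escape sequences when it changes. A
Python sketch, assuming Python's "iso2022_jp" codec as a stand-in for a
hand-written converter, shows the escapes around two kanji:

```python
# Assumption: Python's "iso2022_jp" codec stands in for a hand-written
# converter; it is not the TYPO3 implementation.
encoded = "日本".encode("iso2022_jp")
print(encoded)

ESC = b"\x1b"
starts_in_jis = encoded.startswith(ESC + b"$B")  # switch to JIS X 0208
ends_in_ascii = encoded.endswith(ESC + b"(B")    # switch back to ASCII
print(starts_in_jis, ends_in_ascii)
```

A decoder only has to follow those escapes, whereas an encoder has to
decide when to insert them.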
Masi
*) That is for "native" Typo3 support; mbstring, recode and iconv handle
these charsets.