[TYPO3-english] TYPO3 and character encoding problems

Pero Peric pperic at mail.com
Thu Jul 10 11:54:35 CEST 2014


On 10.7.2014. 11:10, Jigal van Hemert wrote:
> Hi,
>
> On 9-7-2014 17:08, Pero Peric wrote:
>> The UTF-8 encoding of the character Č is hex C48C. Because
>> character_set_client was set to latin1, MySQL treated these bytes as two
>> latin1 characters rather than as one UTF-8 character. Because the DB,
>> tables and fields were set to UTF-8, MySQL then converted them: C4 (Ä in
>> latin1) became C384 (Ä in UTF-8) and 8C (a control char in latin1)
>> became C28C (also a control char in UTF-8, I suppose).
>
> Interesting. This is a variation of what my script tests for. The most
> common problem is that the table is latin1 and utf-8 data is stored in
> it. Character Č is then displayed as Ä + control code if you look in the
> database.
>
> What could be done is:
> - first convert table and fields to latin1. C384 C28C => C48C
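
The byte sequences quoted above can be reproduced outside MySQL. This is only a rough sketch in Python, not what MySQL does internally. It assumes Python's "latin-1" codec (ISO-8859-1) is a fair stand-in for MySQL's latin1. MySQL documents its latin1 as cp1252-based, so the real result for bytes in the 0x80-0x9F range may differ from what is shown here.

```python
# Sketch of how UTF-8 data gets double-encoded when the client
# character set is wrongly declared as latin1.
# Assumption: Python's ISO-8859-1 codec approximates MySQL's latin1.

original = "Č"

# What the application actually sends: the UTF-8 bytes of the character.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex())        # c48c

# What the server believes it received: two latin1 characters.
misread = utf8_bytes.decode("latin-1")

# What ends up stored in a UTF-8 column after the server's conversion.
double_encoded = misread.encode("utf-8")
print(double_encoded.hex())    # c384c28c
```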

Can you give me a hint on how to do this conversion? Something like mysqldump
--default-character-set=latin1, or do you have something else in mind?

> - second convert fields to binary data type and convert that to utf-8

Hm, this could be painful. I suppose I need a script for this.

> The last step does two sneaky things. First MySQL will use the binary
> content and mark the column as a binary field. The second part lets
> MySQL interpret the binary data as utf-8 (without converting anything in
> the actual data).
>
> The simpler version of my converter could be changed to make both
> changes. I'll have a look.
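
At the byte level, the two steps described above amount to roughly the following. This is a hedged Python sketch, not the converter script mentioned in the thread. The function name repair_double_encoded is made up for illustration, and Python's "latin-1" codec (ISO-8859-1) is assumed to behave like MySQL's latin1, which may not hold for every byte.

```python
def repair_double_encoded(stored: bytes) -> str:
    """Undo one round of latin1-to-UTF-8 double encoding (illustrative only)."""
    # Step 1, "convert to latin1": decode the stored UTF-8 and write each
    # character back as a single latin1 byte (C384 C28C => C48C).
    as_latin1 = stored.decode("utf-8").encode("latin-1")
    # Step 2, "binary, then utf-8": leave the bytes untouched and simply
    # interpret them as UTF-8.
    return as_latin1.decode("utf-8")


print(repair_double_encoded(bytes.fromhex("c384c28c")))  # Č
```

Text that was never double-encoded will usually raise an error in one of the two steps, so a real script would need to check each value before changing anything.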

By the way, I found a PHP script that backs up the database in pure PHP. I
thought this could work because it would be the reverse process.
Unfortunately it didn't. I then went further and wrote a small PHP
script to see how data from the title field of the TYPO3 pages table is
displayed in the browser. I expected to get the right characters on the
screen because TYPO3 4.4.0 displays them correctly, but I was wrong. I
tried with set names utf-8, without it, and with latin1, but nothing
worked: only garbage characters are displayed. How the hell does TYPO3
display this correctly?

Regards.



