[TYPO3-english] TYPO3 and character encoding problems

Jigal van Hemert jigal.van.hemert at typo3.org
Thu Jul 10 11:10:09 CEST 2014


Hi,

On 9-7-2014 17:08, Pero Peric wrote:
> Character Č hex. UTF8 code is C48C. So because character_set_client was
> set to Latin1, this was not a code for UTF8 char but for Latin1 char. So
> it arrived to mysql as C48C Latin1. Because DB, table and fields were
> set to UTF8, conversion was made by mysql. C4 (what is in Latin1 hex.
> for Ä) became C384 (Ä in UTF8) and 8C (what is a control char in Latin1)
> became C28C (what is also some control char in UTF8 i suppose).

Interesting. This is a variation of what my script tests for. The most
common problem is that the table is declared as latin1 while utf-8 data
is stored in it. Character Č then shows up as Ä + a control code if you
look in the database.
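
To make the byte values concrete, here is a small Python sketch of the
double encoding described in the quoted message. It is only an
illustration: Python's "latin-1" codec (ISO-8859-1) stands in for the
server side, and MySQL's own latin1 treats some bytes in the 0x80-0x9F
range differently, so real results may differ. The function name
double_encode is made up for this example.

```python
# Illustration only: Python's "latin-1" codec stands in for MySQL's latin1.

def double_encode(text: str) -> bytes:
    """Simulate UTF-8 bytes being misread as Latin-1 and converted to UTF-8 again."""
    utf8_bytes = text.encode("utf-8")        # what the client actually sent
    misread = utf8_bytes.decode("latin-1")   # server assumes the bytes are Latin-1
    return misread.encode("utf-8")           # server converts them to UTF-8 for storage

original = "\u010c"  # Č
print(original.encode("utf-8").hex())  # c48c
print(double_encode(original).hex())   # c384c28c
```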

What could be done is:
- first, convert the table and fields to latin1 (C384 C28C => C48C)
- second, convert the fields to a binary data type and then convert
  those to utf-8

The last step does two sneaky things. First, MySQL keeps the stored
bytes exactly as they are and marks the column as a binary field.
Second, MySQL is told to interpret that binary data as utf-8 (without
converting anything in the actual data).
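
The same two steps can be mimicked on a byte string in Python. This is
a rough sketch of the idea, not the converter script itself: it assumes
the stored bytes are exactly the double-encoded form from the example
(C384 C28C), uses Python's "latin-1" codec as a stand-in for MySQL's
latin1, and the function name repair is hypothetical.

```python
# Sketch of the two-step repair on raw bytes (assumed double-encoded UTF-8).

def repair(stored: bytes) -> str:
    """Undo one round of UTF-8 -> Latin-1 -> UTF-8 double encoding."""
    # Step 1: convert the stored UTF-8 text back to Latin-1 bytes
    # (C3 84 C2 8C => C4 8C).
    latin1_bytes = stored.decode("utf-8").encode("latin-1")
    # Step 2: keep those bytes untouched and simply read them as UTF-8.
    return latin1_bytes.decode("utf-8")

stored = bytes.fromhex("c384c28c")
print(repair(stored))  # Č
```

Data that was never double encoded will usually raise an error in one
of the two steps instead of being changed silently, so it makes sense
to test on a copy first.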

The simpler version of my converter could be changed to perform both
steps. I'll have a look.

-- 
Jigal van Hemert
TYPO3 CMS Active Contributor

TYPO3 .... inspiring people to share!
Get involved: typo3.org

