[TYPO3-english] TYPO3 and character encoding problems

Jigal van Hemert jigal.van.hemert at typo3.org
Sat Jul 12 15:59:24 CEST 2014


Hi,

On 12-7-2014 10:48, Pero Peric wrote:
> Now I really have to see what you do in script and how did you
> manage to convert it.

Your diagnosis of the way Č was stored as C48C and then turned into
C384 C28C really helped.
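The double encoding from the diagnosis can be reproduced with a few
lines of Python. This is only a sketch of the byte-level effect,
assuming the UTF-8 bytes were misread as latin2 (Python's `iso-8859-2`
codec); `double_encode` is a hypothetical helper, not part of the
conversion script.

```python
def double_encode(text: str, wrong_charset: str = "iso-8859-2") -> bytes:
    """Encode text as UTF-8, misread those bytes as a single-byte
    charset (assumed latin2 here), then encode the result as UTF-8 again."""
    utf8_bytes = text.encode("utf-8")           # Č -> C4 8C
    misread = utf8_bytes.decode(wrong_charset)  # C4 8C -> 'Ä' + U+008C
    return misread.encode("utf-8")              # -> C3 84 C2 8C


print("Č".encode("utf-8").hex())  # c48c
print(double_encode("Č").hex())   # c384c28c
```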

First, the script converts C384 C28C back into C48C simply by telling
MySQL to convert the data from utf-8 into latin2 (I guess latin1 will
do here too).
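At the byte level, that first conversion can be sketched in Python as
below. It assumes Python's `iso-8859-2` codec behaves like MySQL's
latin2 for these two characters, and it is not the script's actual SQL.
MySQL documents its latin1 as cp1252, which puts Œ at 0x8C, so this
sketch does not settle whether latin1 would work as well.

```python
# Bytes found in the utf8 column for a single "Č"
stored = bytes.fromhex("c384c28c")

# MySQL reads the column as utf8 text: two characters, 'Ä' and U+008C
as_text = stored.decode("utf-8")

# Converting that text to latin2 gives one byte per character
step1 = as_text.encode("iso-8859-2")
print(step1.hex())  # c48c
```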
The next step is trickier, but I found an article about it years ago
(back when loads of 4.4 databases with utf-8 encoded text stored in
latin1 tables needed to be converted).
First you tell MySQL to change the column from a text type into the
corresponding binary type: VARCHAR -> VARBINARY, TINYTEXT -> TINYBLOB,
etcetera. In this situation MySQL leaves the data itself untouched and
simply changes the column type. The beauty is that binary data types
have no character set or collation.
After this you tell MySQL to change the column back into a text type
with the utf-8 character set: TINYBLOB -> TINYTEXT, etcetera. Again
MySQL leaves the data untouched and simply records that the column now
has a character set and collation. When it reads the data, it finds
C48C and interprets that as Č.
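A small Python sketch of why the detour through a binary type matters,
again using Python codecs as stand-ins for MySQL's character sets (an
assumption): converting the latin2 column straight to utf-8 would
re-encode the characters and bring back C384 C28C, while the binary
detour keeps the bytes C48C and only changes how they are labelled.

```python
# Column bytes after the latin2 conversion
latin2_bytes = bytes.fromhex("c48c")

# Direct latin2 -> utf8 change: MySQL converts the data, so the latin2
# characters 'Ä' and U+008C are re-encoded and the damage comes back.
direct = latin2_bytes.decode("iso-8859-2").encode("utf-8")

# Via a binary type the bytes are left alone; only the label changes,
# so reading them as utf8 afterwards yields the intended character.
via_binary = latin2_bytes
restored = via_binary.decode("utf-8")

print(direct.hex())  # c384c28c
print(restored)      # Č
```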

You can do this manually, but with dozens of tables, each with anywhere
from a couple to dozens of text fields, it becomes quite a task.
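To avoid doing it by hand, the ALTER statements can be generated from
the column list in information_schema. The Python below is a
hypothetical sketch, not the script from this thread:
`repair_statements` and the example table and column names are made
up, the `utf8` character set name is an assumption, and the `%s`
placeholder depends on your database driver. It only covers the binary
detour, not the latin2 conversion. MODIFY redefines the whole column,
so NOT NULL, DEFAULT and similar attributes would have to be added back
to the generated statements. CHAR is left out on purpose because MySQL
pads BINARY values with 0x00 bytes.

```python
# Text types and their corresponding binary types.
# CHAR is deliberately omitted: BINARY right-pads values with 0x00 bytes.
TEXT_TO_BINARY = {
    "varchar": "varbinary",
    "tinytext": "tinyblob",
    "text": "blob",
    "mediumtext": "mediumblob",
    "longtext": "longblob",
}

# Lists the affected columns; run it with your driver of choice.
COLUMNS_QUERY = (
    "SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE "
    "FROM information_schema.COLUMNS "
    "WHERE TABLE_SCHEMA = %s "
    "AND DATA_TYPE IN ('varchar', 'tinytext', 'text', 'mediumtext', 'longtext')"
)


def repair_statements(table, column, column_type, charset="utf8"):
    """Return the two ALTER statements for one column:
    text -> binary (bytes untouched), then binary -> text with `charset`.

    `column_type` is the COLUMN_TYPE value, e.g. "varchar(255)" or "text".
    Attributes such as NOT NULL or DEFAULT are NOT preserved here.
    """
    base, _, rest = column_type.lower().partition("(")
    length = "(" + rest if rest else ""  # keeps "(255)" for VARCHAR
    binary_type = TEXT_TO_BINARY[base].upper()
    text_type = base.upper()
    return [
        f"ALTER TABLE `{table}` MODIFY `{column}` {binary_type}{length};",
        f"ALTER TABLE `{table}` MODIFY `{column}` {text_type}{length} "
        f"CHARACTER SET {charset};",
    ]


# Hypothetical rows, shaped like the result of COLUMNS_QUERY.
rows = [
    ("tx_example", "title", "varchar(255)"),
    ("tx_example", "body", "mediumtext"),
]
for table, column, column_type in rows:
    for statement in repair_statements(table, column, column_type):
        print(statement)
```

Generating the statements rather than running them directly leaves
room to review them (and restore column attributes) before touching a
backed-up database.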

I'm happy that it worked and that you have a usable database.

-- 
Jigal van Hemert
TYPO3 CMS Active Contributor

TYPO3 .... inspiring people to share!
Get involved: typo3.org

