[TYPO3-english] TYPO3 and character encoding problems
Jigal van Hemert
jigal.van.hemert at typo3.org
Sat Jul 12 15:59:24 CEST 2014
Hi,
On 12-7-2014 10:48, Pero Peric wrote:
> Now i really have to see what you do in script and how did you
> manage to convert it.
Your diagnosis of how Č was turned into C48C and then into C384 C28C
really helped.
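For anyone following along, the byte sequences above can be reproduced outside MySQL. This is only a sketch in Python, under the assumption that the correct UTF-8 bytes were misread as a single-byte charset (ISO 8859-2 here, Python codec name "iso8859_2") and then encoded as UTF-8 a second time; the variable names are mine.

```python
# Reproduce the double encoding for a single character.
original = "\u010c"                        # the letter C with caron (U+010C)
utf8_bytes = original.encode("utf-8")      # the correct UTF-8 bytes

# Assumption: these bytes were misread as ISO 8859-2 ("latin2") text,
# one character per byte, and then encoded as UTF-8 once more.
misread = utf8_bytes.decode("iso8859_2")
double_encoded = misread.encode("utf-8")

print(utf8_bytes.hex().upper())       # C48C
print(double_encoded.hex().upper())   # C384C28C
```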
First the script converts C384 C28C into C48C simply by telling MySQL to
convert the data from utf-8 into latin2 (latin1 will do here too, I guess).
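What that conversion does to the stored bytes can be sketched in Python. This only imitates the effect; it is not what MySQL runs internally, and "iso8859_2" is Python's codec name for latin2.

```python
# The four stored bytes, currently labelled as utf-8.
stored = bytes.fromhex("C384C28C")

# Read them as utf-8: this yields two characters, U+00C4 and U+008C.
as_text = stored.decode("utf-8")

# Converting to latin2 writes each character as a single byte.
converted = as_text.encode("iso8859_2")

print(len(as_text))               # 2
print(converted.hex().upper())    # C48C
```

After this step the column holds the right bytes for Č, but it is still labelled as latin2.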
The next step is trickier, but I found an article on that years ago
(when loads of 4.4 databases with utf-8 encoded text stored in latin1
tables needed to be converted).
First you tell MySQL that the column type changed from a text type into
the corresponding binary type; VARCHAR -> VARBINARY, TINYTEXT ->
TINYBLOB, etcetera. MySQL does nothing to the data itself in this
situation, but simply changes the column type. The beauty is that binary
data types do not have a character set and collation.
After this you tell MySQL to change the data type into a text type with
utf-8 character set; TINYBLOB -> TINYTEXT, etcetera. MySQL again does
nothing to the data itself but simply marks that it has a character set
and collation. If it reads the data it encounters C48C and interprets
that as a Č.
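The point of the detour through a binary type is that the bytes stay the same and only the label changes. A small Python sketch of that idea (again just an illustration, not MySQL itself):

```python
# The same two bytes, read under two different labels.
raw = bytes.fromhex("C48C")

as_latin2 = raw.decode("iso8859_2")   # labelled latin2: two characters
as_utf8 = raw.decode("utf-8")         # labelled utf-8: one character

print(len(as_latin2), len(as_utf8))   # 2 1
print(f"U+{ord(as_utf8):04X}")        # U+010C
print(as_utf8.encode("utf-8") == raw) # True, the bytes were never changed
```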
You can do this manually, but with dozens of tables, each with anywhere
from a couple to dozens of text fields, it becomes quite a task.
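A rough sketch of how the three statements per column could be generated. This is not the script that was actually used; the function name and the column list are made up for illustration, and the type map only covers the common text types. Note that MODIFY replaces the whole column definition, so attributes such as NOT NULL or DEFAULT would have to be added back, and a backup beforehand is a must.

```python
# Text type -> binary counterpart, following the VARCHAR -> VARBINARY idea.
BINARY_TYPE = {
    "CHAR": "BINARY",
    "VARCHAR": "VARBINARY",
    "TINYTEXT": "TINYBLOB",
    "TEXT": "BLOB",
    "MEDIUMTEXT": "MEDIUMBLOB",
    "LONGTEXT": "LONGBLOB",
}


def repair_statements(table, column, col_type, length=None):
    """Return the three ALTER TABLE statements for one text column."""
    base = col_type.upper()
    size = f"({length})" if length is not None else ""
    target = f"ALTER TABLE `{table}` MODIFY `{column}`"
    return [
        # 1. real conversion: utf-8 -> latin2 undoes the second encoding
        f"{target} {base}{size} CHARACTER SET latin2;",
        # 2. binary type: drops the charset label, bytes untouched
        f"{target} {BINARY_TYPE[base]}{size};",
        # 3. back to text: bytes untouched, now labelled utf8
        f"{target} {base}{size} CHARACTER SET utf8;",
    ]


# Hypothetical input; in practice this list would come from the schema.
columns = [
    ("tt_content", "header", "VARCHAR", 255),
    ("tt_content", "bodytext", "MEDIUMTEXT", None),
]

for table, column, col_type, length in columns:
    for statement in repair_statements(table, column, col_type, length):
        print(statement)
```

Printing the statements instead of executing them makes it possible to review them, or to try them on a copy of one table first.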
I'm happy that it worked and that you have a usable database.
--
Jigal van Hemert
TYPO3 CMS Active Contributor
TYPO3 .... inspiring people to share!
Get involved: typo3.org