[TYPO3-english] TYPO3 and character encoding problems

Pero Peric pperic at mail.com
Mon Jul 7 21:43:41 CEST 2014


On 7.7.2014. 16:44, bernd wilke wrote:
> On 07.07.14 16:26, Pero Peric wrote:
>> I tried many things but nothing works. These conversion scripts always
>> assume that tables/columns are in Latin1, but all my table definitions
>> are UTF8. For example, this is what the dump of the pages table looks
>> like:
>>
>> CREATE TABLE `pages` (
>>    `uid` int(11) NOT NULL AUTO_INCREMENT,
>>    `pid` int(11) NOT NULL DEFAULT '0',
>>    `t3ver_oid` int(11) NOT NULL DEFAULT '0',
>>     ....
>>    PRIMARY KEY (`uid`),
>>    KEY `t3ver_oid` (`t3ver_oid`,`t3ver_wsid`),
>>    KEY `parent` (`pid`,`sorting`,`deleted`,`hidden`),
>>    KEY `alias` (`alias`)
>> ) ENGINE=MyISAM AUTO_INCREMENT=1544 DEFAULT CHARSET=utf8;
>> /*!40101 SET character_set_client = @saved_cs_client */;
>>
>>
>> I don't know what to do anymore. Btw, one more interesting thing: if I
>> export via TYPO3 4.4.0 and import into TYPO3 4.7.19, all is fine, so
>> TYPO3 in fact knows how to properly dump and import data! Maybe I should
>> check the TYPO3 import/export code :-) The problem is that my site is too
>> big for import/export.
>
> So you need a mysqldump with a subsequent insert.
>
>
>
> Charset encodings exist on multiple levels:
> - connection to DB
> - DB
> - table
> - field
> The most important is the field definition for all fields of type
> varchar or text.
>
> Maybe you have UTF-8 fields and your data is stored in latin*?

Well, it looks to me like I have UTF-8 data but special/non-standard
chars are stored wrongly. For example, it's as if a character A were
stored as character B. Now, when I have forceCharset set to UTF-8 in
4.4.0, this B is somehow magically transformed and printed on the screen
as character A! If I set forceCharset to blank, I get garbage.
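
One way this "magic" can arise, sketched in Python under assumptions that
are not confirmed anywhere in this thread: the application hands UTF-8
bytes to a connection that MySQL treats as latin1, so MySQL turns every
byte into its own character when writing into the utf8 column and undoes
that again on the way out. Python's latin-1 codec only stands in for
MySQL's latin1 here (MySQL's variant is closer to cp1252), and all
function names are made up for illustration.

```python
# Sketch of double encoding. Assumes a latin1 connection in front of utf8
# columns; function names are hypothetical, and Python's latin-1 codec only
# approximates MySQL's latin1.

def store_via_latin1_connection(text: str) -> bytes:
    """Return the bytes that would end up in a utf8 column."""
    sent = text.encode("utf-8")       # the application sends UTF-8 bytes
    misread = sent.decode("latin-1")  # each byte is read as one latin1 character
    return misread.encode("utf-8")    # those characters are converted to utf8


def read_via_latin1_connection(stored: bytes) -> str:
    """Reverse path: the same wrong conversion, undone on output."""
    as_latin1 = stored.decode("utf-8").encode("latin-1")  # utf8 -> latin1
    return as_latin1.decode("utf-8")                      # page is shown as UTF-8


def read_via_utf8_connection(stored: bytes) -> str:
    """What a utf8 connection or a utf8 dump shows: the stored characters."""
    return stored.decode("utf-8")


original = "\u0161"  # "š", character A
stored = store_via_latin1_connection(original)
shown_by_old_site = read_via_latin1_connection(stored)  # character A again
shown_in_dump = read_via_utf8_connection(stored)        # "Å¡", character B
```

If that is what happens, the column definition and the dump are both
honestly utf8, and only the stored characters are wrong.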

> Again I mention my conversion script [1], where you may insert your
> encoding (see comments in lines 11 and 12).
>
> [1] http://pi-phi.de/293.html
>
> You may also modify your database in place.
> You can modify the charset for each field, but then MySQL will convert all
> characters. You may avoid this conversion with the help of type 'blob',
> where no conversion is done (varchar latin -> blob -> varchar utf-8
> will keep the content unchanged). But as this must be done for each field,
> it will take a lot of single SQL conversion statements!
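
Since the blob route needs two statements per field, the statements could
be generated rather than typed. The following is a minimal sketch that
only builds SQL strings and sends nothing to a database. The function name
is made up, and the table, column, type and attributes in the usage line
are example values that would have to be copied from the real SHOW CREATE
TABLE output, because MODIFY replaces the whole column definition.

```python
# Sketch only: builds the two ALTER TABLE statements for one column on the
# varchar -> blob -> varchar route. Nothing is executed.
# The function name and the example values below are hypothetical.
from typing import List


def blob_roundtrip_statements(table: str, column: str, col_type: str,
                              attributes: str = "",
                              charset: str = "utf8") -> List[str]:
    """Return SQL that relabels a column's charset without converting bytes."""
    target = f"{col_type} CHARACTER SET {charset} {attributes}".strip()
    return [
        # Step 1: binary type, so the bytes are kept and the charset is dropped.
        f"ALTER TABLE `{table}` MODIFY `{column}` BLOB;",
        # Step 2: back to the text type, now labelled with the target charset.
        f"ALTER TABLE `{table}` MODIFY `{column}` {target};",
    ]


statements = blob_roundtrip_statements(
    "pages", "title", "VARCHAR(255)", "NOT NULL DEFAULT ''")
for statement in statements:
    print(statement)
```

Columns that are part of an index may refuse the BLOB step without a key
prefix length, so those would need separate handling.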

I saw your script. It tries to replace all occurrences of latin1 (or
whatever encoding) with UTF-8. But in my MySQL dump there is nothing to
replace because all statements are UTF-8 :-(
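
Even when every statement says utf8, the text inside the dump can still be
checked for characters that were encoded twice. A heuristic sketch in
Python, assuming the wrong characters came from UTF-8 bytes misread as
latin1 (the function name is made up, and Python's latin-1 only
approximates MySQL's latin1):

```python
# Heuristic sketch: does a string taken from the dump look double encoded?
# The function name is hypothetical.
from typing import Optional


def undo_double_encoding(text: str) -> Optional[str]:
    """Return the repaired text, or None if it does not look double encoded."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside latin1, or bytes that are not valid UTF-8:
        # the text was most likely stored correctly in the first place.
        return None
    # Pure ASCII survives the round trip unchanged and needs no repair.
    return repaired if repaired != text else None


sample_from_dump = "\u00c5\u00a1"  # "Å¡" as it might appear in the dump
repaired_sample = undo_double_encoding(sample_from_dump)
```

A string that comes back repaired is only a hint, since short sequences can
match by coincidence, so a sample of real rows should be looked at by hand.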

What bothers me most is this forceCharset. How the hell does it manage to
convert and display characters correctly?

Regards.


