[TYPO3-english] TYPO3 and character encoding problems

bernd wilke t3ng at bernd-wilke.net
Mon Jul 7 11:15:25 CEST 2014


On 07.07.14 10:50, Pero Peric wrote:
> Hi,
>
> I tried to upgrade my TYPO3 installation from 4.4.0 to 4.7.19 but ran
> into character encoding problems (in 4.4.0 everything works fine). I
> would really appreciate it if someone could explain to me what is going
> on here.
>
> In 4.4.0 I have:
>
> [BE][forceCharset] = utf-8
>
> [SYS][setDBinit] is empty; I don't have SET NAMES UTF-8 here.
>
> DB and all tables are created as UTF-8.
>
> My MySQL character encoding variables look like this:
>
> | character_set_client     | utf8                       |
> | character_set_connection | utf8                       |
> | character_set_database   | latin2                     |
> | character_set_filesystem | binary                     |
> | character_set_results    | utf8                       |
> | character_set_server     | latin2                     |
> | character_set_system     | utf8                       |
> | character_sets_dir       | /usr/share/mysql/charsets/ |
>
> In 4.4.0 I get all characters displayed correctly.
>
> So this is what I have in the MySQL DB (I will show an example with one
> character).
>
> The character Č is stored as hex C384, which is the UTF-8 encoding of
> the character Ä. Now what I would really like to know: how the hell does
> something that is the UTF-8 character Ä become the character Č in 4.4.0?
> I suppose this has something to do with the forceCharset directive, so
> does anybody know what this directive actually does?
>
> When I upgrade to 4.7.19, this character is displayed as Ä (correct
> according to its hex code, I would say, but not correct for me :-) ). If
> I create a page called Č in 4.7.19, it is stored with the right UTF-8
> hex code for Č, which is C48C.
>
> So to summarize: in 4.4.0 the character Č is stored in the DB as hex
> C384 and displayed correctly on the screen, while in 4.7.19 it is
> displayed as the character Ä.
>
> If someone could explain to me what 4.4.0 is doing here, maybe I could
> convert this properly for 4.7.19. Thank you!
>

Let's look at the history:
Since 4.6 it has been necessary to have a clean database with everything
encoded in utf-8.
Up to 4.5 it was possible to have individual fields declared with a latin
encoding while, via setDBinit and forceCharset, the actual data was utf-8.
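
To illustrate how such a mixed setup can produce the bytes you see, here
is a minimal Python sketch. It rests on an assumption that is not
confirmed in this thread: that the utf-8 bytes sent by TYPO3 were treated
as a latin charset on the database connection and converted to utf-8 once
more when stored. Python's latin-1 codec only stands in for the
connection charset, and the function names are made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Simulate utf-8 bytes being misread as latin-1 and stored as utf-8 again."""
    raw = text.encode("utf-8")        # what the application sends
    misread = raw.decode("latin-1")   # assumed: connection treats the bytes as latin
    return misread.encode("utf-8")    # what ends up in the utf-8 column


def repair(stored: bytes) -> str:
    """Reverse the double encoding: utf-8 -> latin-1 bytes -> utf-8 text."""
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")


stored = double_encode("Č")
print(stored.hex().upper())       # C384C28C (starts with C384)
print(stored.decode("utf-8")[0])  # Ä, what a clean utf-8 connection shows
print(repair(stored))             # Č
```

If this assumption holds, 4.4.0 would undo the extra conversion when
reading over the same latin connection, while 4.7.19 reads the stored
bytes as they are. Try anything like this on a copy of the data first.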

The problem is that a single character can take up a different number of
bytes, so you have to reserve additional space for utf-8 characters of up
to 3 bytes to fit properly into string fields.
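
A small Python sketch of the size difference (the helper name is only for
illustration, and the sample characters are my own choice):

```python
def utf8_bytes(text: str) -> int:
    """Return how many bytes the text occupies when encoded as utf-8."""
    return len(text.encode("utf-8"))


for ch in ("A", "Č", "€"):
    print(ch, utf8_bytes(ch))  # A 1, Č 2, € 3

# Ten characters can need far more than ten bytes of storage.
print(utf8_bytes("Č" * 10))    # 20
```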

There were a lot of scripts for cleaning up your database down to the
field level (e.g. [1]).


As I state on that page (in German):
Afterwards it is preferable to have no entries for setDBinit and
forceCharset at all, or just the default values, as empty values would
break things.

And be careful with new fields (added by new extensions): if your
database and your tables do not have utf-8 set as their default, those
fields may be created as latin!



[1] http://pi-phi.de/293.html

bernd
-- 
http://www.pi-phi.de/cheatsheet.html

