[TYPO3-english] How can I convert my database to utf8?

Jigal van Hemert jigal at xs4all.nl
Fri Jun 4 22:32:18 CEST 2010


Hi Jörg,

Jörg Klein wrote:
> "Jörg Klein" <joerg at klein-family.com> wrote in message
> news:mailman.1.1275528474.13822.typo3-english at lists.typo3.org...
>>> Furthermore, if you use the following settings in the Install Tool you 
>>> should have a UTF-8 installation:
>>>
>>> $TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';
>>> $TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
>> I just tried that and it works! Thank you so much!
> 
> My problem arose from another setting in 
> $TYPO3_CONF_VARS['SYS']['setDBinit'].
> My provider wrote that the following should be set there:
> $TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;' . chr(10) .
>     'SET CHARACTER SET utf8;' . chr(10) .
>     'SET SESSION character_set_server=utf8;';

I looked up all those settings a few times before and also posted a 
lengthy explanation on one of the TYPO3 lists a few weeks ago. You can 
look the commands up in the online MySQL manual if you want :-)

Bottom line is that SET NAMES utf8; on its own sets the session
variables character_set_client, character_set_connection and
character_set_results, plus the connection collation (it uses the
default collation for utf8: utf8_general_ci). That makes sure that the
MySQL client (= the functions in PHP), the connection and the result
set of a query all use utf-8.
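
To make that concrete, here is a minimal Python sketch that spells out
the three assignments SET NAMES is shorthand for, as described in the
MySQL manual. The helper name expand_set_names is made up for this
illustration; it only builds strings and does not talk to a server.

```python
def expand_set_names(charset):
    """Return the statements that SET NAMES <charset> is shorthand for.

    Hypothetical helper for illustration; it only builds strings.
    """
    variables = (
        "character_set_client",      # charset of statements the client sends
        "character_set_results",     # charset of result sets sent back
        "character_set_connection",  # charset statements are converted to
    )
    return ["SET %s = %s;" % (name, charset) for name in variables]


for statement in expand_set_names("utf8"):
    print(statement)
```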

>> I just tried to understand what exactly your code does:
> 
> For the conversion your code temporarily changes the type of some columns: 
> char to binary and text to blob.
> I never did that. Why is that needed?

The temporary changes to binary/blob types (also varchar to varbinary, 
mediumtext to mediumblob, etc.) are used if there is utf-8 encoded data 
in, for example, latin-1 fields.
Changing the type to binary/blob does nothing with the data itself, but 
MySQL will now see the data as binary and no longer interprets it as 
having a charset and collation.
Changing the type back to char/text again and setting the charset and 
collation does nothing with the data itself, but MySQL will now treat 
the data as string data with the defined charset and collation.
The total effect is that the already utf-8 encoded data is now seen by 
MySQL as utf-8 data.
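
The difference can be shown at byte level without a database. The
following is a small Python sketch, not MySQL itself: it assumes that
changing the column charset directly makes MySQL transcode the stored
bytes from latin-1 to utf-8, while the detour via binary/blob leaves
the bytes untouched. The variable names are only for illustration.

```python
original = "Jörg"

# What is physically stored: utf-8 bytes, in a column labelled latin-1.
stored_bytes = original.encode("utf-8")

# Changing the column charset directly makes MySQL transcode: the bytes
# are read as latin-1 and re-encoded, which gives double encoding.
transcoded = stored_bytes.decode("latin-1").encode("utf-8")

# Going through binary/blob leaves the bytes alone; only the label changes.
relabelled = stored_bytes

print(transcoded.decode("utf-8"))  # JÃ¶rg (double encoded)
print(relabelled.decode("utf-8"))  # Jörg (intact)
```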

If the data is correctly encoded in non-utf-8 columns you can simply 
comment out the lines which perform the first query (the change to 
binary/blob). The script will then just set each column to utf-8 and 
MySQL converts the data itself.
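
As a rough sketch of the per-column step (written in Python here, and
not the script discussed in this thread), the statements could be
generated like this. The function name, the relabel flag and the
example table/column are assumptions for illustration. Keep in mind
that MODIFY replaces the whole column definition, so NULL/DEFAULT
attributes would have to be repeated in a real script.

```python
# Text types and their binary counterparts.
BINARY_TYPE = {
    "char": "binary",
    "varchar": "varbinary",
    "tinytext": "tinyblob",
    "text": "blob",
    "mediumtext": "mediumblob",
    "longtext": "longblob",
}


def column_statements(table, column, base_type, length=None, relabel=True):
    """Build the ALTER TABLE statements for one column.

    relabel=True  -> data is already utf-8 in a non-utf-8 column:
                     go through the binary type first.
    relabel=False -> data is correctly encoded: let MySQL convert it.
    NULL/DEFAULT attributes are left out of this sketch.
    """
    size = "(%d)" % length if length is not None else ""
    statements = []
    if relabel:
        statements.append(
            "ALTER TABLE `%s` MODIFY `%s` %s%s;"
            % (table, column, BINARY_TYPE[base_type], size)
        )
    statements.append(
        "ALTER TABLE `%s` MODIFY `%s` %s%s "
        "CHARACTER SET utf8 COLLATE utf8_general_ci;"
        % (table, column, base_type, size)
    )
    return statements


# Example column, chosen only for illustration.
for sql in column_statements("tt_content", "header", "varchar", 255):
    print(sql)
```

Calling it with relabel=False corresponds to commenting out the first
query: only the statement that sets the charset is produced.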

The rest of the script gives some visual feedback (also useful to keep 
the connection to the server open!) and sets the default 
charset/collation for all tables and the database itself.
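
For the last part, setting the defaults, a minimal Python sketch of the
statements involved might look like this. The function name and the
database/table names in the example are hypothetical. These defaults
only apply to tables and columns created afterwards; existing columns
keep the charset they already have.

```python
def default_charset_statements(database, tables,
                               charset="utf8",
                               collation="utf8_general_ci"):
    """Build statements that set the default charset/collation.

    Hypothetical helper for illustration; it only builds strings.
    """
    suffix = "DEFAULT CHARACTER SET %s COLLATE %s;" % (charset, collation)
    # One statement per table, then one for the database itself.
    statements = ["ALTER TABLE `%s` %s" % (table, suffix) for table in tables]
    statements.append("ALTER DATABASE `%s` %s" % (database, suffix))
    return statements


# Hypothetical database and table names.
for sql in default_charset_statements("typo3", ["pages", "tt_content"]):
    print(sql)
```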

You can do this 'conversion' by hand, but doing this for dozens of 
tables each with several columns to handle is not something I would look 
forward to.

-- 
Jigal van Hemert
skype:jigal.van.hemert
msn: jigal at xs4all.nl
http://twitter.com/jigalvh

