[TYPO3] typo3 database utf-8 conversion

ries van Twisk typo3 at rvt.dds.nl
Wed Jan 16 03:39:44 CET 2008


On Jan 15, 2008, at 8:01 PM, Andreas Becker wrote:

> Hi Steffen and Ries
>
> it does not convert all types of content - If this would be fixed it  
> would
> be very useful.
> With convert2utf8 you get your tt_content converted but i.e. not  
> tt_news - i
> guess the same serialize problem.
>
> I guess the only suitable solution might be to convert your database  
> in to
> binaries before performing the conversion with iconv.
> Then editing the charset and collation settings in an text editor  
> (depending
> on how big this is pspad is a wonderful tool doing this)
> Then restoring everything back into mysql.
>
> When you convert your data into binaries before doing the conversion  
> then
> all types of arrays should work.
>
> @ Steffen
> What about mysqldumper.
> It offers the export to utf8 - does it do any conversion?
> Won't it be possible to dump a file directly to utf8 and then you  
> create a
> new database - setting charsets and collations to utf8 and restoring  
> your
> dumped utf8 file?
> What actually is mysqldumper doing when it offers to export to utf8?
>
> mysqldumper would be also a very useful tool to perform such a  
> conversion,
> as it offers to store and upload BIG datafiles without timeout  
> problems.
> Is there a chance to implement this utf8 conversion there if it isn't
> already existing?
>
> Andi
> _______________________________________________
>

Hey Andi,
thank you for your lengthy response.

To get it straight(er):

1) some older versions of mysqldump dump in the charset the database
was created in
2) newer versions ALWAYS dump in utf-8 by default.

So when you want to change your latin-x database to utf-8 with a newer
version of mysqldump, you can simply dump and the latin-x => utf-8
conversion will be done for you.
However, the charset declared in the dump is still latin-x, so during
import of the tables and data mysql will convert back from utf-8 to
latin-x.

So if you want to keep it all in utf-8, all you need to do is change
latin-x to utf-8 in the dumped sql files and then re-import. Usually
you can leave the collation alone (collation controls things like
sorting, not how data is stored).
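
A minimal sketch of that edit in Python, in case a text editor chokes
on a big dump. It assumes the dump declares its charset as
"CHARSET=latin1" or "CHARACTER SET latin1" (check your own dump first),
and it uses MySQL's spelling "utf8". The function name declare_utf8 is
made up for this example. It leaves COLLATE clauses untouched, and it
would also rewrite the same text if it happened to appear inside your
data, so treat it as a starting point only.

```python
import re

# Matches "CHARSET=latin1" and "CHARACTER SET latin1" (any latin-N).
# Assumption: these are the forms the charset takes in your dump.
_CHARSET_DECL = re.compile(
    r"\b(CHARSET\s*=\s*|CHARACTER\s+SET\s+)latin\d+\b",
    re.IGNORECASE,
)


def declare_utf8(sql_line: str) -> str:
    """Replace a latin-N charset declaration in one dump line with utf8."""
    return _CHARSET_DECL.sub(lambda m: m.group(1) + "utf8", sql_line)


if __name__ == "__main__":
    line = ") ENGINE=MyISAM DEFAULT CHARSET=latin1;"
    print(declare_utf8(line))
```

Run it line by line over the dump file and write the result to a new
file, so the original dump stays around as a backup.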

So theoretically, the only correct way to convert a latin-x database to
utf-8 is to do it programmatically: you load each record and each field
of each table and check if it's a serialized array.
If so, de-serialize it and re-serialize it before loading it (using
another DB connection) into a new utf-8 database.
I have done something like that to migrate a mysql database to
postgresql and it works perfectly.
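
A rough sketch of the "check each field" step in Python, not the script
I used for the postgresql migration. It assumes each row arrives as a
dict whose text fields are still raw latin-1 bytes; the names
looks_serialized and convert_row are made up for this example, and the
prefix test only recognises serialized arrays and strings, which is a
heuristic. Reading from the old database and writing to the new one is
left out.

```python
import re

# PHP serialize() output for an array starts like a:3:{ and for a
# string like s:5:" (heuristic: other serialized types are ignored).
_SERIALIZED_PREFIX = re.compile(r'^(a:\d+:\{|s:\d+:")')


def looks_serialized(value: str) -> bool:
    """Guess whether a field holds a PHP-serialized array or string."""
    return bool(_SERIALIZED_PREFIX.match(value))


def convert_row(row, source_charset="latin-1"):
    """Decode every bytes field of a row to text.

    Returns the decoded row plus the names of the fields that look
    serialized and therefore need their length indicators rebuilt.
    """
    converted = {}
    needs_repair = []
    for name, raw in row.items():
        text = raw.decode(source_charset) if isinstance(raw, bytes) else raw
        converted[name] = text
        if isinstance(text, str) and looks_serialized(text):
            needs_repair.append(name)
    return converted, needs_repair


if __name__ == "__main__":
    row = {"uid": 1, "title": b"caf\xe9", "conf": b'a:1:{i:0;s:4:"caf\xe9";}'}
    print(convert_row(row))
```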

During the conversion from latin-x to utf-8 there is a chance that
single-byte characters get converted to characters of 2 or more bytes,
so that the length indicator of a serialized object is wrong.
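
To make that concrete, here is a small Python sketch. It assumes the
PHP serialized form s:<byte length>:"<value>"; and the function name
fix_serialized_lengths is made up for this example. The regex is
deliberately naive: a string value that itself contains "; will break
it, so a real conversion should de-serialize and re-serialize properly.

```python
import re

# One serialized PHP string: s:<byte length>:"<value>";
# Naive on purpose: values containing '";' are not handled.
_PHP_STRING = re.compile(r's:(\d+):"(.*?)";', re.DOTALL)


def fix_serialized_lengths(serialized: str) -> str:
    """Recompute each string length as its size in UTF-8 bytes."""
    def repl(match):
        value = match.group(2)
        return 's:%d:"%s";' % (len(value.encode("utf-8")), value)
    return _PHP_STRING.sub(repl, serialized)


if __name__ == "__main__":
    word = "caf\u00e9"
    print(len(word.encode("latin-1")))  # 4 bytes in latin-1
    print(len(word.encode("utf-8")))    # 5 bytes in utf-8
    # The stored length 4 was right for latin-1 but is wrong for utf-8.
    print(fix_serialized_lengths('a:1:{i:0;s:4:"caf\u00e9";}'))
```

The last line prints the array with s:5 instead of s:4, which is what
PHP's unserialize() expects once the data is stored as utf-8.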

Clear enough? Or is there still doubt? Anyone? And correct me if I am
wrong!

Ries








