[TYPO3-core] RFC #7942: Enable UTF-8 by default
Jigal van Hemert
jigal at xs4all.nl
Fri Nov 12 20:19:26 CET 2010
Hi Michael,
On 11-11-2010 23:01, Michael Stucki wrote:
>> Converting is rather simple when you first convert the columns to a
>> binary type which is comparable with the original (VARCHAR -> VARBINARY,
>> TEXT -> BLOB, etc.) and then convert them to the original type with the
>> utf8 charset defined for that column.
>
> Right, I can confirm this works (used it for a while more or less
> without problems). However, it's clearly a hack and depends on the
> Install Tool for converting fields back to the right types (VARBINARY =>
> VARCHAR, etc.)
Yes, it works. The script has rescued the content of many a site in the
past.
I don't consider it "hacky", as it uses clearly documented features of
MySQL. If UTF-8-encoded data is stored in -- let's say -- latin1 columns,
we need to make MySQL believe that the stored bytes are in fact to be
interpreted as UTF-8 instead of Latin1.
The two queries do exactly that: no hacks, no undocumented features, no
dirty side effects, just plain valid SQL.
The script I use simply reads the field type from the database and uses
that information to decide which 'binary' field it must use for the
conversion.
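As a minimal sketch of those two queries, assuming a hypothetical table
tt_news with a latin1 column title VARCHAR(255) that actually holds UTF-8
bytes (note that MODIFY replaces the whole column definition, so attributes
like NOT NULL or DEFAULT have to be restated if the column has them):

```sql
-- Optional first step: read the current type so the script can pick
-- the matching binary type (VARCHAR -> VARBINARY, TEXT -> BLOB, ...).
-- 'mydb' and 'tt_news' are placeholder names.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
  FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'tt_news';

-- Step 1: switch to the comparable binary type; MySQL drops the
-- latin1 interpretation but leaves the stored bytes untouched.
ALTER TABLE tt_news MODIFY title VARBINARY(255);

-- Step 2: convert back to the original type with the utf8 charset;
-- the existing bytes are now simply read as UTF-8, not recoded.
ALTER TABLE tt_news MODIFY title VARCHAR(255) CHARACTER SET utf8;
```

Going through the binary type is what prevents MySQL from recoding the data:
a direct latin1 -> utf8 conversion would transcode the (already UTF-8) bytes
a second time and garble them.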
> I'm now in favour of creating a simple mysqldump and replacing the
> CHARACTER SET statement from table definitions.
I think that this will complicate things quite a bit. It's easy to mess
things up when creating a dump, and editing the dump file by hand is far
more 'hacky' than a few plain queries.
The only problem in both cases is that it's hard to be sure which records
actually contain incorrectly encoded data. Can we expect only UTF-8 data
in those columns, or do we have to take other multi-byte encodings into
account?
--
Kind regards / met vriendelijke groet,
Jigal van Hemert
skype:jigal.van.hemert
msn: jigal at xs4all.nl
http://twitter.com/jigalvh