[TYPO3-core] RFC #7942: Enable UTF-8 by default

Jigal van Hemert jigal at xs4all.nl
Thu Nov 11 15:44:13 CET 2010


Hi, Michael,

On 11-11-2010 9:30, Michael Stucki wrote:
>> To make the entire backend use UTF-8 there is no real need to have the
>> database, tables or columns use a utf8 charset or compatible collation.
>
> I think it depends on the connection. If the connection is UTF8 then
> this will definitely fail, as soon as you try to store characters which
> are not available in Latin1.

In this case you might lose those characters (as documented by MySQL). 
If that is your definition of "fail", then it fails.
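
A rough illustration of the "lost characters" case. It assumes Python's latin-1 codec as a stand-in for MySQL's latin1 (which is really closer to cp1252). It also assumes the server stores '?' for characters it cannot map. As far as I know, that is what MySQL typically does outside strict SQL mode; in strict mode it may reject the value instead. The function name is made up:

```python
# Assumption: Python's "latin-1" codec stands in for MySQL's latin1, and the
# server stores '?' for characters it cannot map (non-strict SQL mode).

def store_in_latin1_column(text: str) -> str:
    """Return what a latin1 column would end up holding for `text`."""
    stored = text.encode("latin-1", errors="replace")  # unmappable -> b"?"
    return stored.decode("latin-1")


print(store_in_latin1_column("Café"))  # Café   (é exists in Latin1)
print(store_in_latin1_column("Łódź"))  # ?ód?   (Ł and ź do not)
```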

> Having a Latin1 connection then it may work, although it's an ugly
> setup. The result will be a database which is marked as Latin1
> but has UTF-8 content.

If the "connection" (I assume you mean something like SET NAMES latin1) 
is latin1, then the data you send to MySQL is assumed to be encoded in 
the latin1 charset. I don't think that PHP will convert the strings 
before sending them to the MySQL client, so you will probably end up 
with UTF-8 bytes that MySQL treats as Latin1 data.
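
A small sketch of what that looks like at the byte level. It assumes PHP passes the UTF-8 bytes through unchanged and again uses Python's latin-1 codec as a stand-in for MySQL's latin1. The helper names are made up for illustration:

```python
# Assumption: PHP sends the UTF-8 bytes unchanged; Python's "latin-1" codec
# stands in for MySQL's latin1. Function names are made up for illustration.

def as_seen_by_mysql(text: str) -> str:
    """UTF-8 bytes from the application, interpreted as latin1 by MySQL."""
    return text.encode("utf-8").decode("latin-1")


def read_back_over_latin1(stored: str) -> str:
    """Same latin1 connection on the way back: the original bytes return."""
    return stored.encode("latin-1").decode("utf-8")


stored = as_seen_by_mysql("Café")
print(stored)                         # CafÃ©  (what phpMyAdmin would show)
print(read_back_over_latin1(stored))  # Café   (what the application sees)
```

Because the round trip over the same latin1 connection returns the original bytes, the application usually displays the text correctly. Only tools that honour the declared charset show the garbled form.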

Before you could set the connection charset, this was a very common 
problem.
I have a small PHP script lying around which can convert a whole 
database with UTF-8 content stored in latin1 (or any other charset) 
tables/columns to UTF-8 and change the database/table/column 
charset/collation to utf8/utf8_general_ci.

> Try to edit such a table in phpMyAdmin, and you
> will see the problem... Also, converting is not easily possible because
> a conversion of the DB from Latin1 to UTF8 would convert the content, so
> the result would be double-encoded content.
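
The double encoding can be sketched the same way. This roughly models a direct latin1 -> utf8 conversion of a text column whose bytes are really UTF-8: each byte is taken as one Latin1 character and re-encoded. The function name is hypothetical:

```python
# Assumption: models a direct latin1 -> utf8 conversion of a text column whose
# bytes are really UTF-8. Function name is hypothetical.

def convert_latin1_column_to_utf8(stored_bytes: bytes) -> bytes:
    # Each byte is read as one latin1 character, then re-encoded as UTF-8.
    return stored_bytes.decode("latin-1").encode("utf-8")


original = "Café".encode("utf-8")                 # b'Caf\xc3\xa9'
converted = convert_latin1_column_to_utf8(original)
print(converted)                                  # b'Caf\xc3\x83\xc2\xa9'
print(converted.decode("utf-8"))                  # CafÃ©  (double-encoded)
```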

Converting is rather simple when you first convert the columns to a 
binary type that is equivalent to the original (CHAR -> BINARY, 
VARCHAR -> VARBINARY, TEXT -> BLOB, etc.) and then convert them back to 
the original type with the utf8 charset defined for that column.
In the first step MySQL simply changes the column type without 
touching the data itself; in the second step MySQL interprets the 
binary data as UTF-8.
That is a lot of work if you want to do it manually, but easy if a 
script takes care of it. And if the column is already utf8, the 
conversion will not change the data at all (it only takes time).
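
A minimal sketch of the two statements such a script could generate per column. This helper is hypothetical, not the script mentioned above, and its type table only covers the common text types. It assumes MySQL with a utf8 / utf8_general_ci target. It does not carry over NOT NULL, DEFAULT or other column attributes, which a real script would have to re-add in both statements:

```python
# Hypothetical helper: builds the two ALTER TABLE statements for one column.
# Assumes MySQL, a utf8 target with utf8_general_ci, and only the text types
# listed below. Column attributes such as NOT NULL or DEFAULT are NOT kept.

BINARY_EQUIVALENT = {
    "CHAR": "BINARY",
    "VARCHAR": "VARBINARY",
    "TINYTEXT": "TINYBLOB",
    "TEXT": "BLOB",
    "MEDIUMTEXT": "MEDIUMBLOB",
    "LONGTEXT": "LONGBLOB",
}


def two_step_conversion(table, column, base_type, length=None):
    base = base_type.upper()
    if base not in BINARY_EQUIVALENT:
        raise ValueError(f"unsupported column type: {base_type!r}")
    if base == "VARCHAR" and length is None:
        raise ValueError("VARCHAR needs a length")
    size = f"({length})" if length is not None else ""
    return [
        # Step 1: same bytes, but MySQL no longer treats them as latin1 text.
        f"ALTER TABLE `{table}` MODIFY `{column}` {BINARY_EQUIVALENT[base]}{size};",
        # Step 2: same bytes again, now interpreted as UTF-8.
        f"ALTER TABLE `{table}` MODIFY `{column}` {base}{size} "
        "CHARACTER SET utf8 COLLATE utf8_general_ci;",
    ]


for statement in two_step_conversion("example_table", "title", "varchar", 255):
    print(statement)
# ALTER TABLE `example_table` MODIFY `title` VARBINARY(255);
# ALTER TABLE `example_table` MODIFY `title` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;
```

The table and column names in the usage lines are placeholders. A real script would read the column list and types from information_schema instead.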

-- 
Kind regards / met vriendelijke groet,

Jigal van Hemert
skype:jigal.van.hemert
msn: jigal at xs4all.nl
http://twitter.com/jigalvh

