[TYPO3-core] RFC #7942: Enable UTF-8 by default

Michael Stucki michael at typo3.org
Thu Nov 11 09:20:31 CET 2010


Hi Martin,

>>> So I would recommend a check of the default charset of the database
>>> and a check of all table columns for any existing tables within the
>>> DB.
>>
>> Not necessary at all.
> 
> Care to elaborate?
> 
> My fear is that someone tries to install TYPO3 on a DB that contains (empty) TYPO3 tables (eg by
> using a schema dump to create it by hand). In this case you could end up with columns that are not utf8.

This is true, and that's why the patch we're discussing here will only
affect new sites. Users who upgrade can run a consistency check in
the update wizard, and if it reports that the setup is safe to
convert, they are invited to do so. If not, they can set
setDBinit to "" and everything will run as before (though probably
still broken).

Conclusion: Yes, we need those database default and table charset checks.
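
To make this a bit more concrete, here is a rough sketch of what such
checks could look like when run by hand against information_schema.
The schema name "typo3_example" is made up, and this is only an
illustration, not what the patch or the update wizard actually does:

```sql
-- 'typo3_example' is a hypothetical schema name; substitute your own.

-- 1. Default character set and collation of the database itself.
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
  FROM information_schema.SCHEMATA
 WHERE SCHEMA_NAME = 'typo3_example';

-- 2. Tables whose default collation is not a utf8 one.
SELECT TABLE_NAME, TABLE_COLLATION
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'typo3_example'
   AND TABLE_COLLATION NOT LIKE 'utf8\_%';

-- 3. Text columns that are not utf8
--    (non-text columns have NULL in CHARACTER_SET_NAME).
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
  FROM information_schema.COLUMNS
 WHERE TABLE_SCHEMA = 'typo3_example'
   AND CHARACTER_SET_NAME IS NOT NULL
   AND CHARACTER_SET_NAME <> 'utf8';
```

If queries 2 and 3 return no rows and query 1 reports utf8, the schema
is consistent. Note that newer MySQL releases may report the name as
"utf8mb3" instead of "utf8", so the literals would need adjusting there.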

>>> Additionally I suggest to set the collation "utf8_general_ci"
>>> explicitly for a new DB. I'm not sure if you can change the default
>>> collation for a charset, but why risk anything.
>>
>> No, default collations are defined in MySQL. Maybe you can change them by recompiling MySQL, but
>> that is not a normal situation.
> 
> Using COLLATE instead of CHARSET in the CREATE statement is just a few bytes more.

Why do you care about the collation? If someone sets the charset to
UTF-8, then the collation defaults to "utf8_general_ci":

mysql> SHOW CHARACTER SET LIKE 'utf8';
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)

So if someone has a different collation than the above, then I assume he
knows what he's doing.
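
A quick way to see this for yourself is to create a database with only
the charset given and then ask MySQL which collation it picked. The
database name "typo3_example" is again just a placeholder:

```sql
-- Hypothetical database name, for illustration only.
-- Only the character set is specified, no COLLATE clause.
CREATE DATABASE typo3_example DEFAULT CHARACTER SET utf8;

-- Shows the charset and the collation MySQL assigned to the new database.
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
  FROM information_schema.SCHEMATA
 WHERE SCHEMA_NAME = 'typo3_example';
```

The collation reported should be whatever SHOW CHARACTER SET lists as
the default collation for that charset on your server.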

Conversely, if we looked at the collation alone, a utf8 setup could be
using any one of many:

mysql> SHOW COLLATION LIKE '%utf8%';
...
21 rows in set (0.00 sec)

So I do think that the collation is less precise for what we need to
know. Setting the character set alone should be fine.
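
For reference, the collations that belong to one charset can also be
listed by filtering on the charset column instead of matching the name.
This is only a sketch of an alternative to the LIKE query; on servers
that also provide utf8mb4, the LIKE '%utf8%' pattern would match those
collations as well:

```sql
-- All collations that belong to the utf8 charset;
-- the "Default" column flags the default one.
SHOW COLLATION WHERE Charset = 'utf8';

-- The same information via information_schema.
SELECT COLLATION_NAME, IS_DEFAULT
  FROM information_schema.COLLATIONS
 WHERE CHARACTER_SET_NAME = 'utf8';
```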

- michael
-- 
Use a newsreader! Check out
http://typo3.org/community/mailing-lists/use-a-news-reader/

