[TYPO3-core] RFC #7942: Enable UTF-8 by default

Ernesto Baschny [cron IT] ernst at cron-it.de
Thu Nov 18 01:23:23 CET 2010


Hi,

*the masses cheer* :) Thanks a lot for the hard work!

Cheers,
Ernesto

Michael Stucki wrote on 18.11.2010 01:22:
> Thanks for the quick reviews! (Well, in case you're wondering: we tested
> it together on Skype beforehand.)
> 
> - Committed to Trunk (r9481)
> 
> Greetings, Michael
> 
> On 18.11.2010 01:02, Michael Stucki wrote:
>> Hey nighthawks!
>>
>> After working through the whole patch back and forth together with
>> Christian and Susanne, I have reconsidered its purpose. Put simply, the
>> old patch went too far: besides the initial goal (setting UTF-8 by
>> default), it also implemented a complete conversion logic.
>>
>> Actually, this is not what the patch should have done! It should have
>> two simple goals:
>>
>> 1. Introduce new default values for forceCharset and setDBinit
>> 2. All sites that update from a version below 4.5 should have the former
>> default values set in localconf.php (so that the system behaves as before).
>>
>> Attached is a new patch which drops most of the changes from the old
>> patches (those changes will still come, as mentioned here:
>> http://forge.typo3.org/projects/typo3v45-projects/wiki/Feature_Freeze).
>>
>> Greetings, Michael
>>
>> On 10.11.2010 11:27, Benjamin Mack wrote:
>>> Hey,
>>>
>>> this is an SVN patch request.
>>>
>>> Type: Feature
>>>
>>> Branch: trunk only
>>>
>>> BT reference: http://bugs.typo3.org/view.php?id=7942
>>>
>>> Problem:
>>> UTF-8 needs to be enabled by default.
>>>
>>> Solution:
>>> What needs to be done to make TYPO3 completely Unicode-capable:
>>>
>>>  - TYPO3 needs to talk UTF-8 throughout the core
>>>  - The connection to the database needs to be UTF-8
>>>
>>> Note: It doesn't matter whether the DB itself is UTF-8, because the
>>> database only needs to know in which format data is sent to and from
>>> TYPO3 (that is, the connection character set). However, we encourage
>>> people to make their DB UTF-8 by default.
>>>
>>> 1) We're just talking about the TYPO3 Backend for now, because that's
>>> where you usually put data into the database. When a backend user
>>> chooses a language for the backend, TYPO3 picks the character set that
>>> t3lib_cs->charSetArray defines for that language. So by default,
>>> English or Danish use "iso-8859-1", while Russian uses "windows-1251".
>>> So far so good. The whole backend is rendered in that character set,
>>> and TYPO3 also uses it to save data to the database. This gets really
>>> messy if you have one backend user who speaks English and another who
>>> speaks Russian, because then the DB contains records in different
>>> character sets! That's where the famous [BE][forceCharset] comes in:
>>> it tells TYPO3 to always use "utf-8" (or another charset) instead of
>>> looking it up in t3lib_cs->charSetArray. In other words, forceCharset
>>> makes TYPO3 speak one charset regardless of the language a BE user has
>>> set.
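A minimal Python sketch of the charset choice described in point 1 (TYPO3 itself is PHP, so this is only an illustration). The language keys and the three-entry table are hypothetical stand-ins for t3lib_cs->charSetArray, and backend_charset() is an illustrative name, not a TYPO3 function. It shows why a record saved by a Russian editor looks garbled to an English editor unless forceCharset is set.

```python
# Hypothetical stand-in for t3lib_cs->charSetArray: language key -> charset.
# Only the three examples from the mail are included.
CHARSET_BY_LANGUAGE = {
    "default": "iso-8859-1",  # English
    "dk": "iso-8859-1",       # Danish
    "ru": "windows-1251",     # Russian
}


def backend_charset(language: str, force_charset: str = "") -> str:
    """Charset the backend uses for a user with the given language.

    A non-empty force_charset overrides the per-language lookup, which is
    the role forceCharset plays in the mail.
    """
    if force_charset:
        return force_charset
    return CHARSET_BY_LANGUAGE.get(language, CHARSET_BY_LANGUAGE["default"])


# Without forceCharset: a Russian editor saves a title in windows-1251,
# and an English editor reads the same bytes as iso-8859-1.
mixed_saved = "Привет".encode(backend_charset("ru"))
mixed_view = mixed_saved.decode(backend_charset("default"))  # "Ïðèâåò"

# With forceCharset = utf-8 both editors use the same charset.
forced_saved = "Привет".encode(backend_charset("ru", "utf-8"))
forced_view = forced_saved.decode(backend_charset("default", "utf-8"))  # "Привет"
```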
>>>
>>> 2) The connection charset is determined by the database. In MySQL it
>>> can be set in the server configuration (character_set_connection), but
>>> it can also be overridden by sending "SET NAMES utf8" each time a
>>> connection is established.
>>>
>>> Imagine some evil setups:
>>>
>>> - No forceCharset is set, so multiple users with different languages
>>> (which map to different charsets in t3lib_cs->charSetArray) read and
>>> write records, even the same records. This is chaos.
>>>
>>> - forceCharset is set, so TYPO3 always reads and writes data in UTF-8,
>>> which is cool. However, if the connection charset is not set, or the DB
>>> server uses "latin1" for connections by default, the DB thinks the
>>> UTF-8 data that TYPO3 sends is "latin1". It then either converts it to
>>> UTF-8 a second time (if the DB is UTF-8), or stores the bytes as they
>>> are. This actually works without problems, AS LONG AS you don't switch
>>> the DB connection to UTF-8: doing so results in mixed encodings within
>>> the DB as soon as you read and write data again. In that case you need
>>> to upgrade your DB manually; some information can be found in BT issue
>>> #8227 (http://bugs.typo3.org/view.php?id=8227).
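The second evil setup can be reproduced with plain Python string codecs. This sketch assumes UTF-8 tables behind a latin1 connection and models the server's conversion as ISO-8859-1 decoding (MySQL's "latin1" is really cp1252, but both agree on the bytes used here). The helper names are hypothetical and only model what the server does.

```python
def store_via_latin1_connection(sent: bytes) -> str:
    """Server side: bytes arriving on a latin1 connection are decoded as
    latin1 and kept as characters in a UTF-8 table (simplified model)."""
    return sent.decode("latin-1")


def read_via_connection(stored: str, connection_charset: str) -> bytes:
    """Server side: stored characters are encoded in the connection charset
    before they are sent back to the client."""
    return stored.encode(connection_charset)


original = "Grün"
# TYPO3 (forceCharset = utf-8) sends UTF-8 bytes over a latin1 connection,
# so the UTF-8 table ends up holding "GrÃ¼n".
stored = store_via_latin1_connection(original.encode("utf-8"))

# Same latin1 connection on the way back: the double conversion cancels out
# and TYPO3 sees "Grün" again.
same_connection = read_via_connection(stored, "latin-1").decode("utf-8")

# Connection switched to UTF-8 later: old records now come back as "GrÃ¼n".
switched_connection = read_via_connection(stored, "utf-8").decode("utf-8")
```

This is why the setup "works" until the connection is changed: the stored data is double-encoded, and only a manual conversion of the existing records fixes it.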
>>>
>>> In these cases the TYPO3 installation is badly messed up, and fixing it
>>> requires a lot of work.
>>>
>>>
>>> Advantages of having UTF-8 by default:
>>>
>>>  * If your FE speaks UTF-8 by default as well, no charset conversion is
>>> needed anymore, which will speed up the whole rendering process.
>>>  * Having everything in UTF-8 allows a smoother transition to v5 (we
>>> don't know yet what that will look like, but we know UTF-8 is better
>>> than any mixed setup :))
>>>
>>> So, the attached patch does the following:
>>>
>>> Deprecation of any character set other than UTF-8. For two more versions
>>> an installation can still run with another setup, but in 4.7 the option
>>> "forceCharset" will be removed, because it should always be utf-8
>>> anyway. Additionally, "multiplyDBfieldSize" is deprecated, which should
>>> have happened a long time ago.
>>>
>>> A) config_default.php
>>> First, the two important parameters "forceCharset" and "setDBinit" are
>>> set to "-1", because we need to find out whether the parameter was
>>> changed in localconf.php or whether the installation still uses the
>>> original default. So, if the options are still "-1" after localconf.php
>>> has been included, the installation uses the default setup and has not
>>> modified anything. It is then checked whether the site has already been
>>> upgraded to 4.5 (compat_version). If the site has been upgraded to 4.5
>>> through the upgrade wizard, the user is on his own.
>>>
>>> The whole code in config_default.php could be dropped again in 4.8, once
>>> all installations have been migrated (not decided yet).
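A minimal Python sketch of the "-1" sentinel logic from section A, combined with Michael's second goal (older sites keep the former defaults). Assumptions, not taken from the patch: the new setDBinit default is "SET NAMES utf8;", the pre-4.5 defaults for both options are empty strings, and resolve_charset_settings() is a hypothetical name. The real code is PHP in config_default.php.

```python
SENTINEL = "-1"  # config_default.php value meaning "not changed in localconf.php"

# Assumed values: the new UTF-8 defaults, and empty strings as the former
# defaults that keep sites upgraded from < 4.5 behaving as before.
NEW_DEFAULTS = {"forceCharset": "utf-8", "setDBinit": "SET NAMES utf8;"}
OLD_DEFAULTS = {"forceCharset": "", "setDBinit": ""}


def parse_version(version: str) -> tuple:
    """Turn a compat_version string like "4.5" into (4, 5) for comparison."""
    return tuple(int(part) for part in version.split("."))


def resolve_charset_settings(localconf: dict, compat_version: str) -> dict:
    """Resolve forceCharset/setDBinit after localconf.php has been included."""
    settings = dict.fromkeys(NEW_DEFAULTS, SENTINEL)  # config_default.php
    settings.update(localconf)                        # localconf.php overrides
    upgraded = parse_version(compat_version) >= (4, 5)
    for key in NEW_DEFAULTS:
        if settings[key] == SENTINEL:  # untouched by localconf.php
            settings[key] = NEW_DEFAULTS[key] if upgraded else OLD_DEFAULTS[key]
    return settings
```

For example, a site with an empty localconf.php and compat_version "4.4" resolves to the old empty defaults, while explicitly configured values are never touched.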
>>>
>>> B) Helper function in t3lib_db.php to determine whether the current
>>> connection is UTF-8. This is useful because the connection charset can
>>> come from the server configuration or be overridden via setDBinit.
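A rough Python sketch of what such a helper has to decide, without a real database. It assumes setDBinit holds SQL statements separated by ";" or newlines, and that a SET NAMES statement overrides the server's default character_set_connection. Both function names are hypothetical; the actual t3lib_db helper is not shown in this thread. Against a live MySQL server one could instead query `SHOW VARIABLES LIKE 'character_set_connection'` after setDBinit has run.

```python
import re

# Matches statements like: SET NAMES utf8 / SET NAMES 'utf8'
SET_NAMES = re.compile(r"^\s*SET\s+NAMES\s+'?(\w+)'?", re.IGNORECASE)


def effective_connection_charset(set_db_init: str, server_default: str) -> str:
    """Connection charset after setDBinit has run on a fresh connection.

    A SET NAMES statement in setDBinit overrides the server's default
    character_set_connection; if several are present, the last one wins.
    """
    charset = server_default
    for statement in re.split(r"[;\n]", set_db_init):
        match = SET_NAMES.match(statement)
        if match:
            charset = match.group(1)
    return charset.lower()


def is_connection_utf8(set_db_init: str, server_default: str) -> bool:
    """True if the effective connection charset is one of MySQL's UTF-8 sets."""
    charset = effective_connection_charset(set_db_init, server_default)
    return charset in ("utf8", "utf8mb4")
```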
>>>
>>> C) When installing TYPO3 through the 1-2-3 installer, create the new
>>> database with UTF-8 by default.
>>>
>>> D) Small change in the update wizard code to allow displaying
>>> information without always showing the "Next" button. This is helpful
>>> for letting people know what their setup is.
>>>
>>> E) Upgrade wizard that shows information about the current setup and a
>>> link to a tutorial explaining complex scenarios and how people can
>>> upgrade their Backend + DB to UTF-8. We discourage people from doing
>>> this in an automated way.
>>>
>>> TYPO3 considers the site completely upgraded if:
>>>  - forceCharset has been unset in your localconf.php
>>>  - AND compat_version is set to 4.5
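The two conditions above, written as a small Python check. This is a sketch under two assumptions: "unset" means the forceCharset key is absent from localconf.php, and "set to 4.5" is read as "4.5 or newer". The function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a compat_version string like "4.5" into (4, 5) for comparison."""
    return tuple(int(part) for part in version.split("."))


def is_fully_upgraded(localconf: dict, compat_version: str) -> bool:
    """Both conditions from the list above must hold."""
    # Assumption: "unset" means the key no longer appears in localconf.php.
    force_charset_unset = "forceCharset" not in localconf
    # Assumption: "set to 4.5" is read as "4.5 or newer".
    compat_ok = parse_version(compat_version) >= (4, 5)
    return force_charset_unset and compat_ok
```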
>>>
>>> Thanks to Michael Stucki for getting this on the way and explaining
>>> everything. Thanks to Tolleiv Nietsch for testing the patch.
>>>
>>>
>>> All the best,
>>> Benni.
>>
>>
> 
> 


