[TYPO3-core] RFC #7942: Enable UTF-8 by default

Michael Stucki michael at typo3.org
Thu Nov 18 01:22:43 CET 2010


Thanks for the quick reviews! (well, in case you wonder: we tested it
together on Skype before)

- Committed to Trunk (r9481)

Greetings, Michael

On 18.11.2010 01:02, Michael Stucki wrote:
> Hey nighthawks!
> 
> after working through the whole patch back and forth together with
> Christian and Susanne, I'm reconsidering its whole purpose. The old
> patch went too far: it did more than the initial goal (setting UTF-8
> by default) by also implementing a whole conversion logic.
> 
> Actually, this is not what the patch should have done! It should have
> two simple goals:
> 
> 1. Introduce new default values for forceCharset and setDBinit
> 2. All sites that update from <4.5 should set the former default values
> in localconf.php (so the system behaves like before).
> 
> Attached is a new patch which drops most of the stuff from the old
> patches (that stuff will still come later, as mentioned here:
> http://forge.typo3.org/projects/typo3v45-projects/wiki/Feature_Freeze).
> 
> Greetings, Michael
> 
> On 10.11.2010 11:27, Benjamin Mack wrote:
>> Hey,
>>
>> this is an SVN patch request.
>>
>> Type: Feature
>>
>> Branch: trunk only
>>
>> BT reference: http://bugs.typo3.org/view.php?id=7942
>>
>> Problem:
>> UTF-8 needs to be enabled by default.
>>
>> Solution:
>> What needs to be done in order to make TYPO3 fully Unicode:
>>
>>  - TYPO3 needs to talk UTF-8 all through the core
>>  - The connection to the database needs to be UTF-8
>>
>> Note: It doesn't matter if the DB is UTF-8 or not, because the database
>> only needs to know in which format the data is going to be sent to and
>> from TYPO3 (that is: the connection info). However, we encourage people
>> to make their DB UTF-8 by default.
>>
>> 1) We're just talking about the TYPO3 Backend for now, because that's
>> where you usually put data into the database. When a backend user
>> chooses a language for the backend, TYPO3 picks the character set that
>> t3lib_cs->charSetArray defines for that language. So by default English
>> or Danish uses "iso-8859-1", while Russian uses "windows-1251". So far
>> so good. The whole backend is rendered in that character set, and TYPO3
>> also uses it when saving to the database. This becomes a real mess if
>> one backend user has "English" set and another has Russian, because
>> then there are records with different character sets in the DB!!!
>> Anyway, the famous [BE][forceCharset] tells TYPO3 to always use "utf-8"
>> (or something else) and not to use t3lib_cs->charSetArray for that.
>> This means: forceCharset allows TYPO3 to speak one charset regardless
>> of what language a BE user has set.
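
The mixed-charset problem in point 1 can be reproduced outside TYPO3. The following is a minimal sketch using only Python's built-in codecs; it does not call any TYPO3 code, and the sample strings are made up for illustration.

```python
# Illustration only: plain Python codecs, no TYPO3 code involved.
# Two backend users save text using the charset of their backend language.
english_bytes = "café".encode("iso-8859-1")
russian_bytes = "привет".encode("windows-1251")

# Reading the Russian row back with the other user's charset does not fail;
# it silently yields different characters.
misread = russian_bytes.decode("iso-8859-1")
print(misread)

# With one forced charset, every row round-trips regardless of language.
forced = [s.encode("utf-8").decode("utf-8") for s in ("café", "привет")]
print(forced)
```

Nothing in the stored bytes says which charset was used, which is why a single forced charset avoids the problem.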
>>
>> 2) Whether the connection is UTF-8 is determined by the database. In
>> MySQL this can be set in the server configuration
>> (character_set_connection), but it can also be overridden by sending
>> "SET NAMES utf8" each time a connection is established.
>>
>> Imagine some evil setups:
>>
>> - No forceCharset is set, so multiple users with different languages
>> (that have different charsets in t3lib_cs->charSetArray) read and write
>> records, even the same records. This is chaos.
>>
>> - forceCharset is set, so TYPO3 always reads and writes data in utf-8,
>> which is cool. However, if the DB connection charset is not set, or the
>> DB server is configured so the connection is "latin1" by default, the DB
>> thinks the UTF-8 data that TYPO3 sends is "latin1", and then re-converts
>> it to UTF-8 (if the DB is utf-8), or just stores the data as it is in
>> the DB. This actually works and is no problem, AS LONG AS you don't
>> change the DB connection to UTF-8, which would result in a mixed setup
>> within the DB once you read and write again. Here you need a manual
>> upgrade of your DB; some information can be found in BT issue #8227
>> (http://bugs.typo3.org/view.php?id=8227)
>>
>> These are cases where the TYPO3 installation is messed up big time and
>> that require a lot of work to fix.
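
The second setup (UTF-8 data sent over a "latin1" connection into a utf-8 database) can be simulated as a rough sketch. Python's "latin-1" codec stands in for the MySQL connection charset here, which is an assumption; MySQL's own latin1 differs slightly for a few byte values.

```python
# Illustration only: "latin-1" stands in for a MySQL latin1 connection
# (an assumption; MySQL's latin1 is closer to cp1252).
text = "Zürich"
sent = text.encode("utf-8")  # bytes sent by TYPO3 with forceCharset = utf-8

# utf-8 table, latin1 connection: the server treats each byte as one latin1
# character and converts it to UTF-8 for storage (double encoding).
stored = sent.decode("latin-1").encode("utf-8")

# Reading over the same latin1 connection reverses the conversion,
# so the site looks fine.
read_latin1_conn = stored.decode("utf-8").encode("latin-1").decode("utf-8")

# After switching the connection to utf8, the old rows come back garbled.
read_utf8_conn = stored.decode("utf-8")

print(read_latin1_conn, read_utf8_conn)
```

This is why the setup keeps working until the connection charset is changed, and why rows written before and after the change then disagree.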
>>
>>
>> Advantages of having UTF-8 by default:
>>
>>  * If your FE speaks UTF-8 by default as well, no charset conversion is
>> needed anymore, which will speed up the whole rendering process.
>>  * Having everything in UTF-8 allows a better transition to v5 (don't
>> know what this will look like, but we know UTF-8 is better than any
>> mixed setup :))
>>
>> So. The attached patch does this:
>>
>> Deprecation of any character set other than UTF-8. For two versions the
>> installation can run with another setup, but in 4.7, the option
>> "forceCharset" will go, because it should always be utf-8 anyway.
>> Additionally, "multiplyDBfieldSize" should have been deprecated a long
>> time ago.
>>
>> A) config_default.php
>> First, the two important parameters "forceCharset" and "setDBinit" are
>> set to "-1", because we need to find out whether the parameter was
>> changed in localconf.php or whether the installation still uses the
>> original default setting. So, if the options are still "-1" after the
>> inclusion of localconf.php, the installation uses the default setup and
>> has not modified anything. It is then checked whether the site has
>> already been upgraded to 4.5 (compat_version). If the site has been
>> upgraded to 4.5 through the upgrade wizard, the user is on his own.
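
The "-1" sentinel idea from A) can be sketched as follows. This is not the code from the patch: the function name, the dictionaries, the assumed old defaults (empty strings) and new defaults, and the rule for which defaults apply below or at compat_version 4.5 are all my reading of the description, written in Python rather than PHP.

```python
UNCHANGED = -1  # sentinel shipped as the default value, as described in A)

# Hypothetical values: the old defaults are assumed to be empty strings,
# the new ones UTF-8.
OLD_DEFAULTS = {"forceCharset": "", "setDBinit": ""}
NEW_DEFAULTS = {"forceCharset": "utf-8", "setDBinit": "SET NAMES utf8;"}


def resolve_charset_settings(localconf, compat_version):
    """Return the effective settings after localconf has been applied.

    localconf: settings the site overrides (stand-in for localconf.php).
    compat_version: (major, minor) tuple, e.g. (4, 5).
    """
    settings = {"forceCharset": UNCHANGED, "setDBinit": UNCHANGED}
    settings.update(localconf)  # stand-in for including localconf.php

    # A value that is still the sentinel was never touched by the site.
    # Assumption: upgraded sites then get the new defaults, older ones the old.
    fallback = NEW_DEFAULTS if compat_version >= (4, 5) else OLD_DEFAULTS
    for key, value in settings.items():
        if value == UNCHANGED:
            settings[key] = fallback[key]
    return settings


print(resolve_charset_settings({}, (4, 4)))
print(resolve_charset_settings({"forceCharset": "iso-8859-1"}, (4, 5)))
```

The point of the sentinel is that an explicit empty value in localconf can be told apart from a value that was never set.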
>>
>> The whole code in config_default.php could be dropped again in 4.8 when
>> migration is done for all installations (not sure yet).
>>
>> B) Helper function in t3lib_db.php to determine if the current
>> connection is UTF-8. This is useful because this can happen through the
>> server configuration or be overridden via setDBinit.
>>
>> C) When installing TYPO3 through the 1-2-3 installer, create the new
>> database with UTF-8 by default.
>>
>> D) Small change in the update wizard code to allow displaying some
>> information without having to show the "next" button all the time.
>> Helpful to let people know what their setup is.
>>
>> E) Upgrade wizard that shows information about the current setup and
>> a link to a tutorial that explains complex scenarios and how people
>> could upgrade their Backend + DB to UTF-8. We discourage doing this in
>> an automated way.
>>
>> TYPO3 thinks the site has been completely upgraded if:
>>  - forceCharset has been unset in your localconf.php
>>  - AND compat_version is set to 4.5
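
The two conditions above amount to a small predicate. A minimal sketch, with a hypothetical function name and a plain dictionary standing in for the settings found in localconf.php:

```python
def is_fully_upgraded(localconf, compat_version):
    """True if forceCharset is not set in localconf AND compat_version is 4.5.

    Hypothetical helper; localconf is a dict of the settings a site overrides,
    compat_version is the version string.
    """
    return "forceCharset" not in localconf and compat_version == "4.5"


print(is_fully_upgraded({}, "4.5"))
print(is_fully_upgraded({"forceCharset": "iso-8859-1"}, "4.5"))
```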
>>
>> Thanks to Michael Stucki for getting this under way and explaining
>> everything. Thanks to Tolleiv Nietsch for testing the patch.
>>
>>
>> All the best,
>> Benni.
> 
> 


-- 
Use a newsreader! Check out
http://typo3.org/community/mailing-lists/use-a-news-reader/

