[Typo3-dev] utf-8 standard in TYPO3 4.0?

Dmitry Dulepov typo3 at fm-world.ru
Wed Aug 3 08:36:03 CEST 2005


Hi!

Well, more information on this topic...

=== Case #1 ===
I mentioned the MySQL locale on the condition that there is code that
searches for a utf-8 phrase in a non-utf-8 database. This matters only
when specific conditions are met.

First, the query must look like '%abc' (i.e. the wildcard is at the
beginning of the phrase). Secondly, the remaining characters in the query
must use symbols in the 0x80 - 0xFD range. Thirdly, your search phrase must
be constructed in a specific way. Hard to explain but easy to show:
- imagine that the search phrase looks like '%\x81aaa'
- the phrase in the database is 'xyz\xC0\x81aaabbb'
- on a non-utf-8-enabled database this phrase will be found (false match!)
- on a utf-8-enabled database it should not be found, because MySQL
should take the prefix byte (\xC0) into account.

I have not actually had a chance to verify whether MySQL handles this
situation correctly.
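
The false match can at least be reproduced outside MySQL with a plain
byte-wise search. This is only a sketch in Python, not a test against MySQL
itself. It uses \xC3\x81 (the UTF-8 encoding of 'Á') in place of \xC0\x81,
because \xC0 is not a valid UTF-8 lead byte, and the helper name
starts_inside_character is made up for illustration:

```python
# Byte-wise search, roughly the way a non-utf-8 database sees the data.
haystack = "xyz\u00c1aaabbb".encode("utf-8")  # b'xyz\xc3\x81aaabbb'
needle = b"\x81aaa"                           # the phrase from '%\x81aaa'


def starts_inside_character(data: bytes, offset: int) -> bool:
    """Return True if offset points at a UTF-8 continuation byte (10xxxxxx)."""
    return (data[offset] & 0xC0) == 0x80


offset = haystack.find(needle)
print("match at byte offset:", offset)
print("starts mid-character:", starts_inside_character(haystack, offset))
```

A match that begins on a continuation byte starts in the middle of a
character, so a character-aware comparison should reject it.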

Fortunately, there is very little chance that this problem will
appear: too many conditions have to be met, and characters like 0x81 are
unlikely to appear in the search string.

=== Case #2 ===
Another (larger) problem pops up if one uses MySQL string functions
(which is bad anyway, because TYPO3 aims for database abstraction). On
non-utf-8-enabled databases they will not work properly if at least one
multi-byte utf-8 character exists in the search string and/or the database
field.
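
A small Python sketch of why byte-oriented string functions go wrong. It
does not talk to MySQL. It only assumes that a non-utf-8 database treats
every stored byte as one character, and the helper left() is a made-up
stand-in for a byte-wise LEFT()/SUBSTRING():

```python
text = "Привет"                 # 6 characters
stored = text.encode("utf-8")   # the bytes that end up in the database field


def left(data: bytes, n: int) -> bytes:
    """Byte-wise LEFT(): first n bytes, ignoring character boundaries."""
    return data[:n]


print("characters:", len(text))
print("bytes:", len(stored))

# Cutting after 3 bytes splits the second character in half.
cut = left(stored, 3)
print("cut bytes:", cut)
print("decoded:", cut.decode("utf-8", errors="replace"))
```

The length is reported in bytes rather than characters, and the cut string
is no longer valid utf-8.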

=== Case #3 ===
Case-insensitive searches on VARCHARs. Case-insensitive searches
(the default in MySQL unless you mark the VARCHAR field as BINARY) will not
work correctly if you search for a UTF-8 string in a non-utf-8 database.
MySQL will not be able to convert UTF-8 characters between upper and lower
case properly. This can make the indexed_search extension useless.
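
To show the effect without a database, here is a rough Python simulation.
It assumes the non-utf-8 database interprets each stored byte as one
Latin-1 character and folds case on that basis. The function names are
invented for this example, and real MySQL collations differ in detail:

```python
def latin1_ci_equal(a: str, b: str) -> bool:
    """Compare the way a Latin-1 database might: fold case byte by byte."""
    a_seen = a.encode("utf-8").decode("latin-1")
    b_seen = b.encode("utf-8").decode("latin-1")
    return a_seen.lower() == b_seen.lower()


def utf8_ci_equal(a: str, b: str) -> bool:
    """Compare with real Unicode case folding."""
    return a.casefold() == b.casefold()


for stored, searched in [("Hello", "hello"), ("Привет", "привет")]:
    print(stored, searched,
          "latin1:", latin1_ci_equal(stored, searched),
          "utf-8:", utf8_ci_equal(stored, searched))
```

Plain ASCII words still match either way. The Cyrillic pair matches only
when the comparison knows that the bytes are utf-8.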

This is all that I can think of so far. One should consider how important
these things are for TYPO3. I believe that only case #3 is really
significant. Case #1 is rather theoretical...

Dmitry.

Sean Ellis wrote:
> Robert Lemke wrote:
> 
> Hi all,
> 
> I know that the topic of the thread refers to TYPO3 4.0, but hopefully I
> won't be considered 'out of line' with a question regarding  TYPO3 3.8.
> And reading through the other posts in the thread is not clarifying
> things much.
> 
>>    -  What steps have to be taken to enable utf-8 by default in TYPO3
>>4.0?
> 
> My question is: What steps have to be taken to enable utf-8 in TYPO3 3.8?
> 
> Dmitry mentions altering database tables to utf8, but I'm getting the
> impression that others are using the default mysql Latin1 settings, and
> merely changing TYPO3 settings such as forceCharset. Is there a
> recommendation about this? A right, or a wrong way?
> 
> I'm specifically thinking of this in the context of Asian language
> character sets, but I doubt that I'm alone with these questions.
> 
> cheers,
> 
> Sean



