[TYPO3-dev] Suggestions for robust UTF-8 support

Martin Kutschker Martin.Kutschker at n0spam-blackbox.net
Thu Mar 20 14:46:28 CET 2008


David Förster wrote:

> 
> The main challenge of supporting UTF-8 is that you can no longer rely on the 
> length of a string being equal to the number of bytes it occupies. By default 
> PHP has no UTF-8 support at all, and its string functions (like strlen) will 
> return incorrect results for UTF-8 strings.

"We" know that, and that's why we added t3lib_cs. Use its methods to 
work on any string unless you know what you're doing.
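
To make the difference concrete, here is a minimal sketch in plain PHP. It 
assumes the mbstring extension is loaded and that mbstring.func_overload is 
off. It only illustrates byte count versus character count and is not the 
t3lib_cs code itself.

```php
<?php
// Illustration only: plain PHP with the mbstring extension, not t3lib_cs.
// "Förster" written with explicit UTF-8 bytes; the "ö" occupies two bytes.
$s = "F\xC3\xB6rster";

// strlen() counts bytes (as long as it is not overloaded by mbstring).
$bytes = strlen($s);

// mb_strlen() with an explicit charset counts characters.
$chars = mb_strlen($s, 'UTF-8');

echo "$bytes bytes, $chars characters\n"; // prints "8 bytes, 7 characters"
```

Any code that cuts or pads a string by position has to decide which of the 
two numbers it actually means.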

> However, there are some problems with that very setting and TYPO3 at the moment 
> (the two bugs mentioned above). One of them is the export/import feature 
> relying on string-length == byte count. Being aware of UTF-8 you should fix 
> this by introducing a byte_count function and use it in the rare cases where 
> the byte count of a string is needed instead of its length. (An example of 
> this function is attached to one of the reports.)
> 
> For robust UTF-8 support in TYPO3 I suggest:
> 
> - developers, get familiar with UTF-8

This is always a fine idea.

> - fixing TYPO3 to work with mbstring.func_overload enabled (distinguish 
> between string-length and byte count)

No, see above for why.

> I'm willing to help with that and send patches, but there has been very 
> little response to the bug reports so far.

Many users (incl. me) run TYPO3 successfully with UTF-8.

As for #7869: Don't use the function overload feature of mbstring. TYPO3 
doesn't work with it as it does its own character handling. The installer 
should probably check this setting and issue a warning.
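
Such a check could look roughly like the sketch below. The function name is 
made up for illustration and this is not actual installer code. It relies on 
mbstring.func_overload being a bitmask in which the value 2 covers the string 
functions (strlen, substr and friends).

```php
<?php
// Hypothetical helper, not part of TYPO3: reports whether the string
// functions are overloaded by mbstring.
// mbstring.func_overload is a bitmask: 1 = mail(), 2 = string functions,
// 4 = regular expression functions.
function stringFunctionsOverloaded($iniValue) {
    // ini_get() returns false for an unknown setting; (int) false is 0.
    return ((int) $iniValue & 2) !== 0;
}

if (stringFunctionsOverloaded(ini_get('mbstring.func_overload'))) {
    echo "Warning: mbstring.func_overload overloads the string functions; "
        . "TYPO3 expects this to be switched off.\n";
}
```

Testing only bit 2 means a setup that overloads just mail() or the regex 
functions would not trigger the warning. Whether those cases matter for TYPO3 
is a separate question.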

As for #7882: I don't quite get what went wrong, but it seems that it's 
again the overloading. The only fixed byte length is used for the beginning 
of the data structure. This one is - I assume - ASCII and so it is ok to do so.

Masi



