[TYPO3-dev] Suggestions for robust UTF-8 support
Martin Kutschker
Martin.Kutschker at n0spam-blackbox.net
Thu Mar 20 14:46:28 CET 2008
David Förster wrote:
>
> The main challenge of supporting UTF-8 is that you can no longer rely on the
> length of a string being equal to the number of bytes it occupies. By default
> PHP has no UTF-8 support at all, and its string functions (like strlen) count
> bytes, so they return incorrect lengths for UTF-8 strings.
"We" know that, and that's why we have added t3lib_cs. Use its methods to
work on any string unless you know what you're doing.
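To make the difference concrete, here is a minimal sketch in Python (not PHP, and not t3lib_cs code) of character count versus UTF-8 byte count. PHP's plain strlen reports the second number; the variable names are only for illustration.

```python
# Illustration only: Python, not PHP. The names below are hypothetical.
text = "F\u00f6rster"  # "Förster": 7 characters, one of them non-ASCII

# Number of characters (code points) in the string.
char_count = len(text)

# Number of bytes the same string occupies once encoded as UTF-8.
# The "ö" takes two bytes, every other letter takes one.
byte_count = len(text.encode("utf-8"))

print(char_count, byte_count)  # prints "7 8"
```

Any code that mixes up the two numbers works fine for ASCII-only input and fails only once a multi-byte character shows up.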
> However, there are some problems with that very setting and Typo3 at the moment
> (the two bugs mentioned above). One of them is the export/import feature
> relying on string length == byte count. Being aware of UTF-8, you should fix
> this by introducing a byte_count function and using it in the rare cases where
> the byte count of a string is needed instead of its length. (An example of
> this function is attached to one of the reports.)
>
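The kind of failure described in the quoted paragraph can be sketched as follows. This is Python rather than PHP, the "<length>:<payload>" record layout is invented for the example and is not TYPO3's actual export format, and byte_count here is a stand-in, not the function attached to the bug report.

```python
# Hypothetical sketch: a length-prefixed record where the prefix must be a
# byte count. The record layout "<length>:<payload>" is an assumption.

def byte_count(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def serialize(text: str, length_fn) -> bytes:
    """Write the length reported by `length_fn`, a colon, then the payload."""
    return f"{length_fn(text)}:".encode("ascii") + text.encode("utf-8")

def deserialize(blob: bytes) -> bytes:
    """Read the length prefix and return exactly that many payload bytes."""
    prefix, _, rest = blob.partition(b":")
    return rest[: int(prefix)]

value = "Gr\u00fc\u00dfe"  # "Grüße": 5 characters, 7 bytes in UTF-8

# Character length used as the prefix: the reader stops after 5 bytes
# and the payload comes back truncated.
broken = deserialize(serialize(value, len))

# Byte count used as the prefix: the payload round-trips intact.
fixed = deserialize(serialize(value, byte_count))
```

With a character count in the prefix the reader cuts the payload off in the middle of a multi-byte character; with a byte count it gets the whole string back.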
> For robust UTF-8 support in Typo3 I suggest:
>
> - developers, get familiar with UTF-8
This is always a fine idea.
> - fixing Typo3 to work with mbstring.func_overload enabled (distinguish
> between string-length and byte count)
No, see above why.
> I'm willing to help with that and send patches, but the response to the bug
> reports has been very little so far.
Many users (including me) run TYPO3 successfully with UTF-8.
As for #7869: Don't use the function overload feature of mbstring. TYPO3
doesn't work with it as it does its own character handling. The installer
should probably check this setting and issue a warning.
As for #7882: I don't quite get what went wrong, but it seems that it's
again the overloading. The only fixed byte length is used for the beginning
of the data structure. This one is - I assume - ASCII and so it is ok to do so.
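A small sketch of why a fixed-length ASCII header is safe, again in Python and with an invented layout (the header width and function name are assumptions, not the real TYPO3 structure): for pure ASCII text the character count and the byte count are always the same, so it does not matter which of the two a length function reports.

```python
# Hypothetical sketch: a fixed-width, ASCII-only header in front of a payload.
HEADER_WIDTH = 10  # assumed width, chosen for the example

def make_header(payload: bytes) -> str:
    """Return the payload size as a zero-padded decimal string."""
    return f"{len(payload):0{HEADER_WIDTH}d}"

payload = "Gr\u00fc\u00dfe".encode("utf-8")  # 7 bytes of UTF-8
header = make_header(payload)

# The header contains only digits, so every character is exactly one byte.
header_chars = len(header)
header_bytes = len(header.encode("utf-8"))

print(header, header_chars, header_bytes)  # prints "0000000007 10 10"
```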
Masi