[TYPO3-dev] Suggestions for robust UTF-8 support

David Förster david.foerster at andrena.de
Thu Mar 20 16:16:04 CET 2008


Hi Martin,

Thanks for your response. I didn't realize how much effort you've already put 
into UTF-8 support. (Personally, I find it strange to reimplement the mbstring 
extension in PHP, but if it's working for TYPO3...)

I added a note to the UTF8-Support wiki article documenting the func_overload 
setting.
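
For reference, a minimal sketch of how that setting could be checked at
runtime. The helper name string_functions_overloaded is hypothetical and not
part of TYPO3; the sketch assumes only PHP's ini_get() and the documented bit
values of mbstring.func_overload.

```php
<?php
// Hypothetical helper, not part of TYPO3: reports whether mbstring's
// function overloading replaces the plain string functions (strlen etc.).
function string_functions_overloaded($rawSetting)
{
    // Value 2 in mbstring.func_overload covers the string functions
    // (1 = mail(), 4 = regular expression functions).
    return (((int) $rawSetting) & 2) !== 0;
}

// ini_get() returns false when the setting does not exist (mbstring not
// loaded, or a PHP version without this feature); (int) false is 0.
if (string_functions_overloaded(ini_get('mbstring.func_overload'))) {
    echo "Warning: mbstring.func_overload is enabled for string functions.\n";
}
```

An installer-style check could call this once at startup and show the
warning instead of echoing it.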

David

On Thursday, 20 March 2008, Martin Kutschker wrote:
> David Förster wrote:
> > The main challenge of supporting UTF-8 is that you can no longer rely on
> > the length of a string being equal to the number of bytes it occupies. By
> > default PHP has no UTF-8 support at all, and its string functions (like
> > strlen) will return incorrect results for UTF-8 strings.
>
> "We" know that and that's why we have added t3lib_cs. Use its methods to
> work on any string unless you know what you're doing.
>
> > However, there are currently some problems with that very setting and
> > TYPO3 (the two bugs mentioned above). One of them is the export/import
> > feature relying on string length == byte count. Being aware of UTF-8, you
> > should fix this by introducing a byte_count function and using it in the
> > rare cases where the byte count of a string is needed instead of its
> > length. (An example of this function is attached to one of the
> > reports.)
> >
> > For robust UTF-8 support in TYPO3 I suggest:
> >
> > - developers, get familiar with UTF-8
>
> This is always a fine idea.
>
> > - fixing TYPO3 to work with mbstring.func_overload enabled (distinguish
> > between string length and byte count)
>
> No, see above why.
>
> > I'm willing to help with that and send patches, but the response to the
> > bug reports has been very little so far.
>
> Many users (incl. me) run TYPO3 successfully with UTF-8.
>
> As for #7869: Don't use the function overload feature of mbstring. TYPO3
> doesn't work with it as it does its own character handling. The installer
> should probably check this setting and issue a warning.
>
> As for #7882: I don't quite get what went wrong, but it seems that it's
> again the overloading. The only fixed byte length is used for the beginning
> of the data structure. This one is - I assume - ASCII and so it is ok to do
> so.
>
> Masi
> _______________________________________________
> TYPO3-dev mailing list
> TYPO3-dev at lists.netfielders.de
> http://lists.netfielders.de/cgi-bin/mailman/listinfo/typo3-dev
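
The difference between string length and byte count discussed in this thread
can be sketched as follows. This is not the function attached to the bug
report; byte_count here is a hypothetical helper, written on the assumption
that mb_strlen() with the '8bit' encoding counts one character per byte
regardless of mbstring.func_overload.

```php
<?php
// Hypothetical helper: number of bytes a string occupies. The '8bit'
// encoding treats every byte as one character, so the result does not
// depend on whether strlen() has been overloaded by mbstring.
function byte_count($string)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($string, '8bit');
    }
    // Without mbstring there is no overloading, so strlen() counts bytes.
    return strlen($string);
}

// "Förster" written with explicit UTF-8 bytes for "ö" (0xC3 0xB6), so the
// example does not depend on the encoding of this source file.
$name = "F\xC3\xB6rster";

echo byte_count($name), "\n";               // 8 bytes
if (function_exists('mb_strlen')) {
    echo mb_strlen($name, 'UTF-8'), "\n";   // 7 characters
}
```

Code that writes length-prefixed data, such as an export format, would use
the byte count, while code that truncates or pads text for display would use
the character count.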



