[TYPO3-dev] RFC: Unicode with preg_replace

Martin Kutschker masi-no at spam-typo3.org
Tue Mar 23 16:38:02 CET 2010


David Bruchmann wrote:
>
> As far as I remember, there are already some classes or functions for
> charset handling; perhaps they could just be extended a bit, e.g. with
> preg functions that take care of charsets.

These functions exist, but all preg_match needs is the 'u' modifier on the pattern. Of course we
could add a wrapper in t3lib_cs for this, but it would add overhead without any benefit.
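
A minimal sketch of what the 'u' modifier changes, assuming a PCRE build with UTF-8 support. The
sample string and variable names are made up for illustration and are not taken from the indexer:

```php
<?php
// Hypothetical sample: "Müller" encoded as UTF-8, where "ü" takes two bytes.
$word = "M\xC3\xBCller";

// Without 'u' the dot matches a single byte, so it cannot cover the
// two-byte "ü" and the pattern does not match.
$byteMode = preg_match('/^M.ller$/', $word);

// With 'u' the subject is treated as UTF-8 and the dot matches one
// whole character.
$utf8Mode = preg_match('/^M.ller$/u', $word);
```

The only difference between the two calls is the trailing 'u', which is why a wrapper would add
little.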

>> What kind of options do we have:
>> - ignore it. This is a rare and unusual case, specific to custom PCRE
>> library compilation. We just add a new Unicode PCRE requirement to the
>> INSTALL.txt
>> - make a check that 'u' is supported. I am not sure how to do that yet and
>> I think it is unnecessary overhead for 99.99% of installations
> 
> I don't understand this point

Dmitry suggested testing on each run of the indexer whether 'u' is supported and then acting accordingly.
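
Such a check could look roughly like the sketch below. The function name is hypothetical and not
part of t3lib_cs or the indexer. It relies on preg_match returning false (after a warning) when
the pattern cannot be compiled, which is what happens with 'u' on a PCRE library built without
UTF-8 support:

```php
<?php
// Hypothetical helper, for illustration only.
function pcreSupportsUnicode()
{
    // Cache the result so the probe runs once per request, not per call.
    static $supported = null;
    if ($supported === null) {
        // '@' hides the compile warning; the probe matches a single
        // UTF-8 letter ("ü") against the Unicode property \pL.
        $supported = (@preg_match('/^\pL$/u', "\xC3\xBC") === 1);
    }
    return $supported;
}

// Assumed fallback: a plain ASCII pattern when 'u' is unavailable.
$pattern = pcreSupportsUnicode() ? '/[^\pL\pN]+/u' : '/[^a-zA-Z0-9]+/';
```

Caching the result in a static variable limits the cost to one probe per request.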

> but Masi wrote something about utf-8
> strings. I wouldn't change the results to mark them:
> 1) *if* required (I don't think so) then cols should get an extra row to
> mark the charset.

To my knowledge, all rows in the DB are in utf-8, no matter what the original charset of the indexed
document was.

That's why I am a bit surprised that Dmitry wants to add even more conversions (when? why?).

Masi



