[TYPO3-core] RFC: problem with php4 and xml data with byte order mark
Martin Kutschker
martin.kutschker-n0spam at no5pam-blackbox.net
Thu Nov 16 22:56:52 CET 2006
Michael Stucki wrote:
> Hi Martin,
>
>> Problem:
>> PHP4 does not handle a Unicode byte order mark (BOM) at the beginning of
>> UTF-8 XML files. If a BOM is present, the parsed data is no longer in
>> UTF-8. A BOM is valid at the beginning of an XML file. Usually it is added
>> by text editors on Windows.
>>
>> Solution:
>> Look for the BOM and set the charset to UTF-8 if a UTF-8 BOM is found.
>
> There are some problems with this patch:
>
> - You have renamed $ereg_result to $match, but one line down you didn't
> change that accordingly for $theCharset.
>
> - The preg_match no longer works. Tested with PHP 5.1.
Why? I haven't changed it, but you have: you changed
preg_match('/^[[:space:]]* to preg_match('/^(|.{3}). The old version does
not work with a BOM; that's why the BOM check comes first. And since a UTF-16
BOM is two bytes, not three (a UTF-32 BOM is four), your version, which
allows either 0 or 3 leading bytes, doesn't make much sense.
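To illustrate the ordering argued for above (check for a BOM first, and only then look at the XML declaration), here is a minimal sketch. The function name detectXmlCharset, its return shape, and the declaration regex are hypothetical and not taken from the TYPO3 patch; it is written for PHP 7+ rather than PHP4.

```php
<?php
/**
 * Hypothetical helper (not TYPO3's actual API): determine the charset of an
 * XML string. A byte order mark is checked BEFORE the XML declaration, and a
 * UTF-8 BOM is stripped from the returned data.
 *
 * Returns [charset, data].
 */
function detectXmlCharset(string $xml, string $default = 'utf-8'): array
{
    // UTF-8 BOM: EF BB BF (3 bytes). It must be tested first, because the
    // declaration regex below is anchored at the start of the string and a
    // BOM is neither whitespace nor '<'.
    if (strncmp($xml, "\xEF\xBB\xBF", 3) === 0) {
        return ['utf-8', substr($xml, 3)];
    }

    // UTF-32 BOMs (4 bytes) are tested before UTF-16 BOMs (2 bytes), because
    // the UTF-32LE BOM FF FE 00 00 begins with the UTF-16LE BOM FF FE.
    $boms = [
        "\x00\x00\xFE\xFF" => 'utf-32be',
        "\xFF\xFE\x00\x00" => 'utf-32le',
        "\xFE\xFF"         => 'utf-16be',
        "\xFF\xFE"         => 'utf-16le',
    ];
    foreach ($boms as $bom => $charset) {
        if (strncmp($xml, $bom, strlen($bom)) === 0) {
            return [$charset, $xml];
        }
    }

    // No BOM: fall back to the encoding attribute of the XML declaration
    // (assumed pattern; accepts single or double quotes).
    $pattern = '/^[[:space:]]*<\?xml[^>]*encoding[[:space:]]*=[[:space:]]*["\']([^"\']+)["\']/i';
    if (preg_match($pattern, $xml, $match)) {
        return [strtolower($match[1]), $xml];
    }

    return [$default, $xml];
}
```

Because the UTF-8 BOM is removed before the data is returned, an anchored pattern such as '/^[[:space:]]*<\?xml/' matches the remaining string again; with the BOM still in place it does not match, which is why the BOM test has to come first.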
Masi