[TYPO3-core] RFC: problem with php4 and xml data with byte order mark

Martin Kutschker martin.kutschker-n0spam at no5pam-blackbox.net
Thu Nov 16 22:56:52 CET 2006


Michael Stucki wrote:
> Hi Martin,
> 
>> Problem:
>> PHP4 doesn't like a Unicode byte order mark at the beginning of XML files
>> in UTF-8. If a BOM is present, the parsed data is not in UTF-8 any more. A
>> BOM is valid at the beginning of an XML file. Usually it's added by text
>> editors on Windows.
>>
>> Solution:
>> Look for the BOM and set the charset to UTF-8 if a UTF-8 BOM is found.
> 
> There are some problems with this patch:
> 
> - You have renamed $ereg_result to $match, but one line down you didn't
> change that accordingly for $theCharset.
> 
> - The preg_match does not work anymore. Tested with PHP 5.1

Why? I haven't changed it, whereas you have changed
  preg_match('/^[[:space:]]*
to
  preg_match('/^(|.{3})
The old version does not work with a BOM; that is why the BOM check comes
first. A BOM is three bytes only for UTF-8: for UTF-16 it has two bytes and
for UTF-32 four, so your version (look for 0 or 3 bytes) does not make much
sense.
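
To make the ordering concrete, here is a minimal sketch, not the actual patch. The function name detectXmlCharset() and the exact declaration pattern are assumptions made up for illustration. It tests for the UTF-8 BOM (bytes EF BB BF) first and only then falls back to an anchored match on the XML declaration, which cannot succeed when BOM bytes precede it.

```php
<?php
// Hypothetical helper for illustration only; not the TYPO3 core function.
// Returns the lowercase charset implied by a UTF-8 BOM or by the XML
// declaration, or null if neither is found.
function detectXmlCharset($xml)
{
    // 1. BOM check comes first: the pattern in step 2 is anchored at the
    //    start of the string and allows only whitespace before the
    //    declaration, so it never matches when BOM bytes are in front.
    if (strncmp($xml, "\xEF\xBB\xBF", 3) === 0) {
        return 'utf-8';
    }

    // 2. No UTF-8 BOM: read the encoding from the XML declaration
    //    (assumed pattern, simplified).
    $match = array();
    $pattern = '/^[[:space:]]*<\?xml[^>]*encoding[[:space:]]*=[[:space:]]*["\']([^"\']+)["\']/';
    if (preg_match($pattern, $xml, $match)) {
        return strtolower($match[1]);
    }

    return null;
}

// Usage: a BOM-prefixed file is reported as UTF-8 even without a declared encoding.
$withBom = "\xEF\xBB\xBF" . '<?xml version="1.0"?><T3locallang/>';
var_dump(detectXmlCharset($withBom));
```

Other BOMs (two bytes for UTF-16, four for UTF-32) are deliberately not handled in this sketch; they would need their own explicit byte comparisons rather than a generic ".{3}" match.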

Masi


