[TYPO3-core] RFC: problem with php4 and xml data with byte order mark

Michael Stucki michael at typo3.org
Thu Nov 9 12:08:36 CET 2006


Hi Martin,

> Problem:
> PHP4 doesn't like a Unicode byte order mark at the beginning of XML files
> in UTF-8. If a BOM is present, the parsed data is not in UTF-8 any more. A
> BOM is valid at the beginning of an XML file. Usually it's added by text
> editors on Windows.
> 
> Solution:
> Look for the BOM and set the charset to UTF-8 if a UTF-8 BOM is found.

There are some problems with this patch:

- You have renamed $ereg_result to $match, but one line down you didn't
update the line that sets $theCharset accordingly.

- The preg_match does not work anymore. Tested with PHP 5.1.

> Test:
> include('./class.t3lib_div.php');
> $string = "\xEF\xBB\xBF".
>   '<?xml version="1.0" encoding="utf-8" ?><val>Välué</val>';

Hmm, that doesn't look finished, right?

But alright. I have added some more changes and made a new patch, including
a new test script.

The patch works; I tested it with PHP4 and PHP 5.1.
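
For anyone reading without the attachment, here is a minimal sketch of the
idea from the quoted "Solution" (look for a UTF-8 BOM first, otherwise fall
back to the encoding named in the XML declaration). It is not the attached
patch: the function names detectXmlCharset() and stripUtf8Bom() are made up
for illustration, and the ISO-8859-1 fallback is only an assumption.

```php
<?php
// Hypothetical helper, NOT the code from t3lib_div-BOM_v2.diff.
// Returns the charset to assume for an XML string.
function detectXmlCharset($xml, $default = 'iso-8859-1')
{
    // A UTF-8 byte order mark is the three bytes EF BB BF.
    if (strncmp($xml, "\xEF\xBB\xBF", 3) === 0) {
        return 'utf-8';
    }
    // No BOM: use the encoding attribute of the XML declaration, if any.
    $pattern = '/^\s*<\?xml[^>]*\sencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/';
    if (preg_match($pattern, $xml, $match)) {
        return strtolower($match[1]);
    }
    // Assumed fallback; pass a different $default if needed.
    return $default;
}

// Hypothetical helper: drop a leading UTF-8 BOM before handing the
// string to the parser.
function stripUtf8Bom($xml)
{
    return strncmp($xml, "\xEF\xBB\xBF", 3) === 0 ? substr($xml, 3) : $xml;
}
```

With the string from the quoted test (BOM followed by an XML declaration),
detectXmlCharset() would report utf-8 because of the BOM alone, and
stripUtf8Bom() would return the string starting at "<?xml".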

- michael
-- 
Use a newsreader! Check out
http://typo3.org/community/mailing-lists/use-a-news-reader/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t3lib_div-BOM_v2.diff
Type: text/x-diff
Size: 2540 bytes
Desc: not available
Url : http://lists.netfielders.de/pipermail/typo3-team-core/attachments/20061109/916f5b60/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.php
Type: application/x-php
Size: 184 bytes
Desc: not available
Url : http://lists.netfielders.de/pipermail/typo3-team-core/attachments/20061109/916f5b60/attachment-0001.bin 