[TYPO3-core] RFC: problem with php4 and xml data with byte order mark

Martin Kutschker Martin.Kutschker at n0spam-blackbox.net
Tue Nov 7 16:51:26 CET 2006


Hi!

This is an SVN patch request.

Problem:
PHP4's XML parser does not cope with a Unicode byte order mark (BOM) at the 
beginning of UTF-8 encoded XML data. If a BOM is present, the parsed data is 
no longer in UTF-8. A BOM is valid at the beginning of an XML file; it is 
usually added by text editors on Windows.

Solution:
Check for a BOM and set the charset to UTF-8 if a UTF-8 BOM (the bytes 
EF BB BF) is found.
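A minimal sketch of the BOM check, assuming the charset is decided before the parser is created. The helper name `xml_charset_from_bom()` and the ISO-8859-1 fallback are hypothetical and not taken from the attached diff; `xml_parser_create()`, `xml_parser_set_option()` with `XML_OPTION_TARGET_ENCODING`, and `xml_parse_into_struct()` are standard PHP XML functions.

```php
<?php
// Hypothetical helper (not the name used in the attached diff):
// returns 'UTF-8' if $string starts with the UTF-8 byte order mark
// EF BB BF, otherwise returns $default unchanged.
function xml_charset_from_bom($string, $default)
{
    if (substr($string, 0, 3) === "\xEF\xBB\xBF") {
        return 'UTF-8';
    }
    return $default;
}

// Same input as the test case in this mail; the accented letters are
// written as explicit UTF-8 byte escapes so the source file's own
// encoding does not matter.
$string = "\xEF\xBB\xBF"
    . '<?xml version="1.0" encoding="utf-8" ?><val>V' . "\xC3\xA4" . 'lu' . "\xC3\xA9" . '</val>';

// Assumption: ISO-8859-1 is the fallback when no BOM is found.
$charset = xml_charset_from_bom($string, 'ISO-8859-1');

// Tell the parser which charset to emit its data in, then parse.
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $charset);
xml_parse_into_struct($parser, $string, $values);
```

Comparing the first three bytes is sufficient here because UTF-8 has exactly one BOM byte sequence; UTF-16 or UTF-32 BOMs would need separate handling.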

Note:
Additionally, I have changed the comment about PHP's behaviour when parsing 
XML data and replaced ereg() with preg_match().
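For illustration, a sketch of reading the `encoding` attribute of the XML declaration with preg_match() instead of the POSIX ereg() functions. The function name `xml_declared_charset()`, the 200-byte look-ahead and the exact pattern are assumptions, not taken from the diff; the optional `\xEF\xBB\xBF` prefix lets a leading BOM pass through instead of defeating the `^` anchor.

```php
<?php
// Hypothetical helper: reads encoding="..." (or encoding='...') from the
// XML declaration at the start of $string using PCRE instead of ereg().
// An optional UTF-8 BOM and leading whitespace before "<?xml" are tolerated.
function xml_declared_charset($string, $default)
{
    $matches = array();
    // Only inspect the start of the document, where the declaration must be.
    $head = substr($string, 0, 200);
    if (preg_match('/^(?:\xEF\xBB\xBF)?\s*<\?xml[^>]*encoding\s*=\s*["\']([^"\']*)["\']/', $head, $matches)) {
        return $matches[1];
    }
    return $default;
}
```

Unlike ereg(), preg_match() needs pattern delimiters and accepts `\s` as well as the POSIX class `[[:space:]]`; ereg() was deprecated in PHP 5.3 and removed in PHP 7, so the PCRE form is also the more portable one.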

Test:
include('./class.t3lib_div.php');
$string = "\xEF\xBB\xBF".
  '<?xml version="1.0" encoding="utf-8" ?><val>Välué</val>';

Branches: TYPO3_4-0 and Trunk

Masi
-------------- next part --------------
Name: t3lib_div-BOM.diff
Url: http://lists.netfielders.de/pipermail/typo3-team-core/attachments/20061107/a7f24556/attachment.diff 

