[TYPO3-dev] UTF8 problem when parsing XML data...

Jigal van Hemert jigal.van.hemert at eurorscg.nl
Thu Jul 13 08:44:25 CEST 2006


> > Anyway, he doesn't tell where it is displayed wrong and how he 
> > retrieves and displays the data.
> 
> The data is fetched from an XML file using simple_xml and the 
> php array is simply parsed to $content:
> 
> [...]
> 
> $xml = simplexml_load_file($xmlFile);
> 
>   foreach ($xml->LocationInfo as $item) {
> 
>            $content .= $item->Company;
> 
>   [...]
> 
> 
> As I told before, generally works like charm but with 
> special-chars something wild seems to be happening ;-)

I've waited to respond because I suspected that there would somehow be a
TYPO3 related issue involved.

I think the problem lies in the original XML. The "special-chars" as
mentioned are not encoded using the encoding that was indicated in the
XML-header, but represented by (numerical) entities.
At some moment in the process chain the entities are converted to
characters (can be anywhere between XML transformation and displaying
the content in the browser). It seems that the entities are converted to
UTF-8 encoded characters, but that the output is interpreted as
ISO-xxxx-x (ISO-8859-1?).
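
To make that effect concrete, here is a minimal sketch. The XML snippet and
the company name are made up for illustration (only the element names come
from the quoted code), and it assumes the mbstring extension is available.
It shows that SimpleXML hands back UTF-8 bytes for a numeric entity, and what
those bytes look like when something further down the chain reads them as
ISO-8859-1:

```php
<?php
// Hypothetical sample document: the e-acute in "Café" is written as a
// numeric entity, while the XML header claims ISO-8859-1.
$source = '<?xml version="1.0" encoding="ISO-8859-1"?>'
        . '<Locations><LocationInfo><Company>Caf&#233;</Company></LocationInfo></Locations>';

$xml = simplexml_load_string($source);
$company = (string) $xml->LocationInfo->Company;

// SimpleXML returns UTF-8 regardless of the encoding named in the header,
// so the e-acute occupies two bytes (0xC3 0xA9).
echo bin2hex($company), "\n"; // 436166c3a9

// Simulate a consumer that reads those bytes as ISO-8859-1: each byte is
// treated as a separate character.
$misread = mb_convert_encoding($company, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n"; // CafÃ©
```

If the output in the browser resembles the second line, the data itself is
probably fine and only the declared encoding of the page is off.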

I really wonder what happens if in the original XML the entities are
converted to (utf-8 encoded) characters?

Furthermore, check whether each step treats the string as utf-8 (PHP
usually uses ISO-8859-1 as internal encoding and doesn't interpret
strings otherwise unless you use multi-byte safe functions). Check if
the output is really utf-8 and if the document tells the browser that
the content is utf-8 encoded. Check the HTTP-header to see if the
encoding is set correctly, etc.
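
These checks could look roughly like the sketch below. It assumes the
mbstring extension is loaded; the value of $content is a hypothetical
stand-in for the string built in the loop. Inside a TYPO3 plugin the
framework normally sends the headers itself, so the header() call is only
meant for a standalone test script:

```php
<?php
// Hypothetical stand-in for the string collected from the XML file.
$content = "Caf\xC3\xA9";

// Byte-oriented functions count bytes, multi-byte safe ones count characters.
$byteLength = strlen($content);             // 5 bytes
$charLength = mb_strlen($content, 'UTF-8'); // 4 characters

// True only if the bytes form a valid UTF-8 sequence.
$isUtf8 = mb_check_encoding($content, 'UTF-8');

// Tell the browser explicitly which encoding the response uses.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo $isUtf8 ? "valid UTF-8\n" : "not valid UTF-8\n";
```

A difference between the two lengths is a quick sign that the string contains
multi-byte characters and that byte-oriented string functions may cut them in
half.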

Regards, Jigal.



