[Typo3-dev] Some questions, problems and point of views about xml_parser_create

Erik Svendsen erik at linnearad.no
Mon Oct 3 22:50:52 CEST 2005


First of all, this looks like a problem only with PHP 5, and it appears to 
have something to do with the PHP function xml_parser_create (but I don't 
know enough about TYPO3 and PHP to be completely sure).

I have the problem with two different extensions, impexp and templavoila, 
but as far as I know it could show up everywhere XML parsing is used. In both 
places, characters such as å æ ø (Norwegian), ü ä à etc. make the parser fail 
with an error. I made a previous post about the problem: 
news://news.netfielders.de/mailman.1.1128075811.24776.typo3-english@lists.netfielders.de 
The problem is also described in http://bugs.typo3.org/view.php?id=497

--- Copy of my earlier post ---
If I use non-US characters, e.g. å, ø, ö, ä, à etc., in the title, mapping 
instructions or sample data (on the Building data Structure page), I get the 
following error message when trying to look at the XML: The input content 
failed XML parsing: Line xx: Invalid character.

The reference to the DS is also broken if I save the DS.

The simple workaround is to avoid non-US characters altogether.

I don't have the same problem on an installation with PHP 4.4 (TYPO3 3.8.0, 
TV 4.0).

I suppose the problem isn't TYPO3 but the XML parsing in PHP. I am posting 
anyway, both to give a workaround to others who run into the same problem and 
to get responses from others who have had (or still have) it. Maybe someone 
more skilled in TYPO3 and PHP than I am can find the real solution (and, if 
necessary, file a bug at PHP.net).

I also think this is a problem that should be solved. It is more 
user-friendly to have titles in the native language.

I am posting this in both typo3.english and typo3.dev. Responses should go 
to typo3.dev.

My configurations:
Server 1: Fedora Core 4 X64, Apache 2.0.54, PHP 5.0.4, MySQL 4.1.12, 
eaccelerator 0.9.3.
Server 2: Windows XP SP2, Apache 2.0.54, PHP 5.0.5, MySQL 5.0.12-beta-nt, 
eaccelerator 0.9.3.
Server 3: Debian, Apache 1.3.33, PHP 4.4.0, MySQL 4.1.14 (no problem with 
non-US characters)
--- end of copy ---

You also get a problem if you are using FCEs with TV: the editors can't use 
these Norwegian characters in plain text fields. That is a problem when you 
are building Norwegian websites (and sites in a lot of other European 
languages). The solution in bug 497 does not help.

To learn a little more about the problem and about XML parsing, I made a small 
parsing script (using xml_parser_create) with the help of the PHP 
documentation. What I found was that every special character in Norwegian, 
German and Swedish gave an error if the XML file specified no charset 
encoding, or specified utf-8 or utf-16. If the charset iso-8859-1 was 
specified, the parsing went okay.
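
The behaviour I saw can be reproduced outside PHP. The following is only a 
sketch: it uses Python's expat binding as a stand-in for xml_parser_create, 
and it assumes that the test file was stored as ISO-8859-1 bytes. The helper 
name try_parse and the sample title are made up for illustration.

```python
import xml.parsers.expat


def try_parse(xml_bytes):
    """Return None if the bytes parse cleanly, else the parser's error text."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


# Assumption: the file content is ISO-8859-1, so å is the single byte 0xE5.
body = "<title>blåbær</title>".encode("iso-8859-1")

# No declaration: the parser assumes UTF-8 and rejects the 0xE5 byte.
no_declaration = body
# Declaration says utf-8, but the bytes are still ISO-8859-1.
wrong_declaration = b'<?xml version="1.0" encoding="utf-8"?>' + body
# Declaration matches the actual bytes.
right_declaration = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body

for label, doc in [("none", no_declaration),
                   ("utf-8", wrong_declaration),
                   ("iso-8859-1", right_declaration)]:
    print(label, "->", try_parse(doc) or "ok")
```

If that assumption holds, the errors come from a mismatch between the 
declared (or assumed) encoding and the bytes actually stored in the file, 
not from the characters as such.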

What I learned here gave me the idea that my problem with impexp (when using 
TV) had the same cause. The exported XML file had the encoding iso-8859-1 and 
went straight through my parser script, but I couldn't import it correctly. I 
then searched for å, ø and æ in the exported XML file and replaced them with 
aa, oe and ae, and the import went smoothly. I think it may only be necessary 
to change the part of the file that refers to TV. I shall test it. I don't 
know if this problem is impexp or TV related. 

This gave me the idea (probably wrong) that the parser script in impexp 
doesn't use the encoding given in the XML file. Does anybody know? My 
understanding is that TV is using utf-8. Is there any way to change this?

As I see it, there are workarounds for most of these problems, except the 
TCE problem. The easiest workaround is not using these special characters, 
but that is not the best solution for the users. Another solution might be 
to force impexp and TV to use a specific charset if needed. Maybe someone 
has a solution to this problem. 
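
To make the "force a specific charset" idea concrete, here is a rough sketch 
of converting the file to UTF-8 before it reaches the parser, instead of 
replacing å with aa by hand. It is not TYPO3 code and not how impexp or TV 
actually work. It is written in Python with expat as a stand-in for 
xml_parser_create, and the names to_utf8 and parse_text, as well as the 
sample document, are hypothetical.

```python
import re
import xml.parsers.expat


def to_utf8(xml_bytes, source_encoding="iso-8859-1"):
    """Re-encode a document as UTF-8 and drop its XML declaration.

    Without a declaration the parser assumes UTF-8, which then matches
    the re-encoded bytes.
    """
    text = xml_bytes.decode(source_encoding)
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text, count=1)
    return text.encode("utf-8")


def parse_text(xml_bytes):
    """Parse the bytes and return all character data as one string."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)
    return "".join(chunks)


# Hypothetical export: ISO-8859-1 bytes with a matching declaration.
exported = ('<?xml version="1.0" encoding="iso-8859-1"?>'
            '<title>Blåbærsyltetøy</title>').encode("iso-8859-1")

print(parse_text(to_utf8(exported)))
```

The point of the sketch is only that the special characters survive when the 
bytes and the encoding the parser expects agree. Whether impexp or TV can be 
made to do this is exactly what I am asking.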

This was a long post, but I think this is a problem that should be solved somehow. 



WBR,
Erik Svendsen
www.linnearad.no





