[Typo3-dev] Some questions, problems and point of views about xml_parser_create
Erik Svendsen
erik at linnearad.no
Mon Oct 3 22:50:52 CEST 2005
First of all, this looks like a problem that occurs only with PHP 5, and it
seems to have something to do with the PHP function xml_parser_create (but
I don't know enough about TYPO3 and PHP to be sure).
I have the problem with two different extensions, impexp and templavoila,
but as far as I know it could occur everywhere XML parsing is used. In both
places, characters such as å æ ø (Norwegian), ü ä à etc. make the parser
fail with an error. I made a previous post about the problem:
news://news.netfielders.de/mailman.1.1128075811.24776.typo3-english@lists.netfielders.de.
The problem is also described in http://bugs.typo3.org/view.php?id=497
--- Copy of my earlier post ---
If I use non-US characters, e.g. å, ø, ö, ä, à etc., in the title, mapping
instructions or sample data (in the Building data Structure page), I get the
following error message when trying to look at the XML: The input content
failed XML parsing: Line xx: Invalid character.
The reference to the DS is also broken if I save the DS.
The simple solution is not to use any non-US characters.
I don't have the same problem on an installation with PHP 4.4 (TYPO3 3.8.0,
TV 4.0).
I suppose the problem isn't TYPO3 but the XML parsing in PHP. Still, I am
posting this both to give a workaround to others who run into the same
problem and to get responses from others who have had it. Maybe someone more
skilled in TYPO3 and PHP than me can find the real solution (and, if
necessary, file a bug at PHP.net).
I also think this is a problem that should be solved. It's more user-friendly
to have titles in the native language.
I am posting this in both typo3.english and typo3.dev. Responses should be
given in typo3.dev.
My configurations:
Server 1: Fedora Core 4 X64, Apache 2.0.54, PHP 5.0.4, MySQL 4.1.12, eaccelerator
0.9.3.
Server 2: Windows XP SP2, Apache 2.0.54, PHP 5.0.5, MySQL 5.0.12-beta-nt,
eaccelerator 0.9.3.
Server 3: Debian, Apache 1.3.33, PHP 4.4.0, MySQL 4.1.14 (no problem with
non-US characters)
--- end of copy ---
You also get a problem if you are using FCE with TV: editors can't use
these Norwegian characters in plain text fields. That is a problem when you
are making Norwegian websites (and sites in a lot of other European
languages). The solution in bug 497 does not help.
To learn a little more about the problem and about XML parsing, I made a small
parsing script (using xml_parser_create) with the help of the PHP
documentation. What I found was that every special Norwegian, German or
Swedish character gave an error if the XML file declared no encoding, utf-8
or utf-16. If the file declared iso-8859-1, the parsing went okay.
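
A minimal sketch of that experiment, for anyone who wants to reproduce the
pattern outside TYPO3. It is written in Python with its expat binding rather
than PHP, so it only illustrates the general behaviour and says nothing
certain about what xml_parser_create does in PHP 5. The helper name parses_ok
and the sample text are made up for the example. The assumption is that the
bytes in the file are really ISO-8859-1, whatever the XML declaration says.

```python
from xml.parsers import expat


def parses_ok(data: bytes) -> bool:
    """Return True if expat accepts the document, False on a parse error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True = this is the final chunk
        return True
    except expat.ExpatError:
        return False


# Sample content with Norwegian characters, stored as ISO-8859-1 bytes.
body = "<title>blåbær og rømme</title>".encode("iso-8859-1")

# The same bytes behind three different XML declarations.
no_decl = b'<?xml version="1.0"?>' + body
utf8_decl = b'<?xml version="1.0" encoding="utf-8"?>' + body
latin1_decl = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body

for name, doc in [("none", no_decl), ("utf-8", utf8_decl),
                  ("iso-8859-1", latin1_decl)]:
    print(name, "->", "ok" if parses_ok(doc) else "parse error")
```

Only the variant whose declaration matches the actual bytes is accepted. With
no declaration the parser falls back to UTF-8, and the ISO-8859-1 bytes for
å, æ and ø are not valid UTF-8.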
What I learned here gave me the idea that my problem with impexp (using TV)
had the same cause. The exported XML file had the encoding iso-8859-1 and
went straight through my parser script, but I couldn't import it correctly. I
then searched and replaced å with aa, ø with oe and æ with ae in the exported
XML file, and the import went smoothly. I think it may only be necessary to
change the part of the file that refers to TV. I shall test it. I don't
know if this problem is impexp or TV related.
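
The search-and-replace workaround can be scripted. This is a rough sketch in
Python, not part of impexp or TV; the function name transliterate and the
REPLACEMENTS table are invented for the example, and only the three lowercase
replacements mentioned above are covered.

```python
# Replacement table taken from the workaround described above.
REPLACEMENTS = {"å": "aa", "ø": "oe", "æ": "ae"}


def transliterate(text: str) -> str:
    """Replace the Norwegian letters with ASCII digraphs."""
    for char, ascii_form in REPLACEMENTS.items():
        text = text.replace(char, ascii_form)
    return text


# Example on a decoded string. An exported file would first have to be
# read with encoding="iso-8859-1" to get such a string.
sample = "<title>blåbær og rømme</title>"
print(transliterate(sample))
```

Note that this changes the content itself, so it is a workaround rather than
a fix.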
This gave me the idea (probably wrong) that the parser script in impexp doesn't
use the encoding declared in the XML file. Does anybody know? My understanding
is that TV is using utf-8. Is there any way to change this?
As I see it, there are workarounds for most of these problems, except the
TCE problem. The easiest workaround is not to use these special characters,
but that's not the best solution for the users. Another solution may be
to force impexp and TV to use a specific charset if needed. Maybe someone
has a solution to this problem.
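
To make the idea of forcing one charset concrete, here is a hedged sketch in
Python of converting a file to UTF-8 before it reaches a parser. It is not how
impexp or TV work and is not a patch for them. The name to_utf8 and the
regular expression are assumptions made for the example, and the source
encoding has to be known in advance (iso-8859-1 here).

```python
import re
from xml.parsers import expat

# Matches the encoding="..." part of an XML declaration (either quote style).
DECL_ENCODING = re.compile(rb'(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode the document as UTF-8 and update its XML declaration."""
    utf8 = data.decode(source_encoding).encode("utf-8")
    return DECL_ENCODING.sub(
        lambda m: m.group(1) + m.group(2) + b"utf-8" + m.group(2),
        utf8,
        count=1,
    )


original = b'<?xml version="1.0" encoding="iso-8859-1"?><t>\xe5\xf8\xe6</t>'
converted = to_utf8(original)

# Parse the converted document and collect its character data.
chunks = []
parser = expat.ParserCreate()
parser.CharacterDataHandler = chunks.append
parser.Parse(converted, True)
print("".join(chunks))
```

After the conversion the declaration and the bytes agree, so a parser that
expects UTF-8 sees valid input and the characters survive unchanged.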
It was a long post, but I think this is a problem that should be solved somehow.
WBR,
Erik Svendsen
www.linnearad.no