[TYPO3-english] UTF-8 characters and tags malforming content

Juha Tiensyrja juissi at iki.fi
Wed Apr 8 22:40:27 CEST 2009


Hi!

I've been trying to solve the following problem with a TYPO3
installation for about two months now, and I'm just about to lose my
mind trying to fix it. I have narrowed it down to some strange
problem with UTF-8 characters in conjunction with HTML tags. Bugs
7869 and 9491 are probably related, but neither has any comments that
were of help.

The problem is this: on a page with non-ASCII characters (such as ä
and ö and some accented letters), if the characters are followed,
though not necessarily immediately, by a tag (<b>, <link 60>, etc.),
then the tag is transformed into something like &lt;link 60. This
happens quite often, but not every time. It is more or less random,
but every time I check I can find at least a few malformed pages,
sometimes even hundreds (including cached tt_news records). Clearing
the cache can help, but not always. This happens only with content
elements; the site menu and headers seem to be unaffected.

A real example looks something like this:
Malformed: Milja Seppälä (&lt;link
koposihteeri at XXX.fi&gt;koposihteeri at XXX.fi</link>)
Should be: Milja Seppälä (<a
href="mailto:koposihteeri at XXX.fi">koposihteeri at XXX.fi</a>)
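
To find affected pages in bulk, the rendered or cached HTML could be
scanned for tags that were escaped instead of rendered. The following
is only a rough sketch in Python, not anything TYPO3 provides: the
function name find_escaped_tags and the list of tag names are
assumptions chosen for illustration.

```python
import re

# Assumed list of tag names worth checking; extend it as needed.
# The \b keeps "&lt;b" from matching longer names such as "&lt;br".
ESCAPED_TAG = re.compile(r"&lt;/?(link|a|b|i|strong|em)\b", re.IGNORECASE)


def find_escaped_tags(html):
    """Return (offset, tag name) pairs for tags that appear in escaped form."""
    return [(m.start(), m.group(1).lower()) for m in ESCAPED_TAG.finditer(html)]


if __name__ == "__main__":
    sample = "Milja Seppälä (&lt;link koposihteeri at XXX.fi&gt;koposihteeri at XXX.fi</link>)"
    for offset, tag in find_escaped_tags(sample):
        print(f"escaped <{tag}> at offset {offset}")
```

A page that legitimately shows HTML source as text would also be
reported, so any hits still need to be checked by hand.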

The TYPO3 installation in question was updated from version 3.8.1 to
4.2.5 and later on to 4.2.6. During the update I also converted the
database from latin1 to UTF-8 by following the instructions in the
wiki, and in all other ways the site works perfectly. My localconf.php
has the following lines:

$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';
$TYPO3_CONF_VARS['SYS']['UTF8filesystem'] = '1';

In the template setup I have the following lines (along with around
275 others, but I think these are the relevant ones):
config.locale_all = fi_FI.utf8
config.language = fi
config.metaCharset = utf-8
config.renderCharset = utf-8

I have also set UTF-8 as the default in the MySQL configuration and in
php.ini, but neither seems to have any effect on the situation. I've
made sure that the usual culprits, such as mbstring function
overloading, are not enabled. The content in the tt_content table
seems to be correct (I just
dumped the DB and checked it by hand), but it is possible to find the
malformed pages in the cache.
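
A check by hand can miss subtle damage left over from a latin1 to
UTF-8 conversion, so a small script could complement it. The sketch
below is in Python and rests on assumptions: the helper names are made
up, and it only covers two cases, bytes that are not valid UTF-8 and
text that looks as if it was encoded to UTF-8 twice (for example "Ã¤"
where "ä" was meant). It treats latin1 as ISO-8859-1, while MySQL's
latin1 is closer to cp1252, so some characters may need different
handling.

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_double_encoded(text):
    """Heuristic: True if the text turns into different, valid text
    when its characters are reinterpreted as latin-1 bytes holding UTF-8."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as latin-1, or not UTF-8 underneath:
        # the text is most likely fine as it stands.
        return False
    return repaired != text


if __name__ == "__main__":
    for value in ("Seppälä", "SeppÃ¤lÃ¤"):
        print(value, looks_double_encoded(value))
```

This is a heuristic, not proof: clean results here do not rule out a
problem elsewhere in the rendering chain.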

This is all quite puzzling. Has anyone had the same problem? What did
you do to resolve it? I would really appreciate any help on the
subject.

Sincerely,
--
Juha Tiensyrjä
044 550 0133

