[TYPO3-content-rendering] Illegal SGML characters in output

Martin Kutschker Martin.Kutschker at n0spam-blackbox.net
Fri Dec 16 13:19:53 CET 2005


Ernesto Baschny [cron IT] wrote:
> Martin Kutschker wrote on 16.12.2005 09:24:
> 
> 
>>>So outputting these characters for the Web in "charset=iso-8859-1" mode
>>>is not "valid", because they are not part of this charset (which is also
>>>why the W3C-validator chokes on them). The very good article in [2]
>>>presents some alternatives on how to output them in a generic way.
> 
> 
>>Why not simply use windows-1252?
>>
>>This character set is accepted by most if not all browsers on all
>>platforms.
> 
> So you suggest changing the default character set for west-european
> languages from iso-latin-1 to windows-1252? That would solve the problem
> for default-setups, of course.
> 
> But anyway, if I want my site to be iso-latin-1 encoded, I have to make
> sure no improper characters are being displayed.

I see.
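
To make the charset point concrete, here is a small sketch in Python (not PHP, and not anything TYPO3 ships). The helper name `encodable` is made up for illustration. It only shows that the euro sign has no code point in ISO-8859-1 but does have one in windows-1252:

```python
EURO = "\u20ac"  # the euro sign


def encodable(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print(encodable(EURO, "iso-8859-1"))    # False
print(encodable(EURO, "windows-1252"))  # True
print(EURO.encode("windows-1252"))      # b'\x80'
```

Byte 0x80 is the interesting part: read as windows-1252 it is the euro sign, read as ISO-8859-1 it is a C1 control character, which is what the validator complains about.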

> Why shouldn't TYPO3 do
> that for me, as the problem is known and a solution is known? I have to
> tell the authors not to use the EURO and other characters? Or to install
> some extension to strip invalid characters from the chosen charset?
> 
> I think that as long as TYPO3 supports iso-latin-1, it needs to handle
> the "buggy" input from browsers (which will happily send the invalid
> characters in FORMs) by converting them into something that is valid in
> the chosen character set.
> 
> So maybe the solution would be to have this check and conversion done at
> FORM's submission instead, which is the origin of the problem?

Conversion on form submission won't work, as SGML entities are not
valid content for e.g. title fields. They won't get through
htmlspecialchars() intact, because it escapes their leading "&".
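
A rough illustration of the escaping problem, using Python's `html.escape` as a stand-in for PHP's htmlspecialchars() (the two are similar but not identical). The title string is invented for the example:

```python
from html import escape

# Hypothetical title field whose euro sign was already turned into an
# entity when the form was submitted.
stored_title = "Price: 10 &euro;"

# Escaping at output time treats the entity's "&" as a literal ampersand.
escaped = escape(stored_title)
print(escaped)  # Price: 10 &amp;euro;
```

The browser would then display the literal text "&euro;" instead of the euro sign.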

But could this be a solution: a special version of htmlspecialchars()
that deals with characters outside the chosen charset?
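
A minimal sketch of what such a function might look like, written in Python rather than PHP and assuming the content is available as decoded text. The name `escape_for_charset` is hypothetical and this is not an existing TYPO3 or PHP API. It escapes markup characters first and only then replaces characters missing from the target charset with numeric character references, so the "&" of those references is not escaped again:

```python
from html import escape


def escape_for_charset(text: str, charset: str = "iso-8859-1") -> str:
    """Escape markup, then turn characters that charset cannot represent
    into numeric character references (hypothetical helper)."""
    escaped = escape(text, quote=True)
    # The "xmlcharrefreplace" error handler emits &#NNNN; for every
    # character the target charset cannot encode.
    return escaped.encode(charset, errors="xmlcharrefreplace").decode(charset)


# Assumption: the browser submitted windows-1252 bytes in a form that was
# declared as iso-8859-1, so the input is decoded as windows-1252 first.
raw = b"Price: 10 \x80 <b>"
text = raw.decode("windows-1252")
print(escape_for_charset(text))  # Price: 10 &#8364; &lt;b&gt;
```

The order of the two steps is the whole point of the design; doing the replacement before the escaping would reproduce the double-escaping problem.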

Masi


