[TYPO3-content-rendering] Illegal SGML characters in output
Martin Kutschker
Martin.Kutschker at n0spam-blackbox.net
Fri Dec 16 13:19:53 CET 2005
Ernesto Baschny [cron IT] wrote:
> Martin Kutschker wrote on 16.12.2005 09:24:
>
>
>>>So outputting these characters for the Web in "charset=iso-8859-1" mode
>>>is not "valid", because they are not part of this charset (which is also
>>>why the W3C-validator chokes on them). The very good article in [2]
>>>presents some alternatives for outputting them in a generic way.
>
>
>>Why not simply use windows-1252?
>>
>>This character set is accepted by most if not all browsers on all
>>platforms.
>
> So you suggest changing the default character set for Western European
> languages from iso-latin-1 to windows-1252? That would solve the problem
> for default-setups, of course.
>
> But anyway, if I want my site to be iso-latin-1 encoded, I have to make
> sure no improper characters are being displayed.
I see.
> Why shouldn't TYPO3 do
> that for me, as the problem is known and a solution is known? I have to
> tell the authors not to use the Euro sign and other such characters? Or to
> install some extension to strip characters that are invalid in the chosen
> charset?
>
> I think that as long as TYPO3 supports iso-latin-1, it needs to handle
> the "buggy" input from browsers (which will happily send the invalid
> characters in FORMs) by converting them into something that is valid in
> the chosen character set.
>
> So maybe the solution would be to have this check and conversion done at
> FORM's submission instead, which is the origin of the problem?
Conversion on form submission won't work, as SGML entities are not valid
content for e.g. title fields. They won't get through htmlspecialchars(),
which escapes the leading ampersand.
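
To illustrate why a stored entity does not survive output escaping, here is a
minimal sketch. It is written in Python, with html.escape() standing in for
PHP's htmlspecialchars(); the title string is made up for the example.

```python
import html

# Hypothetical title field in which the Euro sign was replaced by an
# entity at form-submission time, before being stored.
stored_title = "Price: 100 &euro;"

# At render time the field is escaped as usual. html.escape() is used here
# as a stand-in for PHP's htmlspecialchars(); both turn "&" into "&amp;".
rendered = html.escape(stored_title)

print(rendered)  # Price: 100 &amp;euro;
```

The browser would then show the literal text "&euro;" instead of the Euro
sign.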
But could this be a solution: a special version of htmlspecialchars()
that deals with such characters?
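
A rough sketch of what such a function might look like, in Python rather than
PHP and not taken from TYPO3. The names repair_c1() and escape_for_charset()
are hypothetical. It assumes the offending characters arrive as windows-1252
bytes that were read as iso-8859-1, so they show up as code points U+0080 to
U+009F. It remaps those, escapes the markup characters, and then writes
anything the target charset cannot encode as a numeric character reference.

```python
import html


def repair_c1(text: str) -> str:
    """Map C1 code points (U+0080..U+009F) to their windows-1252 characters.

    Assumption: such code points come from windows-1252 bytes that were
    decoded as iso-8859-1. Bytes undefined in windows-1252 are dropped.
    """
    repaired = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                ch = ""  # no windows-1252 character for this byte
        repaired.append(ch)
    return "".join(repaired)


def escape_for_charset(text: str, charset: str = "iso-8859-1") -> str:
    """Escape markup characters, then replace characters that `charset`
    cannot represent with numeric character references."""
    escaped = html.escape(repair_c1(text))
    return escaped.encode(charset, errors="xmlcharrefreplace").decode(charset)


print(escape_for_charset("5 \u20ac & <b>"))  # 5 &#8364; &amp; &lt;b&gt;
```

Because the references are generated after the ampersands have been escaped,
they are not escaped a second time, and characters that iso-8859-1 can
represent (such as "é") pass through unchanged.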
Masi