[TYPO3-content-rendering] Illegal SGML characters in output

Martin Kutschker Martin.Kutschker at n0spam-blackbox.net
Fri Dec 16 09:24:43 CET 2005


Ernesto Baschny [cron IT] wrote:
> Hi,
> 
> the "non SGML character number 128" is probably the most annoying
> validation error that TYPO3 sites hit when users from the Windows world
> (especially European ones) copy and paste input into some field that goes
> straight through to the frontend.
> 
> THE PROBLEM
> ---------------
> 
> The problem originates in the ISO-Latin-1 character table, which
> specifies characters for the decimal range 32 up to 255 but has a gap
> in the range from 128 to 159 (see [1]). Microsoft (mis?)uses this range
> in the so-called "Windows-Latin-1" for various characters. The most
> frequent ones are the euro sign, the em dash (German "langer
> Gedankenstrich", i.e. "long dash", which MS Word creates automatically
> when you type a hyphen with spaces around it) and the low opening double
> quotation mark (also created by Word in German when you start a quotation).
> 
> So outputting these characters for the Web in "charset=iso-8859-1" mode
> is not valid, because they are not part of this charset (which is also
> why the W3C validator chokes on them). The very good article in [2]
> presents some alternatives for outputting them in a generic way.
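A minimal Python sketch of one generic way to make such input valid under charset=iso-8859-1, assuming the text arrives as a Python `str` in which the pasted Windows bytes surfaced as the C1 control characters U+0080 to U+009F (which is what decoding them as ISO-8859-1 yields). The function names are hypothetical and this is not TYPO3 code: each C1 character is reinterpreted as a windows-1252 byte, and anything that then falls outside ISO-8859-1 is written as a numeric character reference.

```python
def repair_windows_chars(text: str) -> str:
    """Map C1 controls (U+0080-U+009F) back to the characters windows-1252 intended."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9F:
            try:
                # Reinterpret the code point as a single windows-1252 byte.
                out.append(bytes([code]).decode("windows-1252"))
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in windows-1252;
                # this sketch drops them rather than emit an illegal SGML character.
                pass
        else:
            out.append(ch)
    return "".join(out)


def to_latin1_output(text: str) -> bytes:
    """Encode for a charset=iso-8859-1 page; non-Latin-1 chars become &#NNNN; references."""
    return repair_windows_chars(text).encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # Euro sign, em dash and German-style quotes as pasted from Windows,
    # read back as ISO-8859-1 (so they are C1 controls at this point).
    raw = b"10 \x80 \x97 \x84quote\x93".decode("iso-8859-1")
    print(to_latin1_output(raw))  # prints b'10 &#8364; &#8212; &#8222;quote&#8220;'
```

Numeric references keep the page in ISO-8859-1 and let it validate, at the cost of slightly larger output; silently dropping the five unassigned positions is a choice of this sketch, not a requirement.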

Why not simply use windows-1252?

This character set is accepted by most if not all browsers on all platforms.
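A minimal Python sketch of this alternative, assuming the stored bytes are sent unchanged and only the declared charset changes; the `content_type` helper and the sample bytes are hypothetical. It shows how the same bytes read under each label.

```python
def content_type(charset: str) -> str:
    """Build the HTTP Content-Type value announcing the page's charset."""
    return f"text/html; charset={charset}"


# The same bytes a Windows user pasted, stored and sent unchanged.
RAW = b"10 \x80 \x97 \x84quote\x93"

# Labelled iso-8859-1, bytes 0x80-0x9F are C1 control characters,
# which the validator reports as "non SGML character".
as_latin1 = RAW.decode("iso-8859-1")

# Labelled windows-1252, the same bytes are the euro sign, em dash and quotes.
as_cp1252 = RAW.decode("windows-1252")

if __name__ == "__main__":
    print(content_type("windows-1252"))
    print(repr(as_latin1))
    print(ascii(as_cp1252))
```

With this approach no byte is rewritten; only the label tells the browser (and the validator) how to interpret 0x80 to 0x9F.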

> 
> MY GOAL/AIM
> --------------
> 
> I want this translation to happen in the TYPO3 core, without needing any
> extension. Our goal has been XHTML validity, and this is a major obstacle
> to that commitment. This is not an "xhtml_cleaning" problem, but a
> generic charset problem. We have proven solutions to the problem; we
> just need to check whether they are generic enough not to hurt, and add
> them to the core in a meaningful way.

I disagree because setting the proper charset solves this issue.

Masi


