[Typo3-dev] Character set inquiry

Sacha Vorbeck sachav at gmx.net
Tue Sep 9 07:54:19 CEST 2003


Hi Kasper,

> Does any of you have some valuable insights here?

I asked Björn Höhrmann (http://bjoern.hoehrmann.de/), who works for/with
the W3C, for a statement. He was kind enough to write the following
message:

<quote>
The detection of the character encoding of text/html resources is
defined in section 5.2.2 of HTML 4.01,

   http://www.w3.org/TR/html4/charset.html#h-5.2.2

Basically

   if (HTTP Content-Type header has a charset parameter)
   {
     use it;
   }
   elsif (document starts with a byte order mark)
   {
     use it; /* i.e., UTF-x depending on the BOM */
   }
   elsif (document has <meta http-equiv=Content-Type with charset)
   {
     use it;
   }
   else
   {
     do some magic; /* probability analysis, user defaults, ... */
   }
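The precedence order above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not code from the thread or from any browser: `detect_charset` and its `fallback` parameter are hypothetical names, the `<meta>` scan only looks at the first 1024 bytes and uses simple regular expressions instead of a real HTML parser, and the "magic" step is reduced to a caller-supplied default.

```python
import codecs
import re

# Byte order marks, UTF-32 first: the UTF-32LE BOM starts with the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# charset parameter, e.g. "text/html;charset=utf-8" or 'charset="UTF-8"'
_CHARSET_PARAM = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)

# <meta ... http-equiv="Content-Type" ...> tag (simplified, regex-based)
_META = re.compile(
    rb'<meta\s[^>]*http-equiv\s*=\s*["\']?content-type["\']?[^>]*>',
    re.IGNORECASE,
)


def detect_charset(content_type, body, fallback="iso-8859-1"):
    """Pick an encoding label using the precedence order quoted above (sketch)."""
    # 1. charset parameter of the HTTP Content-Type header always wins
    if content_type:
        m = _CHARSET_PARAM.search(content_type)
        if m:
            return m.group(1).lower()
    # 2. byte order mark at the very start of the document
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 3. <meta http-equiv=Content-Type> with a charset (assumed: first 1024 bytes only)
    for tag in _META.findall(body[:1024]):
        m = _CHARSET_PARAM.search(tag.decode("latin-1"))
        if m:
            return m.group(1).lower()
    # 4. "do some magic": here just a hypothetical caller-supplied default
    return fallback
```

For example, `detect_charset("text/html;charset=utf-8", page)` returns `"utf-8"` even if `page` contains a `<meta>` element declaring ISO-8859-1, which is exactly the header-over-meta behaviour described below.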

Common web browsers implement this to some extent. I am not aware of any
browser where the <meta> element takes precedence over the encoding
specified in the HTTP header. If there is one, it is broken, and a number
of web sites are likely to break in it.

Note that the XML declaration <?xml version='1.0' encoding='...'?> that
Robert mentions is ignored: from an HTML point of view it is a processing
instruction, and the W3C HTML WG wants user agents to treat text/html
resources only as tag soup, not as XHTML, so XHTML/XML rules do not
apply to the document.

I am not sure how Apache, PHP, and Typo3 interact here. As the HTML
Recommendation states, the character encoding should be specified in the
HTTP header and the meta element is only to be used as a last resort if
you cannot configure the web server properly. In PHP you could do

   header('Content-Type: text/html;charset=utf-8');

which should override any default encoding set by the default_charset
config option for PHP or the AddDefaultCharset directive in the Apache
configuration. If the document relies on the <meta> element to specify
its character encoding and the web server or PHP configuration specifies
a default encoding, you run into the problem Robert talks about.
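The same idea outside PHP, as a hedged illustration only: a minimal WSGI application (Python's standard WSGI interface, our choice, not part of the Apache/PHP/Typo3 setup discussed here) that declares the encoding in the HTTP header instead of relying on a <meta> element. The name `app` and the page content are hypothetical.

```python
def app(environ, start_response):
    """Minimal WSGI app that states the character encoding in the HTTP header."""
    # The body really is UTF-8, so the header must say so.
    body = "<html><body><p>Björn Höhrmann</p></body></html>".encode("utf-8")
    start_response("200 OK", [
        # Same intent as PHP's header('Content-Type: text/html;charset=utf-8')
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result. Whether a front-end server adds or overrides a default charset depends on its own configuration, so check the response headers actually sent.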

If you cannot modify the HTTP header as in the example above, this is a
documentation issue; there is nothing you can do about it in the
document itself. The HTTP header always takes precedence, and if the web
server is configured to default to, say, ISO-8859-1, the document cannot
use any other character encoding without changing the web server
configuration.
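To make the resulting problem concrete, here is a small Python illustration (our own, not from the thread): the browser decodes the bytes with whatever encoding the header names, so a UTF-8 document served as ISO-8859-1 turns into mojibake, and an ISO-8859-1 document served as UTF-8 yields replacement characters. The helper name `misdecode` is hypothetical.

```python
def misdecode(text, actual="utf-8", declared="iso-8859-1"):
    """Encode text the way the document really is, decode it the way the header claims."""
    return text.encode(actual).decode(declared, errors="replace")


# UTF-8 bytes read as ISO-8859-1: each "ö" (0xC3 0xB6) becomes two characters
print(misdecode("Björn Höhrmann"))  # BjÃ¶rn HÃ¶hrmann

# ISO-8859-1 bytes read as UTF-8: 0xF6 is not valid UTF-8 and becomes U+FFFD
print(misdecode("Höhrmann", actual="iso-8859-1", declared="utf-8"))  # H\ufffdhrmann
```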

regards.
</quote>

-- 
all the best,

Sacha
