[Typo3-dev] How to configure TYPO3 for usage of UTF-8 output

Robert Lemke rl at robertlemke.de
Thu Oct 16 16:42:54 CEST 2003


Hi folks,

do you remember the discussion about some Apache servers overriding the
charset information? I did some investigation on that and came to a
working solution, which I'd like to explain here.

All information is related to TYPO3 3.6.0-dev.

HOW THE BROWSER DETERMINES THE CHARSET

In order to work with utf-8 content you generally have to do two things:

- Switch on utf-8 encoding for the backend
- Turn on utf-8 support for frontend output

For the backend there is a new option in the Install Tool,
[forceCharset]. Set it to utf-8 and all content you enter from then on
will be saved UTF-8 encoded.

In order to set the frontend encoding correctly, you have to understand
how the browser determines which character set to use. It searches the
following locations for charset information, in order of priority: as
soon as it finds the information in one location, it does not look any
further.

   - HTTP header ('Content-Type: text/html; charset=utf-8')
   - XML declaration ('<?xml version="1.0" encoding="utf-8"?>')
   - META tag ('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />')
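The lookup order above can be modelled as a short function. This is a simplified sketch in Python, not how any particular browser is implemented: `resolve_charset` and `header_charset` are hypothetical names, the regular expressions only handle the common attribute order shown above, and other sources real browsers use (user overrides, byte-order marks, auto-detection) are left out.

```python
import re
from email.message import Message

# Simplified patterns; they assume the attribute order used in the examples above.
XML_DECL = re.compile(r'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', re.I)
META_CT = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']Content-Type["\'][^>]*'
    r'content\s*=\s*["\'][^"\']*charset\s*=\s*([A-Za-z0-9._-]+)',
    re.I,
)

def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent

def resolve_charset(content_type, body):
    """Hypothetical model: HTTP header first, then XML declaration, then META tag."""
    charset = header_charset(content_type)
    if charset:
        return charset, "http-header"
    m = XML_DECL.search(body)
    if m:
        return m.group(1).lower(), "xml-declaration"
    m = META_CT.search(body)
    if m:
        return m.group(1).lower(), "meta-tag"
    return None, "none"

html = ('<?xml version="1.0" encoding="utf-8"?>\n'
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />')
print(resolve_charset("text/html; charset=ISO-8859-1", html))  # ('iso-8859-1', 'http-header')
print(resolve_charset("text/html", html))                      # ('utf-8', 'xml-declaration')
```

The first call shows the situation discussed below: a charset in the header wins, no matter what the page itself declares.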

The problem is that if you don't send the HTTP header (which TYPO3
doesn't do automatically), the charset given in the XML declaration and
the META tag *might* be overridden by a charset that Apache adds to the
header.

WHY APACHE OVERRIDES CHARSET ENCODING

Why would Apache do that, you might ask? Well, sending no charset
information at *all* is considered a cross-site scripting risk. That's
why Apache recommends adding a default charset setting to httpd.conf,
which supplies a charset whenever the response header doesn't contain
one. If you're interested in the background, you can find more
information about the cross-site scripting (CSS) issue on apache.org [1]
and on the official CERT website [2].
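The httpd.conf setting referred to here is Apache's `AddDefaultCharset` directive. Its effect can be sketched roughly as follows, assuming it behaves as documented: a default charset is appended only to `text/html` and `text/plain` responses whose Content-Type header carries no charset of its own. `apply_default_charset` is a hypothetical name, and ISO-8859-1 merely stands in for whatever default is configured.

```python
def apply_default_charset(content_type, default="ISO-8859-1"):
    """Simplified, hypothetical model of Apache's AddDefaultCharset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Crude check for a charset parameter already present in the header value.
    has_charset = "charset=" in content_type.lower().replace(" ", "")
    if media_type in ("text/html", "text/plain") and not has_charset:
        return content_type + "; charset=" + default
    return content_type

print(apply_default_charset("text/html"))
# text/html; charset=ISO-8859-1  -> XML declaration and META tag lose
print(apply_default_charset("text/html;charset=utf-8"))
# text/html;charset=utf-8        -> left untouched
```

An explicit charset in the header is left alone, which is why sending our own header avoids the problem.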

HOW TO SET UP TYPO3 

Of course you could disable the setting in httpd.conf that adds this
header information, and the browser would fall back to the information
provided in the XML declaration or the META tag. But that's not the
clean way.

The clean solution is to send an additional HTTP header via TypoScript
that declares our desired encoding. The following setup in your
TypoScript template enables utf-8 output with XHTML support:

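   # XHTML 1.0 Transitional doctype
   config.doctype = xhtml_trans
   # charset of the generated page, declared in the META tag
   config.metaCharset = utf-8
   # explicit HTTP header, so Apache has no reason to add its default charset
   config.additionalHeaders = Content-Type:text/html;charset=utf-8
   # clean up the generated markup towards XHTML
   config.xhtml_cleaning = all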

What we might discuss is whether TYPO3 should send this header by
default, since it already knows the charset (metaCharset). That would be
a small change to the core: the header would be inserted before the
additionalHeaders, so you could still override it.
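To illustrate the proposal, here is a hypothetical sketch of the header ordering, not actual TYPO3 core code: a Content-Type header derived from metaCharset is sent first, then the entries of additionalHeaders, so a later header with the same name replaces the default, much like PHP's `header()` replaces an earlier header of the same name by default. `build_headers` is a made-up name, and the `|` separator for multiple additionalHeaders entries is an assumption about the TypoScript syntax.

```python
def build_headers(meta_charset, additional_headers=""):
    """Hypothetical: default Content-Type first, additionalHeaders may override it."""
    headers = {}  # lower-cased name -> (name as written, value); keeps first-seen order

    def send(line):
        # Same name replaces the earlier header, like PHP's header() with $replace = true.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = (name.strip(), value.strip())

    if meta_charset:
        send("Content-Type: text/html; charset=" + meta_charset)
    # Assumption: several additional headers are separated by "|".
    for line in additional_headers.split("|"):
        if line.strip():
            send(line)
    return [name + ": " + value for name, value in headers.values()]

print(build_headers("utf-8"))
# ['Content-Type: text/html; charset=utf-8']
print(build_headers("utf-8", "Content-Type:text/html;charset=iso-8859-1|Content-Language: en"))
# ['Content-Type: text/html;charset=iso-8859-1', 'Content-Language: en']
```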

Finally, if you'd like to test your settings, use an HTTP header viewer
[3] and check the page information shown by your web browser.
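If you prefer a script over a web-based header viewer, the header can also be checked with Python's standard library. A minimal sketch: `served_content_type` and `charset_from_content_type` are hypothetical helper names, and the URL in the example comment is a placeholder for your own TYPO3 site.

```python
from email.message import Message
from urllib.request import urlopen

def served_content_type(url):
    """Fetch url and return the Content-Type header the server actually sent."""
    with urlopen(url, timeout=10) as resp:
        return resp.headers.get("Content-Type", "")

def charset_from_content_type(value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()

# Example (placeholder URL, replace with your own site):
#   ct = served_content_type("http://www.example.com/")
#   print(ct, "->", charset_from_content_type(ct))
```

If the printed charset is not utf-8, the server (or Apache's default) is still overriding what the page declares.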

[1] http://httpd.apache.org/info/css-security/encoding_examples.html
[2] http://www.cert.org/advisories/CA-2000-02.html
[3] http://www.rexswain.com/cgi-bin/httpview.cgi

-- 
robert





