[TYPO3-v4] Moving from UTF-8 as default to UTF-8 only in 4.6?

Helmut Hummel helmut.hummel at typo3.org
Sat Mar 19 18:09:05 CET 2011


Hi,

On 19.03.11 09:38, Jigal van Hemert wrote:

> On 18-3-2011 20:06, Helmut Hummel wrote:
>> I'd like to suggest to drop all support for other character encodings
>> besides UTF-8 in TYPO3 4.6, at least for internal processing.
> 
> What's "internal processing"? (See also my remark below)

What I mean by that is that TYPO3 should handle everything in UTF-8
internally. So when the code has to deal with strings, it can rely on
the fact that they are always UTF-8.
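
To illustrate "can rely on the fact": the guarantee would have to be
enforced once, where data enters the system. A rough Python sketch (not
TYPO3 code; the function name is made up for this mail):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8.

    Hypothetical helper: a check like this would run where data enters
    the system, so that all later code can assume UTF-8.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Code behind such a check would not need to ask which charset a string
is in.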

>> For backend rendering, data storage in the database and frontend
>> rendering there's no point in using other encodings.
> 
> Frontend rendering should still be possible in other encodings. TYPO3 
> output may be used for different things, not only output directly to a 
> browser. I know of at least one organisation which uses the output for 
> further processing in a system which collects the website contents from 
> various sources.

OK. Valid point.

> Output in various formats is possible and for further processing it may 
> be required to be in other encodings than utf-8.

I would argue to always convert it to UTF-8 before doing any further
processing.

>> The only use case for other encodings could be sending mails,
> 
> Why? If HTML output can be utf-8, why would mail content be different? 
> Or the other way around: if mail content can be in other encodings, why 
> should HTML / XML / whatever output be utf-8-only?

Well, I assumed that all browsers can handle UTF-8 properly by now, but
that email clients probably cannot. But I agree that we would still
need output (and then, of course, input) encoding.
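
Roughly, conversion would then happen only at the boundaries. A minimal
Python sketch of that idea, assuming UTF-8 byte strings internally (the
names INTERNAL_CHARSET, to_internal and to_output are invented for this
mail and are not TYPO3 API):

```python
INTERNAL_CHARSET = "utf-8"  # assumption: the single encoding used internally


def to_internal(raw: bytes, input_charset: str) -> bytes:
    """Convert incoming bytes (e.g. a form post) to the internal charset."""
    return raw.decode(input_charset).encode(INTERNAL_CHARSET)


def to_output(internal: bytes, output_charset: str) -> bytes:
    """Convert internal UTF-8 bytes to the configured output charset.

    Characters the target charset cannot represent are written as
    numeric character references (suitable for HTML/XML output only).
    """
    text = internal.decode(INTERNAL_CHARSET)
    return text.encode(output_charset, errors="xmlcharrefreplace")
```

For example, to_output("Grüße €".encode("utf-8"), "iso-8859-1") keeps
"Grüße" as latin1 bytes and writes the euro sign as &#8364;. For mail or
other non-markup output a different fallback would be needed.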

So what would then no longer be supported is e.g. pure latin1
configurations: the persistence layer (database), the files (e.g.
language files) and the output of the backend would always be UTF-8.

> I also agree with Xavier that utf-8 support for various databases should 
> be investigated.

In general I agree. I hope my assumption that all serious DBMS have
native UTF-8 support is true...

uh... MSSQL uses UCS-2:
http://support.microsoft.com/kb/232580/en-us

Well, the option then would be to encode/decode the data while storing/
fetching it, or to use Microsoft's driver if it works, since it is
advertised to transparently convert from and to UTF-8:
http://www.php.net/manual/de/book.mssql.php#93003
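
The first option (encode/decode while storing/fetching) could look
roughly like this. It is a Python sketch, not TYPO3 or driver code; the
function names are made up, and I assume little-endian byte order for
the UCS-2 side. Since UCS-2 only covers the Basic Multilingual Plane,
characters above U+FFFF are rejected here instead of being stored as
surrogate pairs:

```python
def utf8_to_ucs2(value: bytes) -> bytes:
    """Convert a UTF-8 byte string to UCS-2 (little-endian) for storing."""
    text = value.decode("utf-8")
    # UCS-2 has no surrogate pairs, so refuse anything outside the BMP.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the BMP cannot be stored as UCS-2")
    return text.encode("utf-16-le")


def ucs2_to_utf8(value: bytes) -> bytes:
    """Convert fetched UCS-2 (little-endian) bytes back to UTF-8."""
    return value.decode("utf-16-le").encode("utf-8")
```

A round trip through both functions returns the original UTF-8 bytes
for any text that fits into UCS-2.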

Kind regards,
Helmut

-- 
Helmut Hummel
TYPO3 Security Team Leader

TYPO3 .... inspiring people to share!
Get involved: typo3.org

