[TYPO3-core] RFC: Bug #5088: cache is not saved properly because of charset conflict in the database

Martin Kutschker martin.kutschker-n0spam at no5pam-blackbox.net
Mon Apr 2 11:29:09 CEST 2007


Michael Stucki wrote:
> Hi Martin,
> 
> I was waiting for more comments about this topic, but unfortunately nobody
> seems to be interested. To me it is an important bug, so I want to have
> this problem solved before 4.1.1.
> 
>> But that's a stupid setup. MySQL (and other DBs) will translate the
>> content from the server charset to the client charset. That's fine and
>> good and will work for sane setups.
>>
>> The described setup sends latin1 data as utf8 into the DB, which stores
>> it as utf8 (at least for cache_hash.content). This is nonsense.
> 
> If you are not using forceCharset=utf8 and you run the BE in either English
> or German, then the backend will be encoded with latin1 because this is
> (still) their default setting.

And if you choose Russian, it will be something else. I'm aware that a 
multi-charset installation is a problem, as the HTTP connection will use 
different charsets while the DB connection always uses the same one (set 
via my.cnf or SET NAMES).

In this case the data in cache_hash will indeed be in different 
charsets. So it seems that this data is in fact binary, as it is not 
stored as plain text but as a serialized array.
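To make the "serialized means binary" point concrete: PHP's serialize() writes each string as s:<length>:"<bytes>"; with the length counted in bytes, so any charset conversion that changes the byte count leaves a prefix that no longer matches and unserialize() fails. The Python sketch below re-implements only the string case with hypothetical helpers (php_serialize_str, php_unserialize_str); it illustrates the format and is not PHP's own code.

```python
# Sketch: PHP's serialize() writes a string as s:<length>:"<bytes>"; where
# <length> counts BYTES. Simplified re-implementation for strings only
# (hypothetical helpers, not PHP's own code).


def php_serialize_str(raw: bytes) -> bytes:
    """Serialize raw bytes in PHP's s:<n>:"..."; string format."""
    return b's:%d:"%s";' % (len(raw), raw)


def php_unserialize_str(data: bytes) -> bytes:
    """Parse s:<n>:"..."; and fail if the payload is not exactly n bytes."""
    if not data.startswith(b"s:"):
        raise ValueError("not a serialized string")
    colon = data.index(b":", 2)          # end of the length prefix
    n = int(data[2:colon])               # declared length in bytes
    start = colon + 2                    # skip ':' and the opening quote
    if data[start + n:] != b'";':
        raise ValueError("length prefix does not match payload")
    return data[start:start + n]


# A latin1 backend caches "Grüße": 5 bytes, so the prefix says 5.
cached = php_serialize_str("Grüße".encode("latin-1"))

# If the DB hands the value back converted to utf8, ü and ß become
# 2 bytes each, but the prefix still says 5.
converted = cached.decode("latin-1").encode("utf-8")

print(php_unserialize_str(cached))       # round-trips fine
try:
    php_unserialize_str(converted)
except ValueError as err:
    print("unserialize failed:", err)
```

The same length prefixes appear inside serialized arrays, which is why a whole cached array becomes unreadable after a single conversion.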

>> What I see here is that you must follow one rule: if you change the
>> charset somewhere - clear ALL caches! And read the docs before you fiddle
>> with SET NAMES.
> 
> What users will find in the first place is probably this:
> http://wiki.typo3.org/index.php/UTF-8_support

This page describes a correct setup: UTF-8 all over the place.

>> But I agree that we should take more care when deciding whether to store
>> data as TEXT (charset dependent) or as BLOB (charset independent). I have
>> to admit I was part of the let's-get-rid-of-the-BLOB crowd, but I think we
>> should think more before simply reverting those changes.
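A minimal toy model of the TEXT/BLOB difference, in Python. It assumes MySQL-like behaviour in which a TEXT column converts between its own charset and the connection charset, while a BLOB column returns exactly the bytes it was given; the class names are made up for illustration and do not mirror MySQL internals.

```python
# Toy model, not MySQL's real code path: a TEXT column has a charset and
# values are converted between it and the connection charset (SET NAMES);
# a BLOB column has no charset and returns exactly the bytes it was given.


class TextColumn:
    def __init__(self, column_charset: str) -> None:
        self.column_charset = column_charset
        self.stored = b""

    def write(self, data: bytes, connection_charset: str) -> None:
        # Incoming bytes are interpreted in the connection charset
        # and re-encoded in the column charset.
        self.stored = data.decode(connection_charset).encode(self.column_charset)

    def read(self, connection_charset: str) -> bytes:
        # On the way out, convert to whatever the reading connection uses.
        return self.stored.decode(self.column_charset).encode(connection_charset)


class BlobColumn:
    def __init__(self) -> None:
        self.stored = b""

    def write(self, data: bytes, connection_charset: str) -> None:
        self.stored = data  # connection charset is ignored, bytes kept as-is

    def read(self, connection_charset: str) -> bytes:
        return self.stored


original = "Grüße".encode("latin-1")   # written by a latin1 backend
text_col, blob_col = TextColumn("utf-8"), BlobColumn()
text_col.write(original, "latin-1")
blob_col.write(original, "latin-1")

# Later the data is read over a utf8 connection (other BE language, new SET NAMES).
print(text_col.read("utf-8") == original)   # bytes changed by the conversion
print(blob_col.read("utf-8") == original)   # bytes unchanged
```

For plain text the TEXT behaviour is what you want; for opaque data such as a serialized array, the changed bytes are exactly the problem.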
> 
> I also supported this idea, but I was not aware of this problem in this
> (pretty special) environment. Anyway, since there is no charset
> handling in sys_template, we should fix this quickly:
> 
> 1) Change the field back to mediumblob
> 2) Add a migration wizard (not only for sys_template)
> 
>> The case above is no reason for me to change anything, because the setup
>> is broken. Of course we need to check what the impacts are for correct
>> setups.
> 
> I don't think that the setup is broken, but please ask here for details:
> http://bugs.typo3.org/view.php?id=5088

You said they use latin1 for the HTTP connection, but utf8 for the DB 
connection. They may store the data in utf8, but the DB connection must 
be in latin1. In this case MySQL will transparently convert the data.
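A rough Python model of why the connection charset has to describe the bytes that are actually sent. The function name is hypothetical, and errors="replace" is only a stand-in: what MySQL really does with invalid input (warning, truncation or error) depends on version and SQL mode.

```python
# Toy model: the server decodes incoming bytes according to the charset the
# connection declares (SET NAMES) and stores them in a utf8 column.
# errors="replace" is a stand-in for MySQL's real handling of invalid input.


def store_in_utf8_column(client_bytes: bytes, connection_charset: str) -> bytes:
    text = client_bytes.decode(connection_charset, errors="replace")
    return text.encode("utf-8")


page = "Grüße".encode("latin-1")        # what a latin1 backend produces

ok = store_in_utf8_column(page, "latin-1")   # SET NAMES latin1: matches the data
bad = store_in_utf8_column(page, "utf-8")    # SET NAMES utf8: latin1 bytes mislabeled

print(ok.decode("utf-8"))    # the original text, correctly converted to utf8
print(bad.decode("utf-8"))   # contains U+FFFD replacement characters
```

Storing utf8 in the column is fine either way; only the declared connection charset decides whether the conversion is correct.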

But after reading the posts again, I am not sure what actually happens. The 
reports vary (some even say they have problems with a latin1-only setup 
running TYPO3 4.0?!?).

Anyway, I come to the conclusion that a serialized PHP array is to be 
seen as binary data and therefore should be stored in a BLOB column.

So not only cache_hash.content but also fields like 
cache_pages.cache_data, cache_pagesection.content, or session data 
fields may be affected in some way.
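A sketch of what the first step of a migration wizard (points 1 and 2 in Michael's list) could generate for the columns named here. The helper and list names are hypothetical; only MySQL's ALTER TABLE ... MODIFY syntax is assumed. The sys_template field is left out because the thread does not name it, and the real column attributes (NULL, DEFAULT) are not known here, so a real wizard would have to read them from the existing table definitions.

```python
# Hypothetical migration-wizard helper: emit statements that turn the
# columns named in this thread into MEDIUMBLOB. MODIFY replaces the whole
# column definition, so NULL/DEFAULT attributes of the real TYPO3 tables
# (not known here) would have to be added by a real wizard.

AFFECTED_COLUMNS = [
    ("cache_hash", "content"),
    ("cache_pages", "cache_data"),
    ("cache_pagesection", "content"),
]


def blob_migration_sql(columns: list[tuple[str, str]]) -> list[str]:
    """Return one ALTER TABLE ... MODIFY ... MEDIUMBLOB statement per column."""
    return [f"ALTER TABLE {table} MODIFY {column} MEDIUMBLOB;" for table, column in columns]


for statement in blob_migration_sql(AFFECTED_COLUMNS):
    print(statement)
```

Since these are cache tables, clearing them after the type change is probably simpler than trying to repair rows already written in the wrong charset.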

Masi

