[TYPO3-core] RFC: Bug #5088: cache is not saved properly because of charset conflict in the database

Martin Kutschker martin.kutschker-n0spam at no5pam-blackbox.net
Mon Apr 2 17:28:04 CEST 2007


Michael Stucki wrote:
> Hi Martin,
> 
>> And if you choose Russian it will be something else. I'm aware that a
>> multi-charset installation is a problem as the http connection will use
>> different charsets, but the db connection will always use the same (set
>> per my.cnf or SET NAMES).
> 
> Yes, so in this case the database connection uses UTF-8, but the data which is 
> sent comes from a Latin1 browser. This is no problem when saving this data, 
> because sys_template.config (the "setup" field) is defined as blob.
> 
> Changing this field into "text" without using a migration tool would probably 
> strip some important(?) data from the template, so I think it's not the 
> preferred solution, but should also fix the problem.
> 
>> In this case the data of cache_hash will indeed be in different
>> charsets. So it seems to me that this data is in fact binary, as it is
>> not stored plainly but as a serialized array.
> 
> sys_template.config is not a serialized array but still is defined as blob. 
> The main difference is, that cache_hash.content is mediumtext, so the content 
> will be converted, no matter if the data is serialized or not.
> 
>> You said they use latin1 for the http connection, but utf8 for the db
>> connection. They may store the data in utf8, but the db connection must
>> be in latin1. In this case MySQL will transparently convert the data.
>>
>> But after reading the posts again I don't know what happens. The reports
>> vary (some even say they have problems with a latin1-only setup running
>> TYPO3 4.0?!?).
> 
> Yes, this also seems weird to me. I tend to think it's the user's fault...
> 
>> Anyway, I come to the conclusion that a serialized PHP array is to be
>> seen as binary data and therefore to be stored in a BLOB column.
> 
> Again, see above. It has nothing to do with the serialization of the data, the 
> main point is that during save, the template setup was not converted because 
> it is treated as binary data.
> 
> But obviously we both agree that cache_hash.content can be changed back now.

Well, yes. After a short walk in the sun I hope I have finally grasped 
what happens and why a BLOB makes some sense.
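
To illustrate why a BLOB makes sense for serialized data, here is a minimal 
sketch in Python. It assumes only that PHP's serialize() prefixes a string 
with its length in bytes (s:<length>:"<data>";). The helper names are 
hypothetical, and the latin1-to-utf8 step merely imitates the conversion a 
text column may apply; it is not TYPO3 or MySQL code.

```python
def php_serialize_str(payload: bytes) -> bytes:
    """Imitate PHP's serialize() for one string: s:<byte length>:"<bytes>";"""
    return b's:' + str(len(payload)).encode('ascii') + b':"' + payload + b'";'


def length_prefix_is_valid(serialized: bytes) -> bool:
    """Check that the declared byte length still matches the payload."""
    head, rest = serialized.split(b':"', 1)
    declared = int(head[2:])          # strip the leading b's:'
    return rest[declared:] == b'";'   # payload must end exactly where declared


# "Grüße" is 5 bytes in latin1.
original = php_serialize_str("Grüße".encode("latin-1"))

# BLOB column: the bytes are stored and returned untouched.
stored_as_blob = original

# Text column with a latin1 -> utf8 conversion: ü and ß become two bytes
# each, but the length prefix inside the serialized data still says 5.
stored_as_text = original.decode("latin-1").encode("utf-8")

print(length_prefix_is_valid(stored_as_blob))
print(length_prefix_is_valid(stored_as_text))
```

The second check fails because the conversion changes the byte count without 
touching the length prefix, which is the kind of damage that would make an 
unserialize of the cached value fail.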

But I still think that in a proper setup you have no problems:

A single charset setup MUST have the same charset for browser and DB 
connection, anything else is wrong.

A (deprecated) multi-charset setup MUST use latin1 (or another ISO 
SINGLE-BYTE charset) for the DB connection. In multi-charset fields only 
ASCII may be used; anything else is not valid and must be put into external 
files (which is possible for TS setup).
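
The reasoning behind the single-byte rule can be sketched in Python as 
follows. This is an illustration under assumptions, not a test against a 
real server: Python's "latin-1" codec stands in for the latin1 connection 
charset, the helper names are hypothetical, and the TypoScript-like line is 
made up.

```python
def survives_latin1_connection(raw: bytes) -> bool:
    """A single-byte charset maps every byte to a character and back."""
    return raw.decode("latin-1").encode("latin-1") == raw


def is_valid_utf8(raw: bytes) -> bool:
    """A utf8 connection can only accept bytes that form valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Russian text as a KOI8-R browser would send it.
russian = "Привет".encode("koi8_r")

# Every possible byte value passes through a latin1 connection unchanged,
# so data in any single-byte charset comes back as it went in.
print(survives_latin1_connection(bytes(range(256))))
print(survives_latin1_connection(russian))

# The same bytes are not valid UTF-8, so a utf8 connection cannot carry them.
print(is_valid_utf8(russian))

# ASCII text has identical bytes in all of these charsets, which is why
# ASCII-only content is safe in multi-charset fields.
ascii_only = "page.10.value = Hello"
print(ascii_only.encode("latin-1") == ascii_only.encode("utf-8")
      == ascii_only.encode("koi8_r"))
```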

But go and change the field if it rids us of bug reports. Of course we 
could also write documentation for a proper MySQL/TYPO3 configuration.

Masi

