[TYPO3-dev] problem:: report & analysis: 4.1.2 is coming with codepage inconsistency
Martin Bless
m.bless at gmx.de
Sat Aug 18 17:48:09 CEST 2007
Hi Ernesto,
Ernesto Baschny wrote on Fri, 17 Aug 2007 20:47:36 +0200:
>What do you mean with "MySql-dump will contain UTF-8 errors"?
Yes, I can see that the term "error" may be misleading and needs a
clarification. I should have used another word but didn't have the
knowledge at that time.
>Which MySQL version are you using?
TYPO3-4.1.2, MySql-4.1.22, PHP-4.4.1, phpMyAdmin-2.6.4.-pl3.
> What is the exact "error" you get and when?
The answer is twofold: it's (a) scrambled content and (b) a database dump
that isn't legal UTF-8.
(b) is a MySql issue and off topic here. It's a decision the MySql
developers made. A database may of course contain any data. But a dump
file claiming to be UTF-8 should describe the data in legal UTF-8
form. At least that's what I think. But with MySql this isn't reality.
This needs an escaping mechanism, of course. But, as I said, I learned
it's different with MySql, and it's not relevant here.
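To make the "legal UTF-8" point concrete: one quick way to check whether a dump file really is valid UTF-8 is to decode its raw bytes strictly and report where decoding first fails. A minimal Python sketch; the function names are made up for this example and are not part of MySql or phpMyAdmin:

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence,
    or None if the whole byte string is valid UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def check_dump(path: str):
    """Read a dump file as raw bytes (no implicit decoding) and check it."""
    with open(path, "rb") as f:
        return first_invalid_utf8_offset(f.read())
```

For example, first_invalid_utf8_offset(b"tilg\xe6ngeligt") returns 4: the latin-1 byte 0xE6 for "æ" is not a valid UTF-8 sequence on its own.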
>I think it is not illegal to have latin-1 bytes in UTF-8 tables (or the
>other way around), its just a matter what you do with that data in your
>application.
Yes, it's always about semantics. Now concerning (a) and TYPO3: You
don't have to be Danish to know that something must be wrong if you
find text like "lang.dk = Dette website er dynamisk genereret af TYPO3
CMS - frit tilg�ngeligt" in table static_template. Unfortunately I
erroneously took this as an indicator that something was wrong with
the setup of my installation. Being misled while searching for
errors was the real problem for me.
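The "�" is what you get when latin-1 bytes are displayed as if they were UTF-8: the single byte for "æ" (0xE6) is not a valid UTF-8 sequence, so the viewer substitutes the replacement character U+FFFD. The bytes themselves are intact; only the interpretation is wrong. A small Python illustration, assuming the intended text was "frit tilgængeligt":

```python
original = "frit tilg\u00e6ngeligt"   # "frit tilgængeligt"
raw = original.encode("latin-1")      # 'æ' is stored as the single byte 0xE6

# Read with the wrong charset: the invalid byte becomes U+FFFD ("�").
as_utf8 = raw.decode("utf-8", errors="replace")

# Read with the right charset: the text comes back unchanged.
as_latin1 = raw.decode("latin-1")

print(as_utf8)    # frit tilg�ngeligt
print(as_latin1)  # frit tilgængeligt
```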
> So I wonder where did you hit this problem?
Glad you asked! It seems to me that nowadays installations should use
UTF-8 wherever possible. Just a "simple" UTF-8 installation of TYPO3
is all I was heading for. Here is what I experienced.
First try: I created a new database, set a UTF-8 collation and set
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8'; Everything looked
fine until I found out that I was badly wrong. Database and tables
were UTF-8 but the data was not stored natively encoded but had
undergone a second conversion to UTF-8.
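The "second conversion" can be modelled as follows, assuming (my assumption, not verified against this particular setup) that the MySql connection charset was latin1 while the columns were utf8: TYPO3 sends UTF-8 bytes, MySql takes them for latin-1 characters and converts those to UTF-8 once more. A Python sketch; note that MySql's latin1 is really closer to Windows-1252, so some bytes would come out differently in a real database:

```python
def double_encode(text: str) -> bytes:
    """Model of the mis-conversion: correct UTF-8 bytes are decoded as
    latin-1 and encoded to UTF-8 a second time."""
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode("latin-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> str:
    """Reverse one layer: decode UTF-8, map the characters back to latin-1
    bytes, then decode those bytes as the UTF-8 they originally were."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


umlauts = "\u00c4\u00d6\u00dc"                 # "ÄÖÜ"
stored = double_encode(umlauts)
print(len(umlauts.encode("utf-8")), len(stored))  # 6 12
print(undo_double_encoding(stored))            # ÄÖÜ
```

Each umlaut takes four bytes instead of two, which is one way to spot the problem in a hex view of the column.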
Second try: I found out that I had to set 'setDBinit'. So I started
again and used $TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES
utf8'.chr(10).'SET CHARACTER SET utf8'.chr(10).''; and, another try,
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;'.chr(10).'SET
CHARACTER SET utf8;'.chr(10).'SET SESSION character_set_server=utf8;';
Again at first everything looked fine. Data in tt_content was natively
UTF-8. But, for instance, umlauts like 'ÄÖÜ' in the constants and setup
fields of templates (sys_template) couldn't be saved. I used to receive
this error:
Errors: 102: These fields are not properly updated
in the database: (constants) Probably value mismatch
with fieldtype.
At that point I was rather desperate (I hope you understand) until
Karsten Dambekalns in [TYPO3-English] advised me to ["Remove the set
character set, leave only the set names, try again. Worked for me,
multiple times."] I followed his advice and it works! These
two lines seem to do the trick:
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;'.chr(10) ;
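As far as I understand it (an assumption on my part; I have not checked the 4.1.2 source), TYPO3 splits the 'setDBinit' value at chr(10) and sends each non-empty line to MySql as a separate query right after connecting. That would explain why the statements are joined with chr(10) and why a trailing chr(10) is harmless. A Python model of that splitting, not TYPO3 code:

```python
def split_db_init(value: str) -> list:
    """Split a setDBinit-style value into one statement per line,
    dropping empty lines and surrounding whitespace."""
    return [line.strip() for line in value.split("\n") if line.strip()]


working = "SET NAMES utf8;" + chr(10)
second_try = ("SET NAMES utf8;" + chr(10) + "SET CHARACTER SET utf8;" + chr(10)
              + "SET SESSION character_set_server=utf8;")

print(split_db_init(working))          # ['SET NAMES utf8;']
print(len(split_db_init(second_try)))  # 3
```

Note also that, according to the MySql manual, the two statements are not equivalent: SET NAMES x sets character_set_client, character_set_connection and character_set_results to x, whereas SET CHARACTER SET x sets the client and results charsets to x but sets the connection charset to the database's default.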
It's great that it works, but I'm very unhappy that I
don't have a real understanding. What I currently do is more "trial
and error" praying for some kind of computer voodoo. And this is the
answer to your question: In trying to find out what's happening and
gain more understanding I hit the problem.
Besides, it's really difficult to find relevant information on how and
why to set 'setDBinit' on the net.
The open questions to me are: Given a concrete hosting situation:
- What measures can I take to find out what settings in localconf.php
I should use?
- If it works: Can I trust it will continue working?
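One practical test, offered as a sketch rather than a recipe: save a known string such as "ÄÖÜ" through the backend, read the raw column bytes back (for example with SELECT HEX(constants) FROM sys_template in phpMyAdmin) and classify them. The hypothetical helper below tells plain ASCII, proper UTF-8, double-encoded UTF-8 and non-UTF-8 (probably latin-1) apart. It is a heuristic and can be fooled by unusual text, so treat the answer as a hint:

```python
def classify(data: bytes) -> str:
    """Heuristic guess at how a stored byte string was encoded."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8 (probably latin-1)"
    try:
        # If undoing one UTF-8 layer still yields valid UTF-8,
        # the bytes were most likely converted twice.
        text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"
    return "double-encoded utf-8"


# Paste the output of SELECT HEX(...) here:
print(classify(bytes.fromhex("C384C396C39C")))  # utf-8
```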
Now back to our tables:
>The mentioned static-tables (from typo3/cms/) are almost all obsolete,
>in special those that contain 8-bit codes (translation for several
>old-school extensions).
Only two tables are non-ascii. The one in
typo3_src-4.1.2\typo3\sysext\tsconfig_help is UTF-8 and semantically
ok. The one in typo3_src-4.1.2\typo3\sysext\cms is "mostly" latin-1
with six additional unicode 'lost character' indicators. I had a look
into the 3.8.1 version. Same situation there.
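To put numbers on "mostly latin-1 with six lost-character indicators", a file can be scanned byte-wise: count the non-ASCII bytes, the bytes that do not form valid UTF-8, and the properly encoded U+FFFD sequences (EF BF BD). A Python sketch with made-up function names; it uses the surrogateescape error handler so that every undecodable byte is kept and counted rather than silently replaced:

```python
def describe_bytes(data: bytes) -> dict:
    """Summarise the encoding state of a byte string (e.g. a *.sql file)."""
    # surrogateescape turns each undecodable byte 0xNN into the code point U+DCNN.
    text = data.decode("utf-8", errors="surrogateescape")
    return {
        "non_ascii_bytes": sum(1 for b in data if b >= 0x80),
        "invalid_utf8_bytes": sum(1 for ch in text if "\udc80" <= ch <= "\udcff"),
        "replacement_chars": text.count("\ufffd"),
    }


def describe_file(path: str) -> dict:
    with open(path, "rb") as f:
        return describe_bytes(f.read())


sample = "tilg\u00e6ngeligt".encode("latin-1") + " \ufffd".encode("utf-8")
print(describe_bytes(sample))
# {'non_ascii_bytes': 4, 'invalid_utf8_bytes': 1, 'replacement_chars': 1}
```

A file like the one in sysext/cms would show many invalid bytes (the latin-1 characters) next to a small number of replacement characters.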
>But you are right that in general no charset conversion is made when
>reading in ext_tables_static+adt.sql. This is not really possible,
>because we will have to know (and record) the charset for every INSERT
>statement in that file as we could have different charsets in a single
>installation (e.g. each language has its own default charset). Another
>choice would be to force this file to be UTF-8 encoded by default, but
>then we will have to know the language for each single INSERT statement
>to be able to do a conversion to the chosen charset for that specific
>installation. Both I consider non-trivial.
Hmm, "on each single INSERT statement"? I don't know the TYPO3 code
well enough to judge this. To me the options seem to be: (1) Use
ascii *.sql files only. If possible that's the way to go. (2) If ascii
isn't sufficient we need to agree on the meaning of the bytes in the
*.sql files since it will make a difference how they are imported.
Probably UTF-8 is the right choice here. (3) *.sql files could carry
an "encoding marker", let's say '-- encoding: latin-1' for example.
This is the way the Python folks mark their files.
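Options (1) to (3) combine naturally into one rule for an importer: ASCII needs no decision, an explicit marker wins, otherwise assume UTF-8. A Python sketch; the '-- encoding: ...' syntax is only the proposal above (modelled on Python's PEP 263 coding comment in the first two lines), not something TYPO3 reads, and decode_sql_file is a hypothetical name:

```python
import re

# Hypothetical marker, e.g. "-- encoding: latin-1", looked for in the first two lines.
MARKER = re.compile(rb"^--\s*encoding:\s*([-\w.]+)")


def decode_sql_file(data: bytes, default: str = "utf-8") -> str:
    """Decode the raw bytes of a *.sql file into text."""
    if all(b < 0x80 for b in data):      # option (1): pure ASCII
        return data.decode("ascii")
    for line in data.splitlines()[:2]:   # option (3): explicit marker
        match = MARKER.match(line)
        if match:
            return data.decode(match.group(1).decode("ascii"))
    return data.decode(default)          # option (2): agreed default
```

An unknown marker name raises LookupError and a file that does not match its declared charset raises UnicodeDecodeError, both of which an importer would want to report instead of importing scrambled text.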
>Best and easiest is to use UTF-8 in ext_tables_static+adt.sql and have
>people use forceCharset to UTF-8 if they want to use them.
Personally I would go for (1) and (2) and agree fully with this
statement.
> And instead of fixing the provided sysext/cms/ext_tables_static+adt.sql file, we
>should just drop those static templates "for good". :)
I won't miss them ;-)
>Cheers,
>Ernesto
Yes, have a nice day
Martin