[TYPO3-german] UTF8

Andreas Becker ab.becker at web.de
Sat Feb 26 05:15:33 CET 2011


Here is an example my.cnf: Debian 5 (lenny) with MySQL 5.

Setting the default and server character set and collation to UTF-8 in your
MySQL config may not be enough: if your clients (PHP etc.) request a
different character set, MySQL will oblige. As long as your clients request
the character set you want, this is fine.

/etc/mysql/my.cnf:

#
# The MySQL database server configuration file.
#
# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients

[client]
default-character-set = utf8
...

[mysqld]
# setting db server to UTF8
collation_server=utf8_general_ci
character_set_server=utf8

# On the other hand, if you want to ensure MySQL always stores
# and retrieves data in the UTF-8 charset regardless of what a
# client requests add the following directive to the [mysqld] section
# of your MySQL config file.

#skip-character-set-client-handshake


# * Basic Settings
...

[mysqldump]
...

[mysql]
# Setting MySQL system to utf8
default-character-set=utf8
collation_connection=utf8_general_ci

#no-auto-rehash # faster start of mysql but no tab completion

[isamchk]
...
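
To see which character sets a connection actually ended up with, you can
compare the server's character_set_* variables against the one you expect.
The following is only a sketch: charsetMismatches() is a hypothetical helper
name, the credentials in the commented usage are placeholders, and the sample
values are made up for an offline run.

```php
<?php
/**
 * Hypothetical helper: takes variable => value pairs as returned by
 * SHOW VARIABLES LIKE 'character_set_%' and returns those that differ
 * from $expected. Only the connection/server/database variables are checked.
 */
function charsetMismatches(array $vars, $expected = 'utf8')
{
    $checked = array(
        'character_set_client',
        'character_set_connection',
        'character_set_results',
        'character_set_server',
        'character_set_database',
    );
    $mismatches = array();
    foreach ($checked as $name) {
        if (isset($vars[$name]) && $vars[$name] !== $expected) {
            $mismatches[$name] = $vars[$name];
        }
    }
    return $mismatches;
}

// Usage against a live server (credentials are placeholders):
// $db = new mysqli('localhost', 'user', 'password', 'typo3');
// $db->set_charset('utf8'); // request UTF-8 explicitly from the client side
// $vars = array();
// $result = $db->query("SHOW VARIABLES LIKE 'character_set_%'");
// while ($row = $result->fetch_row()) {
//     $vars[$row[0]] = $row[1];
// }
// print_r(charsetMismatches($vars));

// Offline demonstration with made-up values: a client that asked for latin1.
$sample = array(
    'character_set_client'     => 'latin1',
    'character_set_connection' => 'latin1',
    'character_set_results'    => 'latin1',
    'character_set_server'     => 'utf8',
    'character_set_database'   => 'utf8',
);
print_r(charsetMismatches($sample));
```

With the sample above, the three connection-related variables are reported,
which is the situation the paragraph at the top warns about.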

-----------------------------

In localconf.php, disable or delete the settings that were needed
before:
(TIP: test by disabling them first)

// UTF8
//$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8'; // delete
//$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8'; // delete
$TYPO3_CONF_VARS['SYS']['t3lib_cs_convMethod'] = 'mbstring';
$TYPO3_CONF_VARS['SYS']['t3lib_cs_utils'] = 'mbstring';
//$TYPO3_CONF_VARS['SYS']['UTF8filesystem'] = '1'; // delete

-----------------------------

In php.ini

If your scripts contain multi-byte characters, be sure to save them as
UTF-8. Any decent text editor can do this for you (I like Notepad++). In
your PHP config, set default_charset and mbstring.internal_encoding to
UTF-8.

; PHP's default character set is set to empty.
; http://php.net/default-charset
;default_charset = "iso-8859-1"

default_charset = "UTF-8"
mbstring.internal_encoding = UTF-8
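
To confirm that PHP picked up these values, or to set them per script when
php.ini cannot be edited, something like the following sketch can be used. It
assumes the mbstring extension is loaded; both settings are applied at
runtime here only so that the check is self-contained.

```php
<?php
// Runtime equivalent of the two php.ini lines above (an assumption for
// setups where php.ini is not editable; the php.ini route stays preferable).
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');

// Report what PHP is actually using now.
echo 'default_charset: ', ini_get('default_charset'), "\n";
echo 'mb_internal_encoding: ', mb_internal_encoding(), "\n";
```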

-----------------------------

Apache

Web browsers need to know which character set to use for the documents you
send them. Your server needs to include charset=utf-8 in the Content-Type
HTTP header so that clients handle UTF-8 encoded documents correctly. There
are several ways to do this; a simple approach is to set UTF-8 as the
default character set. Here are two ways to do it. Add the method of your
choice to httpd.conf or your root .htaccess file:


# make utf-8 the default charset for everything
AddDefaultCharset UTF-8

# or specifically for php, html, xml, javascript etc.
AddCharset UTF-8 .php
AddCharset UTF-8 .html
AddCharset UTF-8 .xml
AddCharset UTF-8 .js

-----------------------

TESTING

An example in code: in his i18n Survival Guide, Sam Ruby recommends using
the string Iñtërnâtiônàlizætiøn for testing. Counting by eye, you can see
that it contains 20 characters:

Iñtërnâtiônàlizætiøn
12345678901234567890

But counted with PHP’s strlen function...

<?php
echo strlen('Iñtërnâtiônàlizætiøn');
?>

PHP will report 27. That’s because the string, encoded as UTF-8, contains
multi-byte characters, and PHP’s strlen function counts bytes, not
characters.
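
For comparison, a minimal sketch of counting the same string by bytes and by
characters. It assumes the file is saved as UTF-8 and that the mbstring
extension is loaded.

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is available.
$str = 'Iñtërnâtiônàlizætiøn';

$bytes = strlen($str);             // counts bytes
$chars = mb_strlen($str, 'UTF-8'); // counts characters, decoding as UTF-8

echo "bytes: $bytes, characters: $chars\n"; // bytes: 27, characters: 20
```

The difference of 7 corresponds to the seven accented letters, each of which
takes two bytes in UTF-8.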

Life gets even more interesting if you run the following:

<?php
header('Content-Type: text/plain; charset=ISO-8859-1');

$str = 'Iñtërnâtiônàlizætiøn';

$out = '';
$pos = '';
// Walk the string byte by byte and build a ruler line to print under it.
for ($i = 0, $j = 1; $i < strlen($str); $i++, $j++) {
    $out .= $str[$i];
    if ($j == 10) $j = 0;
    $pos .= $j;
}

echo $out."\n".$pos;
?>

You should see something like:

IÃ±tÃ«rnÃ¢tiÃ´nÃ lizÃ¦tiÃ¸n
123456789012345678901234567
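
A character-aware variant of the same loop, as a sketch: charRuler() is a
hypothetical helper name, the file is assumed to be saved as UTF-8, and the
page is declared as UTF-8 instead of ISO-8859-1. It splits the string with
the /u (UTF-8) modifier of preg_split, so each array element is one
character rather than one byte.

```php
<?php
/**
 * Hypothetical helper: returns the string followed by a ruler line that
 * counts characters (not bytes). Assumes $str is valid UTF-8.
 */
function charRuler($str)
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between characters instead of between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    $pos = '';
    foreach ($chars as $i => $char) {
        $pos .= ($i + 1) % 10; // 1..9, then 0, like the ruler above
    }

    return implode('', $chars) . "\n" . $pos;
}

// Declare the output as UTF-8 this time (skipped if output already started).
if (!headers_sent()) {
    header('Content-Type: text/plain; charset=UTF-8');
}

echo charRuler('Iñtërnâtiônàlizætiøn');
```

This time the ruler under the string ends at 20 instead of 27.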

Read more about it here: http://www.phpwact.org/php/i18n/charsets
and here:
http://www.itnewb.com/v/UTF-8-Enabled-Apache-MySQL-PHP-Markup-and-JavaScript

http://www.joelonsoftware.com/articles/Unicode.html
http://en.wikipedia.org/wiki/Category:Character_sets
http://en.wikipedia.org/wiki/Character_encoding
http://en.wikipedia.org/wiki/UTF8

I hope it helps you to get your machine running in UTF8 and also to test
your stuff.

Andi

