[TYPO3-dev] MySQL, search and utf8

Steffen Kamper steffen at sk-typo3.de
Tue Oct 9 14:48:11 CEST 2007


"ries van Twisk" <typo3 at rvt.dds.nl> schrieb im Newsbeitrag 
news:mailman.1087.1191932106.20721.typo3-dev at lists.netfielders.de...

On Oct 9, 2007, at 4:23 AM, Steffen Kamper wrote:

>
> "Steffen Kamper" <steffen at sk-typo3.de> schrieb im Newsbeitrag
> news:mailman.1.1191866513.10281.typo3-dev at lists.netfielders.de...
>> Hi,
>>
>> i have a non-TYPO3-question.
>>
>> I perform a search in MySQL and for example look for the word  "schön". 
>> So
>> my query looks like:
>>
>> Select .... where ... and (body like "schön%" or body like "%schön %" or
>> body like "%schön" or body = "schön")
>>
>> looking to the results i see that MySQL shows up result with the word
>> "schon" which is different. Thoug it is utf8, the letter ö is  saved in
>> utf8 as 2-Byte, so why gives MySQL me the result with o?
>>
>> And how can i provide this to get only exact matches with "schön"?
>>
>> thx, vg  Steffen
>>
>
> it seems that like is not binary safe in utf8-context.
>
> I got the right results by using LIKE BINARY, so my query works  like 
> this:
>
> SELECT * , count( * ) anz
> FROM tx_drwiki_pages
> WHERE hidden =0
> AND deleted =0
> AND (
> body LIKE binary "schön%"
> OR body LIKE binary "%schön%"
> OR body LIKE binary "%schön"
> OR body = "schön"
> )
> GROUP BY keyword
> ORDER BY anz DESC
>
> vg  Steffen
>
>
> _______________________________________________
> TYPO3-dev mailing list
> TYPO3-dev at lists.netfielders.de
> http://lists.netfielders.de/cgi-bin/mailman/listinfo/typo3-dev

Hey Steffen and all,

I think changing the column to binary is not the correct solution
and will lead to not finding some words under some conditions.
Remember with UTF8 each character can be 1 or more bytes!!

Steffen, can you set the column to latin_german2_ci and see what the
results are?

Ries

Hi Ries,

i didn't changed the column, i only use LIKE BINARY for a binary comparison. 
In this case the search works fine.
I can't use a on-the-fly-conversion with latin_german2_ci  (it's also not 
necessary now).

Sure changing the charset of the table to latin the normal like would work 
also, as in latin Umlauts are saved in one Byte.

And again, collation is not the point in this case - in ordering there comes 
collation in the game.

vg  Steffen








More information about the TYPO3-dev mailing list