[TYPO3-english] vge_tagcloud and proper handling of UTF-8 tags

François Suter fsu-lists at cobweb.ch
Tue Apr 12 22:29:28 CEST 2011


Hi Alex,

> But in the tagcloud section, special characters like "à ù ñ ì" in words
> like pàgina or bùsqueda are not displayed correctly.
>
> Some of them are split at the position of these chars; others are
> displayed with these famous and ugly unknown symbols.

I could reproduce the problem, but only with "à" as in "pàgina". All
the other letters you mention caused no problem, nor did many other
characters in other languages. I even tried "Jóhanna Sigurðardóttir"
(Iceland's prime minister) and it works fine. So there's got to be
something weird with the "à". I dug a bit deeper, and the thing is that
when "à" is converted to latin-1 (which is what PHP's regexp functions
do, unfortunately), it turns into 2 characters: one weird letter and
one blank. In UTF-8, "à" is the two bytes 0xC3 0xA0, which read as
latin-1 are "Ã" followed by a non-breaking space. So it gets split on
that blank. This could happen to other characters too, but definitely
not all.
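
To make the byte-level effect concrete, here is a minimal sketch in
Python (not the PHP code that vge_tagcloud actually runs). The function
name split_like_latin1 is made up for illustration. It reads the UTF-8
bytes as latin-1 and splits on whitespace, which is my assumption about
what the regexp splitting boils down to:

```python
import re

def split_like_latin1(utf8_bytes):
    """Read UTF-8 bytes as if they were latin-1, then split on whitespace.

    Hypothetical helper: it only imitates the effect described above,
    it is not taken from vge_tagcloud.
    """
    as_latin1 = utf8_bytes.decode("latin-1")
    # For str patterns, \s also matches the non-breaking space (0xA0).
    return re.split(r"\s+", as_latin1)

words = [
    "p\u00e0gina",                 # pàgina
    "b\u00fasqueda",               # bùsqueda as spelled in the thread uses ù; ú here is just another test letter
    "Sigur\u00f0ard\u00f3ttir",    # Sigurðardóttir
]

for word in words:
    raw = word.encode("utf-8")
    # Show the word, its UTF-8 bytes in hex, and the resulting pieces.
    print(word, raw.hex(), split_like_latin1(raw))
```

Only the first word comes back in two pieces, because only "à" has
0xA0 as its second byte; the other letters end in bytes that latin-1
does not treat as a blank.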

> Since this is not limited to Spanish, but also affects German (äüö)
> and French (Çé...) and UTF-8 in general, I'm wondering what to do?

One workaround could be to use the "extractKeywords" hook from 
vge_tagcloud. This lets you provide your own method for splitting the 
words. Inside your hook you could then convert all the strings to 
latin-1 (which should be ok assuming your site is entirely in Spanish), 
split just like the tag cloud does, then convert back to UTF-8 for 
proper display.
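
The round trip described above could look roughly like this. It is a
sketch in Python of the idea only, not the hook itself: the real hook
is PHP and its signature is not shown here. BYTE_SPLITTER and
split_via_latin1 are hypothetical names, and the pattern is just a
stand-in for a byte-oriented splitter that treats 0xA0 as a blank:

```python
import re

# Assumed stand-in for a byte-oriented splitter that sees 0xA0 as a blank.
BYTE_SPLITTER = re.compile(rb"[ \t\r\n\xa0]+")

def split_via_latin1(utf8_bytes):
    """Convert UTF-8 to latin-1, split, then convert each piece back."""
    text = utf8_bytes.decode("utf-8")
    # Raises UnicodeEncodeError if a character has no latin-1 equivalent,
    # which is why this only suits content that fits in latin-1.
    latin1 = text.encode("latin-1")
    pieces = BYTE_SPLITTER.split(latin1)
    # Back to UTF-8 for display, dropping empty pieces.
    return [p.decode("latin-1").encode("utf-8") for p in pieces if p]

source = "p\u00e0gina b\u00fasqueda".encode("utf-8")

naive = BYTE_SPLITTER.split(source)      # splits inside the first word
fixed = split_via_latin1(source)         # keeps both words whole

print(len(naive), len(fixed))
print([w.decode("utf-8") for w in fixed])
```

In latin-1 "à" is the single byte 0xE0, so no 0xA0 byte is left for
the splitter to trip over. The price is that any text containing
characters outside latin-1 cannot take this route.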

HTH

-- 

Francois Suter
Cobweb Development Sarl - http://www.cobweb.ch

