[TYPO3-english] contagged Extension has error while parsing joined words (words joined with dashes)

Jochen Rau j.rau at web.de
Tue Mar 10 09:54:33 CET 2009


Hi Parakash,

> In the Content parser and tagger (Glossary) extension contagged, there
> seems to be an error when parsing joined words (words joined with
> dashes).
> This is especially noticeable when the second word contains special
> characters such as ê, à, ù, é, etc.
> 
> For example, consider the word "elle-même". The term is defined as
> "elle" with a link to example.com, and the link is rendered as
> follows:
> 
> <dfn><a target="_top" href="http://www.example.com">Elle-m</a></dfn>ême
> 
> I suspect this is related to the preg_match() used in the
> getPositions() function of class.tx_contagged.php.
> 
> What could be the problem? Anyone?

I have uploaded contagged v0.2.1 to the TER (it should be available in a
few hours). It improves the handling of UTF-8 in combined words.

Don't forget to activate UTF-8 support by adding "u" to the regular
expression modifiers in the TypoScript constants:

contagged.modifier = Uisu <--

UTF-8 handling was deactivated by default because some old versions of 
PHP used on shared hosting do not have the necessary libraries activated.
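
To see why the "u" modifier matters, here is a minimal PHP sketch. The helper matchCompound() and its pattern are hypothetical and only illustrate the effect; this is not the expression contagged actually uses. Without "u", PCRE works byte by byte, so the two bytes of "ê" are usually not treated as word characters and the match stops early (the exact result depends on the locale). With "u", the string is read as UTF-8 and "ê" counts as a single letter.

```php
<?php
// Hypothetical helper for illustration only; NOT contagged's real pattern.
function matchCompound(string $term, string $text, string $modifiers): ?string
{
    // Match the term plus an optional "-word" suffix, e.g. "elle" + "-même".
    $pattern = '/\b' . preg_quote($term, '/') . '(?:-\w+)?/' . $modifiers;
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'elle-même'; // this file must be saved as UTF-8

// Byte mode: "ê" is two bytes that \w usually rejects, so the match
// tends to end after "elle-m" (locale dependent).
var_dump(matchCompound('elle', $text, 'i'));

// UTF-8 mode: "ê" is one letter, so the whole compound is matched.
var_dump(matchCompound('elle', $text, 'iu'));
```

If the second call fails with a warning about missing UTF-8 support, the PCRE library of that PHP installation was built without it, which is the situation described above.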

Cheers
Jochen

