[Typo3-dev] EXT: indexed_search - problems & solutions (to kasper)

Rene Suthoelder t3 at 1zu6-design.de
Fri Apr 23 16:27:36 CEST 2004


[I hope this is the right group]

Kasper,

I am currently trying to get indexed_search to work with German umlaut
characters (ä, ö, ü, Ä, Ö, Ü), which doesn't seem to work for a lot of people.

For external documents (Word files in my case):

The first thing I had to do was change the module class
ext\indexed_search\class.indexer.php so that it accepts DOS program names (as
described here:
http://typo3.org/doc.0.html?&tx_extrepmgm_pi1[extUid]=16&tx_extrepmgm_pi1[tocEl]=43&cHash=f356fa37fd)

After that, the (now successful) call to catdoc produced scrambled output for
any file containing German umlaut characters. I tried to figure out why and
ended up with a quick hack: instead of letting catdoc determine the input and
output charsets, I forced both to ISO-8859-1 (with
catdoc -s8859-1 -d8859-1 filename.doc). With that, the German words were
finally indexed.
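For illustration, here is a minimal PHP sketch of that workaround. The function names buildCatdocCommand() and extractWordText() are hypothetical (they are not part of indexed_search), and the catdoc path is whatever your installation uses; only the -s/-d charset options come from the command above. It quotes both paths and captures catdoc's output as ISO-8859-1 text.

```php
<?php
// Hypothetical helper (not part of indexed_search): builds the catdoc call
// with the source (-s) and destination (-d) charset both forced to
// ISO-8859-1 instead of letting catdoc guess them.
function buildCatdocCommand($catdocPath, $docFile)
{
    // escapeshellarg() quotes the paths so spaces in DOS/Windows paths survive
    return escapeshellarg($catdocPath) . ' -s8859-1 -d8859-1 ' . escapeshellarg($docFile);
}

// Hypothetical helper: runs catdoc and returns the extracted text
// (ISO-8859-1 encoded) as a single string.
function extractWordText($catdocPath, $docFile)
{
    $lines = array();
    exec(buildCatdocCommand($catdocPath, $docFile), $lines);
    return implode("\n", $lines);
}
```

Usage would look like `$text = extractWordText('/usr/bin/catdoc', 'test.doc');`, where both paths are placeholders for your own setup.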

I expected the search to work now, but then the really strange things began:
with a prepared Word test document, searching for "äpfel" did not find
anything, while searching for "Äpfel" returned a match. This error was
reproducible only with lowercase German umlauts.

My question:

The only reference to German umlauts I found in your source code was this
(in file \ext\indexed_search\class.indexer.php):
============
[snipp]

class tx_indexedsearch_indexer {

[snipp]

 var $convChars=array(
  "ÁÉÚÍÄËÜÖÏÆØÅ",
  "áéúíâêûôîæøå"

[snipp]
============

I'm not a programmer at all, but I think this maps capital letters to their
corresponding lowercase letters.
As you can see, the German letters are not mapped correctly: Ä becomes â
(and likewise Ë, Ü, Ö and Ï become ê, û, ô and î). I changed this, and voilà:
searching for "äpfel" returned a match (and as we all know, an apple a day
keeps sorrow away!)
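To make the effect easier to verify, here is a small PHP sketch comparing the shipped mapping with the corrected one. normalizeWord() is a hypothetical stand-in for the indexer's lowercasing step, assuming the same function is applied both to indexed words and to search words. The original $convChars uses two strings with strtr(); this sketch spells the pairs out as an array instead, so it also works if the file is saved as UTF-8 (the array form of strtr() replaces whole multi-byte characters rather than single bytes).

```php
<?php
// Shipped mapping: only the umlaut pairs that are wrong are listed here.
$buggyPairs = array('Ä' => 'â', 'Ë' => 'ê', 'Ü' => 'û', 'Ö' => 'ô', 'Ï' => 'î');

// Corrected mapping: every capital goes to its own lowercase letter.
$fixedPairs = array(
    'Á' => 'á', 'É' => 'é', 'Ú' => 'ú', 'Í' => 'í',
    'Ä' => 'ä', 'Ë' => 'ë', 'Ü' => 'ü', 'Ö' => 'ö', 'Ï' => 'ï',
    'Æ' => 'æ', 'Ø' => 'ø', 'Å' => 'å',
);

// Hypothetical stand-in for the indexer's lowercasing step.
function normalizeWord($word, $pairs)
{
    // Plain A-Z first (byte-wise and locale-independent) ...
    $word = strtr($word, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
    // ... then the accented capitals via the array form of strtr().
    return strtr($word, $pairs);
}

foreach (array('buggy' => $buggyPairs, 'fixed' => $fixedPairs) as $label => $pairs) {
    $stored = normalizeWord('Äpfel', $pairs); // word as found in the document
    $query  = normalizeWord('äpfel', $pairs); // word typed into the search form
    echo $label, ': ', $stored, ' vs ', $query, ' => ',
        ($stored === $query ? 'match' : 'no match'), "\n";
}
```

With the shipped table the stored word becomes "âpfel" while the query stays "äpfel", so they never match; searching "Äpfel" only worked because it was mangled the same way. With the corrected table both sides become "äpfel".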

Perhaps you or the community could verify my solution and update the
extension?

BTW: this fix also solves the problem with _normal_ site content, not only
with external documents (currently, searching for lowercase umlauts does not
work there either!)

Cheers,

Rainer