[TYPO3-Solr] Tika App vs Tika solr server integration

Claus Fassing claus at fassing.eu
Fri Feb 10 10:37:04 CET 2012


Hello,

if I understand it correctly, the Solr server uses Tika (as a library?) 
for content extraction, and this integrated version is well behind the 
Tika App.

At the moment I use extraction via the (remote) Solr server (Tika Ext. 
"remote" option), and for some PDF files the extraction returns 
incorrect data, e.g.
BleibeEnr -guSonksa ndale im Wochenrhythmus, bies fzüurmch Hteetreb 
‚svtatn’ bfaenrge ietrsh iamlt eJnu?n“i(vgl. ‚vt’ 25/11). LeiderZ zeuit 
aRbescthant, da l–le unnfatellrsw imöc hentlich – haben wir uns geirrt. 
Aktu-ellste Enth‚üHllaunndge: „lismEbr lgaot:t ‘Tricks be i( vBoemtri 
e2b7s.r0e7n.t2e0n1“1)

I found out that content extraction works fine with the Tika App (jar file).
I switched the Tika Ext. from remote to local and now get correct results.
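
A minimal sketch of how the two extraction paths could be compared 
outside TYPO3. The helper names, the jar path and the Solr base URL are 
hypothetical. `--text` is the Tika App option for plain-text output, and 
`extractOnly` / `extractFormat` are parameters of Solr's 
ExtractingRequestHandler, which is assumed here to be mapped at 
/update/extract as in the Solr example configuration.

```python
import subprocess
from urllib.parse import urlencode


def tika_app_command(jar_path, pdf_path):
    """Build the command line for local extraction with the Tika App jar."""
    # --text makes the Tika App print the extracted plain text to stdout.
    return ["java", "-jar", jar_path, "--text", pdf_path]


def solr_extract_url(solr_base_url):
    """Build the URL for extraction through Solr's integrated Tika."""
    # extractOnly=true returns the extracted content without indexing it;
    # extractFormat=text asks for plain text instead of XHTML.
    params = urlencode(
        {"extractOnly": "true", "extractFormat": "text", "wt": "json"}
    )
    return solr_base_url.rstrip("/") + "/update/extract?" + params


def extract_with_tika_app(jar_path, pdf_path):
    """Run the Tika App locally and return the extracted text."""
    result = subprocess.run(
        tika_app_command(jar_path, pdf_path),
        capture_output=True,
        check=True,
    )
    # The output encoding depends on the platform; undecodable bytes are
    # replaced so that a garbled document does not abort the comparison.
    return result.stdout.decode("utf-8", errors="replace")
```

Posting the same PDF to the URL from solr_extract_url() and comparing the 
response with the output of extract_with_tika_app() should show whether 
the difference comes from the Tika version bundled with Solr.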

But I would like to know whether it is possible either to update the 
Tika version integrated in the Solr server, or to configure the Solr 
server so that it uses the Tika App jar file for content extraction.

The desired result is to use a remote Solr server AND a remote Tika App.

Does anybody know how to do this?

Regards Claus

