[TYPO3-Solr] Tika App vs. Tika integration in the Solr server
Claus Fassing
claus at fassing.eu
Fri Feb 10 10:37:04 CET 2012
Hello,
as I understand it, the Solr server uses Tika (as a library?) for
content extraction, and this integrated version is far behind the Tika app.
At the moment I use extraction via the (remote) Solr server (the Tika extension's
"remote" option), and for some PDF files the extraction returns incorrect data,
e.g.:
BleibeEnr -guSonksa ndale im Wochenrhythmus, bies fzüurmch Hteetreb
‚svtatn’ bfaenrge ietrsh iamlt eJnu?n“i(vgl. ‚vt’ 25/11). LeiderZ zeuit
aRbescthant, da l–le unnfatellrsw imöc hentlich – haben wir uns geirrt.
Aktu-ellste Enth‚üHllaunndge: „lismEbr lgaot:t ‘Tricks be i( vBoemtri
e2b7s.r0e7n.t2e0n1“1)
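To compare outputs outside TYPO3, the remote path can be reproduced by asking Solr's ExtractingRequestHandler (Solr Cell) to return the extracted text without indexing it. This is a minimal sketch assuming the handler is mapped to /update/extract (as in Solr's example solrconfig.xml); the host name and the helper function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_only_url(solr_base, extract_format="text"):
    # Assumption: Solr Cell is mapped to /update/extract in solrconfig.xml.
    params = {
        "extractOnly": "true",            # return extracted content, do not index
        "extractFormat": extract_format,  # "text" or "xml"
        "wt": "json",                     # JSON response
    }
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def extract_via_solr(solr_base, pdf_path, timeout=60):
    # Hypothetical helper: POST the raw PDF bytes and return Solr's JSON reply.
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    request = Request(
        build_extract_only_url(solr_base),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

If the text in the JSON reply is scrambled in the same way, the problem lies in the Tika version bundled with Solr rather than in the TYPO3 extension.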
I found out that content extraction works fine with the Tika App (jar file).
When I switched the Tika extension from remote to local, I got correct results.
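For reference, the local path amounts to running the Tika app jar on the file and reading plain text from stdout. This is a sketch assuming the tika-app command-line options --text and --encoding; the jar path and the function names are hypothetical, and the exact command the TYPO3 extension builds may differ.

```python
import subprocess


def tika_app_command(jar_path, document_path):
    # --text prints extracted plain text; --encoding sets the output charset.
    return ["java", "-jar", jar_path, "--encoding=UTF-8", "--text", document_path]


def extract_with_tika_app(jar_path, document_path, timeout=120):
    # Hypothetical helper: run the jar and return its stdout as a string.
    result = subprocess.run(
        tika_app_command(jar_path, document_path),
        capture_output=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Running this against the same PDF that Solr garbles gives a direct side-by-side comparison of the two Tika versions.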
But I would like to know whether it is possible either to update
the Tika version integrated in the Solr server, or to configure the Solr server so
that it uses the Tika App jar file for content extraction.
The desired setup is a remote Solr server AND a remote Tika app.
Does anybody know how to do this?
Regards Claus