[TYPO3-Solr] MetaDataExtraction with Tika and solrfal

Sebastian Schreiber me at schreibersebastian.de
Wed Jul 8 11:27:10 CEST 2015


Thanks for your answer, but I don't understand.
Which table does the tika_content field belong to? Is it sys_registry,
or is some other table involved?
I can't find any field or string named tika_content.

I'm using a patched version of tika 2.0.0.dev from GitHub
(which resolved some namespace issues) and solrfal version 2.0.0.

And how does the whole content of a PDF get into the Solr index afterwards?


On 07.07.2015 at 22:57, Jigal van Hemert wrote:
> Hi,
> 
> On 07/07/2015 18:03, Sebastian Schreiber wrote:
>> I'm using the metadata extraction with Tika (remote via Solr).
>> The scheduler task is running well and some metadata gets updated
>> (author, publisher).
>> But I don't understand where the extracted full content, e.g. of a
>> PDF, is written.
> 
> During indexing the content is temporarily written in the tika_content
> field and right after the record is indexed the field is emptied again.
> solrfal and tika use a signal-slot mechanism to trigger these actions.
> 


-- 
Sebastian Schreiber
(Medieninformatiker B.Sc.)
(TYPO3 Certified Integrator)

