[TYPO3-Solr] MetaDataExtraction with Tika and solrfal
Sebastian Schreiber
me at schreibersebastian.de
Wed Jul 8 11:27:10 CEST 2015
Thanks for your answer, but I don't understand.
Which table does the tika_content field belong to? sys_registry, or some
other table?
I can't find any field or any string named tika_content.
I'm using a patched version of tika 2.0.0.dev from GitHub
(resolved some namespace issues) and solrfal version 2.0.0.
And how does the full content of a PDF get into the Solr index afterwards?
On 07.07.2015 at 22:57, Jigal van Hemert wrote:
> Hi,
>
> On 07/07/2015 18:03, Sebastian Schreiber wrote:
>> I'm using metadata extraction with tika (remote, via Solr).
>> The scheduler task runs fine and some metadata gets updated
>> (author, publisher).
>> But I don't understand where the extracted full content, e.g. of a PDF,
>> is written.
>
> During indexing the content is temporarily written in the tika_content
> field and right after the record is indexed the field is emptied again.
> solrfal and tika use a signal-slot mechanism to trigger these actions.
>
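
To make the mechanism described in the quoted reply concrete, here is a minimal sketch of a signal-slot flow in Python. It is not the actual solrfal or tika code: the Dispatcher class, the signal names "before_index" and "after_index", the extract_text helper and the record layout are all hypothetical and only illustrate the idea of filling a field just before indexing and emptying it right afterwards.

```python
from collections import defaultdict


class Dispatcher:
    """Hypothetical minimal signal-slot dispatcher (illustration only)."""

    def __init__(self):
        self._slots = defaultdict(list)

    def connect(self, signal, slot):
        # Register a callable that runs whenever `signal` is dispatched.
        self._slots[signal].append(slot)

    def dispatch(self, signal, *args):
        # Call every slot connected to `signal`, in registration order.
        for slot in self._slots[signal]:
            slot(*args)


def extract_text(record):
    # Stand-in for a real text extraction call; returns dummy text.
    return "extracted text of " + record["file"]


def fill_tika_content(record):
    # Slot: put the extracted text into the temporary field.
    record["tika_content"] = extract_text(record)


def clear_tika_content(record):
    # Slot: empty the temporary field once the record has been indexed.
    record["tika_content"] = ""


def index_record(dispatcher, record, index):
    dispatcher.dispatch("before_index", record)
    # The index document receives a copy of the temporary content.
    index.append({"file": record["file"], "content": record["tika_content"]})
    dispatcher.dispatch("after_index", record)


dispatcher = Dispatcher()
dispatcher.connect("before_index", fill_tika_content)
dispatcher.connect("after_index", clear_tika_content)

solr_index = []  # hypothetical stand-in for the Solr index
record = {"file": "manual.pdf", "tika_content": ""}
index_record(dispatcher, record, solr_index)
```

In this sketch the extracted text ends up only in the index document, while the record's tika_content field is empty both before and after indexing. If the real extensions behave in a comparable way, that would be consistent with not finding any stored content in that field after the scheduler task has run.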
--
Sebastian Schreiber
(Medieninformatiker B.Sc.)
(TYPO3 Certified Integrator)
Paul Nießen Straße 58
D-50969 Köln
T 0221 677 88 541
M 0176 431 05 790
Skype schreibersebastian.de
me at schreibersebastian.de
www.schreibersebastian.de
Tax number: 219 / 5302 / 3123