[TYPO3-Solr] MetaDataExtraction with Tika and solrfal

Sebastian Schreiber me at schreibersebastian.de
Wed Jul 8 11:47:22 CEST 2015


I now see that in version 2.0.0-dev only the metadata extractor is
active.
The text extractor service is only available in TYPO3 version 7.1.
Does that mean I can't use text extraction with TYPO3 version 6.2?
Do I have to use another extension?

On 08.07.2015 at 11:27, Sebastian Schreiber wrote:
> Thanks for your answer, but I don't understand.
> Which table does the tika_content field belong to? sys_registry, or
> is another table involved?
> I can't find any field or string named tika_content.
> 
> I'm using a patched version of tika version 2.0.0.dev from GitHub
> (resolved some namespace issues) and solrfal in version 2.0.0.
> 
> And how does the whole content of a PDF get into the Solr index afterwards?
> 
> 
> On 07.07.2015 at 22:57, Jigal van Hemert wrote:
>> Hi,
>>
>> On 07/07/2015 18:03, Sebastian Schreiber wrote:
>>> I'm using the metadata extraction with tika (remote via Solr).
>>> The scheduler task is running well and some metadata gets updated
>>> (author, publisher).
>>> But I don't understand where the extracted full content, e.g. of a PDF,
>>> is written.
>>
>> During indexing the content is temporarily written to the tika_content
>> field, and right after the record is indexed the field is emptied again.
>> solrfal and tika use a signal-slot mechanism to trigger these actions.
>>
> 
> 
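For anyone following the thread, the mechanism Jigal describes can be
sketched roughly as below. This is only an illustration in Python, not the
actual solrfal or tika code: the Dispatcher class, the signal names
"before_index" and "after_index", and the record layout are all hypothetical.
It may also explain why tika_content looks empty when inspected later: the
field is only filled while the record is being indexed.

```python
from collections import defaultdict


class Dispatcher:
    """Tiny signal-slot registry (hypothetical stand-in, not the TYPO3 class)."""

    def __init__(self):
        self._slots = defaultdict(list)

    def connect(self, signal, slot):
        # Register a callable to run whenever `signal` is dispatched.
        self._slots[signal].append(slot)

    def dispatch(self, signal, record):
        # Call every slot connected to `signal`, in registration order.
        for slot in self._slots[signal]:
            slot(record)


def fill_tika_content(record):
    # Hypothetical slot: a real setup would ask Tika for the extracted text.
    record["tika_content"] = "extracted text of " + record["file"]


def clear_tika_content(record):
    # Hypothetical slot: empty the temporary field once indexing is done.
    record["tika_content"] = ""


def index_record(dispatcher, record, index):
    dispatcher.dispatch("before_index", record)   # field gets filled
    index.append({"file": record["file"], "content": record["tika_content"]})
    dispatcher.dispatch("after_index", record)    # field gets emptied again


dispatcher = Dispatcher()
dispatcher.connect("before_index", fill_tika_content)
dispatcher.connect("after_index", clear_tika_content)

record = {"file": "manual.pdf", "tika_content": ""}
index = []
index_record(dispatcher, record, index)
```

After the call, the index entry holds the extracted text while the record's
tika_content field is empty again, which matches the behaviour described in
the quoted answer.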


-- 
Sebastian Schreiber
(Medieninformatiker B.Sc.)
(TYPO3 Certified Integrator)

Paul Nießen Straße 58
D-50969 Köln

T  0221 677 88 541
M  0176 431 05 790

Skype schreibersebastian.de

me at schreibersebastian.de
www.schreibersebastian.de

Steuernummer: 219 / 5302 / 3123

