[TYPO3-Solr] Indexing external documents

Ingo Renner ingo at typo3.org
Mon Mar 15 11:52:54 CET 2010

Thomas Hempel wrote:

Hi Thomas,

> first of all, thanks to all developers (I guess Ingo in particular) for
> the hard work and this nice piece of software!


> I just got the whole thing up and running in about a day, which is pretty
> good in my eyes if you take into account that I have zero experience
> with all that fancy Java stuff.

nice! Glad this worked out for you.

> Anyway, after indexing the first few pages I asked myself whether indexing
> of external documents (files like PDF etc.) is possible right now.
> I didn't find any information on that. All I see is that the pages get
> indexed by Solr, but not the PDF documents referenced on them.

As Olivier already mentioned, we're working on it and hope to
have it ready by the end of March. The version in TER / Forge can't do
this yet, and won't until some time later this year.
The new version will find files linked on a page and index them as
separate documents, each with a reference to the page that links to it.
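The link discovery described above could look roughly like the following Python sketch. It is not EXT:solr's actual implementation: the list of file extensions and the field names (`id`, `type`, `url`, `linked_from`) are hypothetical and only illustrate the idea of one separate document per linked file, pointing back at the linking page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical: extensions treated as "files" rather than pages.
FILE_EXTENSIONS = (".pdf", ".doc", ".odt")


class FileLinkCollector(HTMLParser):
    """Collects absolute URLs of <a href> targets that point at files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(FILE_EXTENSIONS):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def file_documents_for_page(page_url, html):
    """Build one index document per linked file, each referencing the page."""
    collector = FileLinkCollector(page_url)
    collector.feed(html)
    return [
        # Hypothetical field names, not the real EXT:solr schema.
        {"id": url, "type": "file", "url": url, "linked_from": page_url}
        for url in dict.fromkeys(collector.links)  # de-duplicate, keep order
    ]


docs = file_documents_for_page(
    "http://example.org/news/",
    '<a href="report.pdf">Report</a> <a href="/about/">About</a>',
)
```

Here the page link `/about/` is skipped and `report.pdf` becomes its own document with `linked_from` set to the page URL; the file's text would still have to be extracted separately before indexing.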

To actually index files (that is, extract their content), we developed
EXT:tika, which is fully available from Forge. EXT:tika can either use a
local Tika CLI app or a remote Solr server with Tika integrated (Solr Cell).
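To make the two extraction modes more concrete, here is a minimal Python sketch of what each request looks like. It is not how EXT:tika is written; it assumes the standalone tika-app jar (whose `--text` option prints plain text) and a Solr Cell ExtractingRequestHandler mapped to `/update/extract`, where `extractOnly=true` returns the extracted content instead of indexing it. The jar path, Solr URL and document id are placeholders.

```python
from urllib.parse import urlencode


def tika_cli_command(tika_app_jar, file_path):
    """Local mode: command line that runs the Tika app and prints plain text."""
    return ["java", "-jar", tika_app_jar, "--text", file_path]


def solr_cell_extract_url(solr_base_url, doc_id):
    """Remote mode: URL of Solr Cell's extract handler.

    literal.id sets the document's id field; extractOnly=true asks Solr
    to return the extracted text rather than add it to the index.
    """
    params = urlencode({"literal.id": doc_id, "extractOnly": "true"})
    return f"{solr_base_url.rstrip('/')}/update/extract?{params}"


cmd = tika_cli_command("tika-app.jar", "report.pdf")
url = solr_cell_extract_url("http://localhost:8983/solr/", "file-1")
```

The sketch only builds the requests: the command would be run with something like `subprocess.run(cmd, capture_output=True)`, and the file's bytes would be POSTed to `url`. The local mode needs Java on the web server, while the remote mode moves that dependency to the Solr host.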

In addition to using Tika for Solr, you can use EXT:tika for EXT:dam, too!


Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2
