[TYPO3-english] Crawler and external documents

Michael Miousse mmiousse at infoglobe.ca
Wed Jan 28 14:07:18 CET 2009


On Wed, 28 Jan 2009 08:26:51 +0000, Claudio Strizzolo wrote:

> Hi all
> I'm trying to set up the crawler extension in order to index all the
> pages in the site and the external documents (/fileadmin/...) linked by
> anchors in the pages.
> I read some documentation, including
> http://wiki.typo3.org/index.php/Ext_crawler, and almost everything works:
> the pages are correctly indexed, and the external documents are
> recognized. In the Crawler Log they are listed in separate rows under
> the page which points to them.
> However, their status is ".." and their contents are not indexed. If I
> click on the "Read" icon (it looks more like a reload icon, imho) the
> content is correctly indexed and the status becomes "OK", but I could
> not find a way to have this done automatically by the crawler. I have a
> huge number of documents linked in this way, so I would like to
> index them without having to click on the Read icon for each of them.
> Is there a way to do this? I probably missed something obvious in the
> documentation, but I can't figure it out. This is the TS
> config in the root page of the site:
> 
> tx_crawler.crawlerCfg.paramSets {
>   whole_site =
>   whole_site {
>     cHash = 1
>     procInstrFilter = tx_indexedsearch_reindex, tx_indexedsearch_crawler
>     baseUrl = http://www.example.com/
>   }
>   language = &L=[|_TABLE:pages_language_overlay;_FIELD:sys_language_uid]
>   language {
>     procInstrFilter = tx_indexedsearch_reindex, tx_indexedsearch_crawler
>     baseUrl = http://www.example.com/
>   }
>   tt_news = &tx_ttnews[tt_news]=[_TABLE:tt_news;_PID:280]
>   tt_news {
>     procInstrFilter = tx_indexedsearch_reindex, tx_cachemgm_recache
>     cHash = 1
>     pidsOnly = 301
>     baseUrl = http://www.example.com/
>   }
> }
> 
> This is how I run the crawler from the command line:
> 
> typo3/cli_dispatch.phpsh crawler_im 34 -d 99 -proc tx_indexedsearch_reindex,tx_indexedsearch_crawler -n 2000 -o exec
> 
> Thanks in advance
> 
> Claudio

Is the option Use "crawler" extension to index ex...
[useCrawlerForExternalFiles] activated in the indexed_search configuration?
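
If that option is already enabled, the remaining step may be queue
processing. As far as I understand it, indexed_search then puts each linked
file into the crawler queue while the page is being indexed, so the files
only get indexed once the queue itself is processed. A minimal sketch of
that two-step run, reusing the page id and options from the command quoted
above; the assumption (not confirmed in this thread) is that the "crawler"
CLI key works through the pending queue entries, file entries included:

```shell
#!/bin/sh
# Sketch only, not a verified fix. Run from the TYPO3 site root.
# Assumption: file entries queued by indexed_search are picked up
# by the queue-processing run in step 2.

PAGE_ID=34   # start page, taken from the original command
PROCS="tx_indexedsearch_reindex,tx_indexedsearch_crawler"

# Step 1: put the page URLs into the crawler queue instead of executing them.
QUEUE_CMD="typo3/cli_dispatch.phpsh crawler_im $PAGE_ID -d 99 -proc $PROCS -n 2000 -o queue"

# Step 2: process whatever is waiting in the queue.
RUN_CMD="typo3/cli_dispatch.phpsh crawler"

# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$QUEUE_CMD"
  echo "$RUN_CMD"
else
  $QUEUE_CMD && $RUN_CMD
fi
```

Step 2 may need to be run more than once (for example from cron), because
the file entries are only added to the queue while the pages from step 1
are being processed.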

