[TYPO3] Indexing external files with crawler

Jan Hančič jan.hancic at gmail.com
Mon Aug 6 19:31:20 CEST 2007


OK, this works like a charm! I had to try various combinations of the "disable
FE indexing" / "use crawler for..." settings, but it works.

Words really fail me in describing how happy and grateful I am!!!


-- 
Best regards
Jan Hančič
http://hancic.info

On 8/6/07, Diego Pino Garcia <dpino at igalia.com> wrote:
>
> Hi Jan!
>
>
> >
> > I am setting up a site with TYPO3 on Ubuntu (the latest version of both).
> > Ubuntu is running as a virtual machine, if that makes any difference in
> > my situation.
> > Everything is working except the indexing of external files. Now I don't
> > know whether I just don't understand how this is supposed to work, or
> > whether I misconfigured something.
> >
> > I have followed this tutorial:
> > http://wiki.typo3.org/index.php/Ext_crawler
> >
> > Now:
> > - I have installed all the programs for parsing (pdfinfo, unzip, ...)
> > - I have installed php5-cli
> > - I have set up the cron job for
> > typo3conf/ext/crawler/cli/crawler_cli.phpsh, and judging from the log
> > files it is running
> > - I have created the _cli_crawler BE user
> > - I have put the TSconfig from the link above into my root page
> > - I have created a "not in menu" page under the root page
> > - I have created an indexing configuration (type = external files) on
> > the above page that points to "files/" under fileadmin (must I type
> > "fileadmin/files/" or is "files/" enough?)
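A quick way to confirm the first step in the list above is to check which parser binaries are actually on the PATH of the user the cron job runs as. This is a minimal POSIX shell sketch; missing_tools is a hypothetical helper, not part of TYPO3, and the default tool list is an assumption based on the external parsers indexed_search typically calls, so adjust it to the file types you index.

```shell
#!/bin/sh
# Sketch: report which external parser binaries are missing from PATH.
# ASSUMPTION: this default list approximates the helpers indexed_search
# commonly uses (PDF, OpenOffice, Word/Excel/PowerPoint, RTF); edit as needed.
DEFAULT_TOOLS="pdftotext pdfinfo unzip catdoc xlhtml ppthtml unrtf"

# Hypothetical helper: prints each missing tool on its own line.
# With no arguments it checks DEFAULT_TOOLS; otherwise it checks the arguments.
missing_tools() {
    if [ "$#" -eq 0 ]; then
        # Unquoted on purpose so the list is split into separate words.
        set -- $DEFAULT_TOOLS
    fi
    for tool in "$@"; do
        # "command -v" is POSIX and fails (prints nothing) if the tool is absent.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}
```

Running `missing_tools` with no arguments prints nothing when every listed parser is installed; any name it prints is a binary the indexer will not find.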
> >
> > Now I have tried something: I created a simple content element with a
> > link to a PDF file somewhere in fileadmin. If I then go to
> > Web->Info->Crawler and click refresh next to the page that the content
> > is on (and then click refresh on all the entries that appear below that
> > page), I can find that file using the search in the FE (so indexing of
> > files works).
>
> As far as I know, the crawler does not crawl system folders in your page
> tree. To check that, just:
> Info->Site Crawler.
> Click on the root node of your tree, and select Infinite.
> Press Crawl URLs.
>
> That will build all the URLs for your site (based on your TypoScript
> configuration). You can then see that no URL is built for the system
> folder, so its contents are never indexed. I do not know whether there is
> any option or TypoScript setting you can use to crawl system folders.
>
> >
> >
> > But I can't figure out how to configure the crawler to index files under
> > "fileadmin/files/" automatically (say every day at a given hour).
> > Can somebody please help me with this? I have been struggling with this
> > for a couple of days now without much success.
> >
>
> If you are OK with making new pages that point to the external content
> stored in your fileadmin/files/, then you can simply add a new cron task
> to crawl and index your content every day at a certain time of the day.
>
> Instead of cli/crawler_cli.phpsh, use cli_dispatch.phpsh (this is
> preferred since TYPO3 4.1). Calling cli_dispatch.phpsh allows you to pass
> configuration parameters: you can set the script to start crawling from a
> certain PID, or set the crawl depth. Please check
> http://typo3.org/documentation/document-library/extension-manuals/crawler/2.0.0/view/1/3/
> for more info. For instance,
>
> /typo3_src-4.1/typo3/cli_dispatch.phpsh crawler_im 3 -d 1 -proc tx_indexedsearch_reindex
>
> (will crawl from PID 3, one level down)
>
> Then add the parameter -o exec to process the queue right away.
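The crawler_im call described above can be wrapped in a small script so the start page and depth live in one place, which also keeps the cron entry short. This is a minimal sketch assuming the /typo3_src-4.1 path from the example and an installed PHP CLI; TYPO3_SRC, START_PID, DEPTH and crawl_and_index are hypothetical names, and only the crawler_im arguments themselves come from the message.

```shell
#!/bin/sh
# Sketch of a wrapper around the crawler_im call from the message.
# ASSUMPTIONS: TYPO3_SRC is your TYPO3 source directory and
# cli_dispatch.phpsh is executable with a working PHP CLI.
TYPO3_SRC=${TYPO3_SRC:-/typo3_src-4.1}
START_PID=${START_PID:-3}   # page id to start crawling from
DEPTH=${DEPTH:-1}           # number of levels below START_PID to crawl

# Hypothetical helper: pass --dry-run to print the command instead of running it.
crawl_and_index() {
    crawl_mode=$1
    # Build the argument list once so it can be either printed or executed.
    set -- "$TYPO3_SRC/typo3/cli_dispatch.phpsh" crawler_im "$START_PID" \
        -d "$DEPTH" -proc tx_indexedsearch_reindex -o exec
    if [ "$crawl_mode" = "--dry-run" ]; then
        echo "$*"
    else
        # -o exec makes the crawler process the queue immediately.
        "$@"
    fi
}
```

Calling `crawl_and_index --dry-run` only prints the command line, which is a cheap way to check it before handing the real call to cron.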
>
> Set up a new task in your crontab to do this every day at midnight, for
> example.
>
>
> 0 0 * * * /typo3_src-4.1/typo3/cli_dispatch.phpsh crawler_im 3 -d 1 -proc tx_indexedsearch_reindex -o exec
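A user crontab line (the kind edited with crontab -e) has exactly five time fields (minute, hour, day of month, month, day of week) before the command; any extra `*` is treated as the start of the command. The sketch below is a hypothetical helper (cron_schedule_ok, not part of cron or TYPO3) that checks whether the schedule part of an entry has five fields.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the schedule string has exactly
# five whitespace-separated fields (minute hour day-of-month month day-of-week).
cron_schedule_ok() {
    # Disable globbing while splitting, otherwise each "*" would expand
    # to the file names in the current directory.
    # NOTE: this re-enables globbing afterwards even if it was off before.
    set -f
    set -- $1
    set +f
    [ "$#" -eq 5 ]
}
```

For example, `cron_schedule_ok "0 0 * * *"` succeeds, while a schedule with a sixth `*` fails.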
>
> And that's all. I do not know whether this is what you were asking for; I
> hope it at least sheds some light.
>
> Best regards,
>
> Diego
> _______________________________________________
> TYPO3-english mailing list
> TYPO3-english at lists.netfielders.de
> http://lists.netfielders.de/cgi-bin/mailman/listinfo/typo3-english
>

