[TYPO3] Crawler not working as I would expect it.

Walrick Bosch lists at globalhealingcircle.net
Thu Sep 18 14:28:37 CEST 2008


Hello,

I've installed the Crawler extension and am now trying to get it to
work. I use one index configuration, which indexes the page tree.

I have set the following configuration in the Page TSconfig of the
site's root page:
tx_crawler.crawlerCfg.paramSets {
language = &L=[|_TABLE:pages_language_overlay;_FIELD:sys_language_uid]
language.procInstrFilter = tx_indexedsearch_reindex, tx_indexedsearch_crawler
language.baseUrl = http://www.globalhealingcircle.net/
}

I'm not sure if this is enough/correct.

But the main thing is, when I look at the crawler log after a couple of
runs, I'm getting a lot of lines like the following, but only next to
the root page of the site.

3542  18-09-08 14:06:16  18-09-08 14:06:16  OK  1 [Index Cfg UID#1] 
128152761

The lines for the other pages stay empty at first. Then, after a while,
they start getting lines like:

Agenda  3570  18-09-08 14:06:30  -  .. 
http://www.globalhealingcircle.net/index.php?id=1706 
tx_indexedsearch_reindex; tx_indexedsearch_crawler  0

Is it normal to get so many lines next to the root page without a full URL?
I notice that the lines next to the other pages have the full URL, but
instead of [Index Cfg UID#1] they get tx_indexedsearch_reindex;
tx_indexedsearch_crawler

-----

Also our hosting provider has set the cron job as follows:
*/10 * * * * username nice -n+19 php -q /...../cli_dispatch.phpsh crawler >/dev/null 2>/dev/null

With the right path and username, of course. But if I look at the CLI
Status page, nothing seems to happen. If I just enter
"/...../cli_dispatch.phpsh crawler" using SSH, it works fine. (The user
_cli_lowlevel exists.)
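
Since the cron line sends all output to /dev/null, any error from the
cron run is lost. Cron also runs with a minimal environment, so "php"
may not resolve to the same binary as in an SSH session. A minimal
sketch of how the output could be captured instead is below. The helper
name run_logged and the log path /tmp/crawler-cron.log are hypothetical,
not part of TYPO3 or the Crawler extension, and the script path stays
elided as above:

```shell
#!/bin/sh
# Hypothetical helper (not part of TYPO3): run a command and append its
# combined stdout/stderr plus its exit status to a log file.
run_logged() {
    log_file=$1
    shift
    {
        echo "--- $(date '+%Y-%m-%d %H:%M:%S') running: $*"
        "$@" 2>&1          # run the command, merging stderr into stdout
        echo "exit=$?"     # exit status of the command just run
    } >> "$log_file"
}

# Possible use from the cron job (log path is an assumption):
# run_logged /tmp/crawler-cron.log php -q /...../cli_dispatch.phpsh crawler
```

If the log then shows something like a "command not found" message or a
PHP error, that would point to the cron environment rather than the
crawler itself.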

Any idea why?

I'd be grateful for any help.

Regards,

Walrick

-- 
webmaster Global Healing Circle
www.globalhealingcircle.net


