[TYPO3] Crawler not working as I would expect it.

Walrick Bosch lists at globalhealingcircle.net
Fri Sep 19 11:33:58 CEST 2008


Hello again,

Forget about the first part, I discovered that that behaviour is as 
expected.

However, if someone were able to determine what is wrong with the cronjob 
my provider set, that would be helpful.

Again the cronjob was:
*/10 * * * * username nice -n+19 php -q /...../cli_dispatch.phpsh crawler >/dev/null 2>/dev/null

I suppose it's either something with the "username nice -n+19 php" part 
or with the ">/dev/null 2>/dev/null" part of the line. These parts were 
added by the provider for some purpose.
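
One thing I can check myself is how cron splits that line. As far as I 
know, a per-user crontab (edited with "crontab -e") has no user column, 
so "username" would be taken as the command; in a system crontab 
(/etc/crontab) the sixth field is the user. I don't know which kind my 
provider used, so the sketch below just shows both readings. The 
function name split_cron_line is made up for illustration, and the path 
is left elided as above.

```shell
#!/bin/sh
# Hypothetical helper: show how cron would split a crontab line.
# style "user"   = per-user crontab, command starts at field 6
# style "system" = system crontab, field 6 is the user name
split_cron_line() {
  # read -r splits on whitespace; the here-document keeps "*" unexpanded
  read -r min hour dom mon dow rest <<EOF
$1
EOF
  if [ "$2" = "system" ]; then
    # Peel the user name off the front of the remaining text
    read -r user rest <<EOF
$rest
EOF
    printf 'user=%s command=%s\n' "$user" "$rest"
  else
    printf 'command=%s\n' "$rest"
  fi
}

# The provider's line, path elided as in the original
line='*/10 * * * * username nice -n+19 php -q /...../cli_dispatch.phpsh crawler >/dev/null 2>/dev/null'

split_cron_line "$line" user
split_cron_line "$line" system
```

If the first reading applies, cron would try to run a program called 
"username", which would fail without any visible trace because both 
output streams go to /dev/null. Pointing those redirects at a log file 
for a while might show what actually happens.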

Btw, would anybody be able to tell me if a 10 minute cronjob would be 
enough to index a 10000+ page pagetree daily? They told me they 
couldn't set the interval lower.
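
To put a rough number on it, here is the arithmetic as a small sketch. 
It assumes exactly 10000 pages and one URL per page; with the language 
parameter in my configuration the real number of URLs may well be 
higher. The variable names are just for illustration.

```shell
#!/bin/sh
# Rough throughput needed per cron run (numbers taken from the question)
interval_min=10   # cron interval in minutes
pages=10000       # pages to index per day (lower bound, "10000+")

runs_per_day=$(( 24 * 60 / interval_min ))
# Integer division rounded up, so a partial batch counts as a full one
pages_per_run=$(( (pages + runs_per_day - 1) / runs_per_day ))

echo "runs per day: $runs_per_day"
echo "pages needed per run: $pages_per_run"
```

So each run would have to get through about 70 queue entries. Whether 
the crawler can do that within one run is what I don't know.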

Regards,

Walrick

----

Walrick Bosch wrote:
> Hello,
> 
> I've installed the Crawler extension and am now trying to get it to 
> work. I use one index configuration which indexes the page tree.
> 
> I have set the following configuration on the PageTSconfig root page of 
> the site:
> tx_crawler.crawlerCfg.paramSets {
> language = &L=[|_TABLE:pages_language_overlay;_FIELD:sys_language_uid]
> language.procInstrFilter = tx_indexedsearch_reindex, tx_indexedsearch_crawler
> language.baseUrl = http://www.globalhealingcircle.net/
> }
> 
> I'm not sure if this is enough/correct.
> 
> But the main thing is, when I look at the crawler log after a couple of 
> runs, I'm getting a lot of lines like the following, but only behind the 
> root page of the site.
> 
> 3542  18-09-08 14:06:16  18-09-08 14:06:16  OK  1 [Index Cfg UID#1] 
> 128152761
> 
> The lines for the other pages stay empty at first. Then after a while 
> they start getting lines like:
> 
> Agenda  3570  18-09-08 14:06:30  -  .. 
> http://www.globalhealingcircle.net/index.php?id=1706 
> tx_indexedsearch_reindex; tx_indexedsearch_crawler  0
> 
> Is it normal to get so many lines behind the root page without a full URL?
> I notice that the lines behind the other pages have the full URL, but 
> instead of [Index Cfg UID#1] they get tx_indexedsearch_reindex; 
> tx_indexedsearch_crawler
> 
> -----
> 
> Also our hosting provider has set the cron job as follows:
> */10 * * * * username nice -n+19 php -q /...../cli_dispatch.phpsh crawler >/dev/null 2>/dev/null
> 
> With the right path and username of course. But if I look at the CLI 
> Status page nothing seems to happen. If I just enter 
> "/...../cli_dispatch.phpsh crawler" using SSH it works fine. (The user 
> _cli_lowlevel exists.)
> 
> Any idea why?
> 
> I'd be grateful for any help.
> 
> Regards,
> 
> Walrick
> 


-- 
webmaster Global Healing Circle
www.globalhealingcircle.net

