[TYPO3] crawler questions

Thu Sep 4 08:10:26 CEST 2008

Hi Lars and Steffen,

Two days ago I tried to set up the crawler for sg_glossary.
I also tried placing the TSconfig on the root page and on the glossary 
singleView... It just didn't work as expected: no results.
As in Kasper's video and on some web pages I found, I always clicked 
"Start crawling" to get the URLs and then switched to "run now". Nothing.

After fixing a typo (oops), I could see that it got all the URLs 
from the detail view of the glossary. But I still couldn't find anything 
when I searched for something. So in the evening I stopped working and 
went home frustrated.

The next morning I opened the frontend, typed a word into the 
search box and *tadaa*, it worked.

To me, the crawler is a bit of a voodoo thing. But I'm going to 
use it more often to get familiar with it.

Just a story I've written down here... maybe it helps someone.

Best regards
Ralf

Steffen Kamper wrote:
> 
> Hi Lars,
> 
> glad to see i'm not alone with this :-)
> 
>> I had to fight with the re-indexing while developing. All jobs ran 
>> fine, but I forgot that the re-indexing was set for the next hour, so 
>> the crawler did its job every 5 minutes but ignored new records. If you 
>> reindex every 24 hours you could wait a long time. Just make sure 
>> reindexing is left blank each time you test.
>>
> ok, i will try
> 
>> Database Fields which are indexed were also something I forgot at 
>> first. In the FAQ extension they are q and a - not title and text.
>>
>> Last is how many records are indexed. Maybe it is just not finished?
>>
> yes, indeed it's unclear, i have to look at source code
> 
>>> So some questions for the first:
>>>
>>> 1) Where should the record for crawling pages be placed? 
>>> Kasper put it in a storage folder, i also tried on root page
>>
>> I thought they had to be stored on the pages where the output records 
>> are rendered.
> this is true for e.g. database records, but not for the page tree. It seems 
> that it makes no difference for this record.
> 
>>
>>> 2) What's about the maximum level "3" configured in record, as my 
>>> page tree has more levels?
>>
>> You are so right :-(
>> It would be nice to have more levels in a future indexed_search release.
>>
> i will hack the TCA of crawler ext to add more levels
> 
>>
>>> 3) record has time setting for queuing the record, but it crawls the 
>>> pages in each run, does it have no influence?
>>
>> As far as I see in my installations it only reindexes if something new 
>> is on the page or normal re-caching is needed. As for the load, I don't 
>> know because there aren't that many pages on my installations.
>>
> i am puzzled by this reindexing. Messages in the crawler log say something 
> about reindexing, but i didn't understand what they mean, so i have to 
> go by trial and error
> 
>>> 4) i tried to configure the crawler to use mm_forum as data table. now 
>>> the table with the posts is a kind of mm-table, so i can't build the 
>>> GET vars without including the thread table. is there any workaround 
>>> for such a situation?
>>
>> sorry ... no advice here :-(
>>
> np - i will investigate further and post here if i find something 
> interesting.
> 
> Regards, Steffen

-- 
Greetings
Ralf Merz

Heindl Internet AG
Tübingen, Germany
ralf.merz at heindl.de