[TYPO3] crawler questions
Ralf Merz
ralf.merz at heindl.de
Thu Sep 4 08:10:26 CEST 2008
Hi Lars and Steffen,
Two days ago I tried to set up the crawler for sg_glossary.
I also tried placing the TSconfig on the root page and on the glossary
singleView... It just didn't work as expected: no results.
As in Kasper's video and on some web pages I could find, I always called
"start crawling" -> get URLs and then switched on "run now". Nothing.
After fixing a typing error (oops), I could see that it got all URLs
from the detail view of the glossary. But I just couldn't find anything
when I searched for something. So in the evening, I stopped working and
went home frustrated.
The next morning I opened the frontend, typed a word into the
search box and *tadaa* it worked.
For me, the crawler is a bit of a voodoo thing. But I'm going to
try it more often to get familiar with it.
Just a story I've written here... maybe it helps someone.
Best regards
Ralf
Steffen Kamper wrote:
>
> Hi Lars,
>
> glad to see I'm not alone with this :-)
>
>> I had to fight with the re-indexing while developing. All jobs ran
>> fine, but I forgot that the re-indexing was set for the next hour, so
>> the crawler did its job every 5 minutes but ignored new records. If you
>> reindex every 24 hours you could wait a long time. Just make sure
>> reindexing is left blank each time you test.
>>
> ok, I will try
>
>> Which database fields are indexed was also something I forgot at
>> first. In the FAQ extension they are q and a - not title and text.
>>
>> Last is how many records are indexed. Maybe it is just not finished?
>>
> yes, indeed it's unclear, I have to look at the source code
>
>>> So some questions to start with:
>>>
>>> 1) Where should the record for crawling pages be placed?
>>> Kasper put it in a storage folder, I also tried the root page
>>
>> I thought they had to be stored on the pages where the output records
>> are rendered.
> this is true for e.g. database records, but not for the page tree. It
> seems that it makes no difference for this record.
>
>>
>>> 2) What about the maximum level "3" configured in the record, as my
>>> page tree has more levels?
>>
>> You are so right :-(
>> It would be nice to have more levels in a future indexed_search release.
>>
> I will hack the TCA of the crawler ext to add more levels
>
>>
>>> 3) The record has a time setting for queuing the record, but it crawls
>>> the pages in each run. Does the setting have no influence?
>>
>> As far as I can see in my installations, it only reindexes if something
>> new is on the page or normal re-caching is needed. As for the load, I
>> don't know because there aren't that many pages on my installations.
>>
> I'm puzzled by this reindexing. Messages in the crawler log say something
> about reindexing, but I didn't understand what they mean, so I have to
> go by trial and error
>
>>> 4) I tried to configure the crawler to use mm_forum as the data table.
>>> The table with the posts is a kind of mm-table, so I can't build the
>>> GET vars without including the thread table. Is there any workaround
>>> for such a situation?
>>
>> sorry ... no advice here :-(
>>
> np - I will investigate further and post here if I find something
> interesting.
>
> vg Steffen
--
Greetings
Ralf Merz
Heindl Internet AG
Tübingen, Germany
ralf.merz at heindl.de
More information about the TYPO3-english
mailing list