[TYPO3-english] crawler vs crawler_im and indexing configurations a bug?

Thu May 18 01:57:45 CEST 2017

Hello,

I've been struggling with indexed search and crawler for some days now 
and I have some questions.

Why does the crawler_im script not process the indexed search hooks 
while the crawler script does? I know this for a fact because I have 
been reviewing the source code, and the difference makes no sense to 
me. Do you have any insight on this?

The help for crawler says:

NAME:
     crawler CLI interface -- Crawling the URLs from the queue

SYNOPSIS:
     [-s] [--silent] [-ss] [-h] [--help] [--countInARun count] [--sleepTime
     milliseconds] [--sleepAfterFinish seconds]

The help for crawler_im says:

NAME:
     crawler CLI interface -- Submitting URLs to be crawled via CLI
     interface.

SYNOPSIS:
     page_id [-s] [--silent] [-ss] [-d depth] [-o mode] [-n number] [-conf
     configurationkeys]

Therefore, it seems to me (please tell me if I am wrong, since I am 
really confused by this) that if I first call crawler_im with a page_id 
that has an indexing configuration, all the URLs submitted should be 
generated by that indexing configuration. However, crawler_im only 
calls getPageTreeAndUrls in its code and never calls CLI_runHooks, 
which from my point of view it should.
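
To make the sequence concrete, this is roughly what I am running. It is 
only a sketch: the dispatcher path, the page id, the depth, the "queue" 
mode and the count value are assumptions for illustration, not taken 
from my actual setup, and the script only prints the commands unless 
DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the two-step sequence: submit URLs with crawler_im, then
# process the queue with crawler. Dry run by default.
# ASSUMPTIONS (illustrative only): dispatcher path, page id, depth,
# the "queue" mode value and the countInARun value.
DISPATCH="typo3/cli_dispatch.phpsh"   # assumed TYPO3 7.6 CLI dispatcher path
PAGE_ID=1                             # hypothetical root page uid
DEPTH=2                               # hypothetical page tree depth

# Step 1: crawler_im synopsis is "page_id [-d depth] [-o mode] ..."
submit_cmd="$DISPATCH crawler_im $PAGE_ID -d $DEPTH -o queue"

# Step 2: crawler synopsis is "[--countInARun count] ..."
process_cmd="$DISPATCH crawler --countInARun 50"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Word splitting of the command strings is intentional here.
run $submit_cmd
run $process_cmd
```

My expectation was that step 1 alone would already take the indexing 
configuration into account.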

I am writing all of this because I created an indexing configuration 
for the page tree (I want to separate pages and records with 
defaultFreeIndexUidList), following the manual at
https://docs.typo3.org/typo3cms/extensions/indexed_search/7.6/IndexingConfigurations/PeriodicIndexingWebsite/Index.html
but I am not getting the expected behavior.

Am I wrong?

Calling only crawler, which does call CLI_runHooks, never creates the 
structure I specified in my indexing configurations, even though I 
followed the indexed search manual exactly.

Could you advise on this?

Best regards,

B.