[TYPO3-core] RFC #11979: TCEforms suggest doesn't find everything on large sites and is slow

Sun Sep 20 17:31:37 CEST 2009

Hi Steffen,

Steffen Gebert wrote:
> Recursion is totally useless, when we always query SQL with "LIMIT 0,50"
> - every time the same 50 records are retrieved.

I also had this fixed already, but I had no time to send it to the
list... Seems we had some weird bugs appearing over time ;)

> Now search works, but still a bit inefficient (waiting for Andreas'
> improvements first).

I think it will take some time to get my patch ready. The problem is
that we need different search strategies for different tables, because
data is distributed differently for them. Some extreme examples:

* pages are usually scattered around all pages (eh, didn't expect this,
  right? ;))
* tt_content resides on almost all pages
* glossary entries etc. will be used on only a few pages
* tx_dam records are stored on only one page
* sys_template records will be on a small number of pages

For all data that is stored on only a few pages, the approach Joey
proposed would be best. For other data, especially pages and tt_content,
we should modify the approach the default receiver takes right now:
greedily search the database for records and check access to them. We
could also use the data we gather there to slowly warm up a cache, i.e.,
when we have checked a page for access, we can store the data. But first
we search for results without limiting the pids the records should be
taken from.
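
To make the greedy idea concrete, here is a minimal sketch in Python (not
TYPO3 code). It assumes records are plain dicts with "pid" and "title" keys;
greedy_search, has_page_access and access_cache are hypothetical names
chosen for illustration. The slice stands in for an SQL "LIMIT offset, n"
whose offset advances, so each batch differs from the previous one, and the
per-page access results are remembered to warm the cache.

```python
def greedy_search(records, term, has_page_access, access_cache,
                  limit=50, batch_size=50):
    """Fetch matching records batch by batch and keep the accessible ones."""
    # Stand-in for the SQL WHERE clause: match on the title only.
    matches = [r for r in records if term.lower() in str(r["title"]).lower()]
    results = []
    offset = 0
    while len(results) < limit and offset < len(matches):
        # Stand-in for "LIMIT offset, batch_size" with a moving offset.
        batch = matches[offset:offset + batch_size]
        offset += batch_size
        for record in batch:
            pid = record["pid"]
            if pid not in access_cache:
                # Check each page only once and remember the answer.
                access_cache[pid] = has_page_access(pid)
            if access_cache[pid]:
                results.append(record)
                if len(results) == limit:
                    break
    return results
```

The cache here is just a dict passed in by the caller, so it can outlive a
single search; how long such entries would stay valid is left open.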

So on the one hand, we have a search that first broadly searches for
accessible pages and then does an in-depth search for matching records
on these pages (Joey's approach).
On the other hand, there's the "classical" approach of greedily selecting
records and checking access to them.
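
The pages-first variant could be sketched roughly like this, again in
Python and not as TYPO3 code. The page tree is assumed to be a dict mapping
a page id to its child ids; accessible_pages and search_on_pages are
hypothetical names. The walk uses a queue and a while loop, so no recursion
is involved. It deliberately descends below inaccessible pages too, on the
assumption that a subpage may be accessible even when its parent is not.

```python
from collections import deque


def accessible_pages(children, root, has_page_access):
    """Walk the page tree breadth-first and collect accessible page ids."""
    found = []
    queue = deque([root])
    while queue:
        pid = queue.popleft()
        if has_page_access(pid):
            found.append(pid)
        # Assumption: children are visited even if the parent is not accessible.
        queue.extend(children.get(pid, []))
    return found


def search_on_pages(records, term, pids):
    """Search for matching records only on the given pages."""
    allowed = set(pids)
    return [
        r for r in records
        if r["pid"] in allowed and term.lower() in str(r["title"]).lower()
    ]
```

This pays off when the list of accessible pages is short; for pages or
tt_content on a large site the first phase alone would touch most of the
tree.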

I also think this has to be configurable - users with a setup like yours
(where only a few pages are accessible for each user) might require a
different search strategy than sites where all users are able to at
least see all records.

So I propose the introduction of different search classes (I'm not sure
about the final naming) and configuration options to be able to change
them on a per-table and per-field basis (as is the case with other
suggest options anyway).
These search classes will be injected [1] by the main suggest class
while creating the receiver classes.
I suppose the naming will reflect the search strategy they use, so we
get something like GreedySelector for the default strategy we have right
now and BreadthFirstSelector for the strategy Joey has proposed. [2]

These search implementations should be extendable, so you could provide
your own "searcher" if your installation needs special handling.
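
A rough sketch of how the injection could look, in Python rather than
TYPO3's PHP. Only the names GreedySelector and BreadthFirstSelector come
from the proposal above (and even those are tentative); Selector, Receiver,
Suggest, create_receiver and the shape of the configuration are assumptions
made for illustration. The lookup tries a per-field setting first, then a
per-table one, then falls back to the greedy default.

```python
from typing import Dict, List, Optional, Tuple, Type


class Selector:
    """Base class for search strategies; subclass it to add your own."""

    def select(self, table: str, term: str) -> List[str]:
        raise NotImplementedError


class GreedySelector(Selector):
    def select(self, table: str, term: str) -> List[str]:
        # Placeholder: would select matching records first, then check access.
        return ["greedy:%s:%s" % (table, term)]


class BreadthFirstSelector(Selector):
    def select(self, table: str, term: str) -> List[str]:
        # Placeholder: would collect accessible pages first, then search them.
        return ["breadth-first:%s:%s" % (table, term)]


class Receiver:
    def __init__(self, table: str, selector: Selector):
        self.table = table
        self.selector = selector  # injected, not created by the receiver

    def query(self, term: str) -> List[str]:
        return self.selector.select(self.table, term)


class Suggest:
    def __init__(self, config: Dict[Tuple[str, Optional[str]], Type[Selector]]):
        # Keys are (table, field); a field of None means "whole table".
        self.config = config

    def create_receiver(self, table: str, field: str) -> Receiver:
        selector_class = (
            self.config.get((table, field))
            or self.config.get((table, None))
            or GreedySelector
        )
        return Receiver(table, selector_class())
```

An installation with special needs would subclass Selector and register its
class in the configuration for the tables or fields concerned.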

Another thing is to remove the recursive iteration. We should do this
anyway, because recursion tends to take much more time than a carefully
written while loop. Also, the problem we are trying to solve is not
really a recursive one, as it would be if we tried to divide and conquer
a large amount of data; instead, we greedily search a large dataset.

Regards
Andreas

[1] Yes, we get a poor man's dependency injection here! ;-)
[2] I hope I applied the theory of different types of search correctly;
    if not, anybody who knows the topic better, please correct me.