[TYPO3-english] mnoGoSearch indexing

Wed Mar 5 14:26:44 CET 2014

Hi Pero,

Pero Peric wrote:

> Maybe this people that are putting so much effort in things like fluid
> and space ship enterprises could come down to earth a bit and make some
> built in fast working search to replace indexed search, but it seems
> that's not so fancy.. ah.

It is not that easy to write a good search engine. On most smaller websites,
the search is broken or very slow.
On top of that, TYPO3 CMS has a non-trivial content model, which makes
searching a very complex topic. Our flexibility bites us in the ass here. We
know this hurts, but we cannot really do anything about it.

Why? Well, there is no predefined way in which content is rendered on the
website. You can use TYPO3 CMS for a fully AJAX website that spits out JSON
content or you can create a one-page site out of many pages. You can create
a traditional site or you can use it for completely non-web publishing
processes. The issue here is that you cannot know (or calculate) how a
certain record will be rendered on a website. It might just be placed as a
content object (on a traditional website) or it might not even be rendered
at all. It might show up on a different page or even on all pages, because
it is directly referenced via TypoScript (RECORD).
It might also be unreachable, because some part of the website is hidden
because no link to it is rendered.

Essentially, you cannot know programmatically how a record will show up on
which URL. And that is what you need to know to write a search.
So how does indexed search solve this? It solves it by introducing markers
that wrap text that should show up in search, and it analyzes only fully
rendered, cached (!) pages. Why only cached pages? Because the website can be
completely different for logged-in users or users with a certain browser or
visitors from a certain country or a few dozen other conditions. Cached
pages must have a finite number of conditions, which can then be taken into
account for searches as well.
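
To illustrate the marker idea, here is a minimal sketch in Python (not the
extension's actual PHP code). It assumes the markers are the HTML comments
<!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->. The function name and
the sample page are made up for this example, and it does not attempt to
reproduce how indexed_search handles edge cases such as unbalanced markers.

```python
import re
from html import unescape

# Marker comments that wrap the indexable part of a rendered page.
BEGIN = "<!--TYPO3SEARCH_begin-->"
END = "<!--TYPO3SEARCH_end-->"


def extract_indexable_text(rendered_html: str) -> str:
    """Return the plain text found between begin/end markers (hypothetical helper)."""
    pattern = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)
    chunks = pattern.findall(rendered_html)
    text = " ".join(chunks)
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag stripping, enough for a sketch
    text = unescape(text)                 # turn &amp; etc. back into characters
    return " ".join(text.split())         # collapse whitespace


page = (
    "<html><body><nav>Home | About</nav>"
    "<!--TYPO3SEARCH_begin--><h1>News</h1><p>Release &amp; notes</p>"
    "<!--TYPO3SEARCH_end-->"
    "<footer>Imprint</footer></body></html>"
)
print(extract_indexable_text(page))  # prints: News Release & notes
```

Navigation and footer never reach the index here, because the indexer only
looks at the final rendered output and not at the records behind it.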

How does Solr solve this? It allows you to create rules for every kind of
record. This results in a very long list of rules and still needs custom
code for complex cases.
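
As a rough illustration of what "a rule per kind of record" means, here is a
small Python sketch. It is not the configuration format of the Solr
extension. The rule functions, the URL patterns and the table name "news"
are all invented for this example.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, object]
UrlRule = Callable[[Record], Optional[str]]


def news_url(record: Record) -> Optional[str]:
    # Hypothetical rule: news records are shown on a dedicated detail page.
    return f"/news/detail/{record['uid']}"


def content_url(record: Record) -> Optional[str]:
    # Hypothetical rule: content elements appear on their parent page.
    return f"/page/{record['pid']}#c{record['uid']}"


# One entry per record type; real sites end up with many of these.
RULES: Dict[str, UrlRule] = {
    "news": news_url,
    "tt_content": content_url,
}


def resolve_url(table: str, record: Record) -> Optional[str]:
    """Return the URL a record is assumed to show up on, or None without a rule."""
    rule = RULES.get(table)
    return rule(record) if rule else None


print(resolve_url("news", {"uid": 12, "pid": 3}))
print(resolve_url("tt_content", {"uid": 7, "pid": 3}))
print(resolve_url("tx_custom_table", {"uid": 1, "pid": 3}))
```

Every record type without an entry stays invisible to the search, and a
record that is rendered in several places needs more than a single rule.
That is where the custom code comes in.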

Therefore you can either index mostly static pages, which is almost trivial
(indexed_search), or you can use a big solution that needs a complex (and
expensive) setup.
Of course there are solutions in between, like ke_search, but they will not
cover every situation.

After all, it boils down to which record will show up where. Therefore a
search is as custom to a site as the template used to render those records.
Nobody has bothered enough yet to write a search engine that is as flexible
as the templating approach, and I am very sure that if someone did, a lot of
people would complain that it is sooo complicated to set up.

The reason why there is no super-duper search engine is that nobody has
a high-quality solution, and the core team will not accept another
half-baked, half-working pseudo solution.
The difference from other CMSs is that the content goes through an unknown
number of transformations before it is rendered on the website. Therefore it
is not enough to know what content is on what page to create a working search.

Best regards
-- 
Philipp Gampe – PGP-Key 0AD96065 – TYPO3 UG Bonn/Köln
Documentation – Active contributor TYPO3 CMS
TYPO3 .... inspiring people to share!