[Typo3-dev] indexed_search - Performance Issue Questions

Kasper Skårhøj kasper2005 at typo3.com
Mon Nov 14 15:12:04 CET 2005


I think the indexing part of Indexed Search is fine. Word splitting could be 
faster, but that should only take some hardcore programmer to improve (note 
that the recent slowdown is due to its UTF-8 support, so a trivial 
preg_split-thingy will not work...)
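
To illustrate why a trivial byte-level split breaks on UTF-8 text, here is a 
minimal sketch in Python (not the PHP code Indexed Search actually uses; the 
function names are made up for this example). It contrasts an ASCII-only 
split on raw bytes with a Unicode-aware split on decoded text:

```python
import re


def split_words_ascii(raw: bytes) -> list[bytes]:
    """Naive split on raw bytes: every byte outside A-Z, a-z, 0-9 is a separator.

    Multi-byte UTF-8 characters are treated as separators, so words that
    contain them are cut into fragments.
    """
    return [w for w in re.split(rb"[^A-Za-z0-9]+", raw) if w]


def split_words_unicode(text: str) -> list[str]:
    """Unicode-aware split: on str patterns, \\w matches letters and digits
    from any script (plus the underscore)."""
    return re.findall(r"\w+", text.lower())


sample = "Kasper Skårhøj søger hurtigt"

# The byte-level splitter breaks "Skårhøj" into b"Sk", b"rh", b"j".
ascii_words = split_words_ascii(sample.encode("utf-8"))

# The Unicode-aware splitter keeps each word whole.
unicode_words = split_words_unicode(sample)
```

The Unicode-aware version has to decode and classify every character, which 
is one plausible reason such a splitter is slower than a plain ASCII one.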

The searching part is more tricky. First of all, it traverses the page tree 
and uses the page IDs for the lookups. With thousands of pages this becomes 
slow. I think this can be disabled. On typo3.org we had a VERY slow search. 
Why? Well, because we allowed searching for "part of word", which generated 
result tables that were too big. Simply setting it to "distinct" word sped it 
up some 100 times. I don't know how to implement a fast "part-of-word" search, 
but surely some wise guy out there can do that! 
What we need: Someone to tune the SQL part of searching for indexed search!
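
The difference between "part of word" and exact-word matching can be sketched 
as follows. This is only an illustration under stated assumptions: it uses an 
in-memory SQLite database instead of the MySQL setup TYPO3 runs on, and the 
table `words`, its columns, and the helper `plan` are simplified, hypothetical 
stand-ins rather than the real Indexed Search schema or queries. The general 
point carries over, though: a pattern with a leading wildcard cannot use an 
ordinary B-tree index on the word column, and it matches many more rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified word table with an index on the word column.
conn.executescript("""
    CREATE TABLE words (wid INTEGER PRIMARY KEY, baseword TEXT NOT NULL);
    CREATE INDEX words_baseword ON words (baseword);
""")
conn.executemany(
    "INSERT INTO words (baseword) VALUES (?)",
    [(f"word{i}",) for i in range(10000)],
)


def plan(sql: str, params: tuple) -> str:
    """Return SQLite's query plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


exact_sql = "SELECT wid FROM words WHERE baseword = ?"
part_sql = "SELECT wid FROM words WHERE baseword LIKE ?"

# Exact word: the index is used for a direct lookup.
exact_plan = plan(exact_sql, ("word42",))
exact_hits = conn.execute(exact_sql, ("word42",)).fetchall()

# Part of word: the leading wildcard forces a scan over every entry.
part_plan = plan(part_sql, ("%ord42%",))
part_hits = conn.execute(part_sql, ("%ord42%",)).fetchall()

print(exact_plan, len(exact_hits))
print(part_plan, len(part_hits))
```

In this toy data set the exact lookup returns a single word ID, while the 
part-of-word pattern returns 111 of them, each of which would then have to be 
joined against the relation table. That is the "too big result tables" effect 
on a small scale.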

Generally, what I believe is done well enough is the indexing architecture of 
the word-/rel-/phash tables; all we need is to improve the facilities that 
search them!

- kasper



On Monday 14 November 2005 15:36, Ries van Twisk wrote:
> Kasper Skårhøj wrote:
> >Hi Georg and Christoph,
> >
> >>- No simple way to really block indexing of access-restricted pages (so
> >>not even page titles are shown if not logged in)
> >
> >Sounds like something that should not be too hard to implement for an
> > access restricted page!
> >
> >>- on some sites, indexing just doesn't work (for no apparent reason like
> >>the usual "no_cache" problems...)
> >
> >If you use the Admin panel with the TypoScript part open you will see that
> >the reasons for not indexing are shown. There is always a reason.
> >
> >>mnoGoSearch or DataParkSearch (mnoGo fork) seem to be the only OS
> >>search/indexing engines that are capable of indexing UTF-8 content,
> >>ht:dig fails because of this.
> >
> >Indexed Search is completely UTF-8 compliant.
> >
> >
> >
> >If the two reasons above are why you want another engine, I think they are
> > not the best arguments. Personally, I would rather see someone work
> > actively on tuning indexed search's performance.
> >
> >
> >- kasper
>
> Fully agreed on that.
> Still I think that indexed search can never scale high if a site has
> thousands of pages and/or tens of thousands of documents.
> In the past I have implemented mnoGoSearch a couple of times. At
> the most I think I have indexed around
> 80GB worth of documents (an intranet, including office documents). On a
> simple server (1 GB, 1 GHz, IDE disk, PostgreSQL) mnoGoSearch
> could always give me the right documents in under a second; repeat
> searches with similar search words were often much faster.
> Also I found mnoGoSearch quite smart at finding the right documents and
> ordering them...
>
> BUT mnoGoSearch is more difficult to set up, so it would be nice to have
> a good alternative for big sites that are willing to take
> the time and effort to set it up.
> Am I right that Indexed Search can scale up to around 10,000 pages on
> normal hardware (meaning no clusters,
> no separation between DB and web server etc.)?
>
> Also, mnoGoSearch is payware for Windows, but free for *nix. But I
> assume that most serious websites run on *nix systems...
>
> Ries

-- 
- kasper

-----------------
Think future, not feature



