[Typo3] htdig --> Typo3 indexed search

christian reiter cr at n-o-s-p-a-m-cxd.de
Tue Aug 16 12:02:27 CEST 2005


Hi Olivier,
I won't be at the tycon, but if I can contribute any input I'll be glad to
help. I wouldn't rate myself as a hardcore Typo3 core developer though;
mostly I hack around a bit when some feature is needed. For instance, I know
practically nothing about how the indexed search extension works and have
only begun looking at it because I need this feature.

In the case of stopwords, I see a possible issue with the concept: stop
words seem to be assigned manually and individually, after they have been
stored in the index. (The source code comments in 2.1.3 speak of stopword
checkboxes.)

This would mean,

a) Deleting data from the index tables causes the loss of all stopword
configuration, which should be independent of the index (a separate stopword
list, such as many utilities have: MySQL itself, for example, has a built-in
stopword list for its fulltext search, HtDig comes with preconfigured lists,
etc.).

b) The stopwords are probably still represented in the index_fulltext table,
so they would still be found by searches that do not look for an individual
whole word.
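
To make points a) and b) concrete, here is a toy model in Python (not TYPO3
code). It assumes a word table carrying an is_stopword flag next to a separate
full-text store; the dictionary layout and both function names are made up for
illustration and are not the extension's real schema.

```python
# Toy model: a word index with a per-word stopword flag, plus raw full text.
# Both structures are hypothetical stand-ins, not the real indexed_search tables.
index_words = {
    "the": {"is_stopword": True, "pages": {1}},
    "search": {"is_stopword": False, "pages": {1}},
}
index_fulltext = {1: "the search engine"}


def search_whole_word(word):
    """Look the word up in the word index, honouring the stopword flag."""
    entry = index_words.get(word.lower())
    if entry is None or entry["is_stopword"]:
        return set()
    return set(entry["pages"])


def search_part_of_word(fragment):
    """Scan the raw full text; the stopword flag is never consulted here."""
    fragment = fragment.lower()
    return {page for page, text in index_fulltext.items()
            if fragment in text.lower()}
```

In this model search_whole_word("the") finds nothing, while
search_part_of_word("the") still returns page 1, which is point b). Emptying
index_words also throws away every flag, which is point a).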

Ideally a stopword list should be preconfigurable, and stopwords shouldn't
enter the index at all. The stopword list could be a file that a template
setup entry points to. For a general solution, language needs to be taken
into account...
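
A minimal Python sketch of that idea (again not TYPO3 code): stopwords are
read from a plain file and filtered out before anything is stored. The file
format (UTF-8, one word per line, '#' for comments) and all function names
are assumptions made for this example.

```python
import re
from pathlib import Path


def parse_stopwords(lines):
    """Build a set from one-word-per-line input, skipping blanks and '#' comments."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def load_stopwords(path):
    """Read a stopword file (hypothetical format: UTF-8, one word per line)."""
    return parse_stopwords(Path(path).read_text(encoding="utf-8").splitlines())


def words_to_index(text, stopwords):
    """Split text into lowercase words and drop stopwords before they are stored."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in stopwords]
```

Keeping one such file per language, selected by the language of the page
being indexed, would be one way to cover the language aspect. Because the
list lives outside the index tables, clearing the index leaves it untouched.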

A stopword such as 'stopword' shouldn't be found by searching for 'stopword'
as a whole word, by searching for 'topwor' as part of a word (unless
'[space]topwor[space]' really exists as its own word), or by searching for
the literal combination "a stopword".

Of course such a feature needs to be configurable so that it can be
disabled, as a majority of people might not consider it important, and
checking against stopwords also means extra computation time when a page is
indexed. As long as indexing happens during the caching of a page, this
would increase the rendering time for a "fresh" page in proportion to the
number of stopwords, which may be unacceptable. If indexing is separated
from normal FE caching and done by a dedicated indexer (I think this is what
the "crawler" is intended for?), this problem would disappear.
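
The separation could look roughly like the following Python sketch, where
rendering only queues a page and a separate indexer run does the stopword
check later. The queue, the function names and the filter_stopwords switch
are all hypothetical; this is not how the TYPO3 crawler is actually built.

```python
from collections import deque

index_queue = deque()   # pages waiting to be indexed
word_index = {}         # page id -> list of indexed words


def render_page(page_id, text):
    """Render (here: just return) the page and defer indexing to later."""
    index_queue.append((page_id, text))
    return text


def run_indexer(stopwords, filter_stopwords=True):
    """Drain the queue; the stopword check can be switched off."""
    while index_queue:
        page_id, text = index_queue.popleft()
        words = text.lower().split()
        if filter_stopwords:
            words = [w for w in words if w not in stopwords]
        word_index[page_id] = words
```

Rendering then costs the same whether or not stopword filtering is enabled.
If the stopwords are held in a set, each lookup inside the indexer takes
roughly constant time on average, so the list length matters less than the
number of words on the page.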

greetings,

christian

"Olivier Dobberkau" <olivier.dobberkau at dkd.de> wrote in message
news:mailman.1.1124127357.13074.typo3-english at lists.netfielders.de...
> christian reiter wrote:
> > Hello Olivier,
> >
> > I was working with 1.2.8 and have the features I need, but the new
> > version brings its own DB column called "is_stopword" so obviously
> > support for this feature is at least in development. I will try to
> > find out more. It would not make sense to develop a patch when this
> > feature is already being implemented, or already finished but just
> > not in the manual...
>
> hello christian.
>
> i remember that kasper sent me a mail some months ago.
> he had collected a set of features to be implemented in indexed search.
>
> my suggestion would be to coordinate the efforts here and publish a
> roadmap after tycon3.
>
> will you be there?
>
> greetings,
>
> olivier
>
>




