[Typo3-dev] letters, word split and indexed_search, et.al.
Martin T. Kutschker
Martin-no5pam-Kutschker at blackbox.n0spam.net
Mon Aug 9 15:19:39 CEST 2004
Kasper Skårhøj wrote:
> I think it is bloat but nevertheless a useful library. Certainly we need
> something like it for indexed search. I'm just VERY afraid that it will
> be ENORMOUSLY slow for indexing pages and hence it might be a fair
> requirement to say that this feature is only available for people having
> some native PHP stuff that does it on utf-8 anyway.
Any C implementation should be faster, but I'm not sure if regexps are
the fastest way to do it (because the regexp syntax has to be parsed as
well).
For the search you wouldn't need my proposed word splitting function. I
don't know how it's done in indexed_search right now, but a possibly
useful function is this:
array get_word($charset,$string,$word,$start)
Returns an array containing the word and the starting and end point
within the given string of the *next* word.
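
To make the idea concrete, here is a minimal sketch in Python (used only for
illustration; a real version for TYPO3 would be PHP). It is an assumption of
how such a function could behave, not an existing API: the $word parameter is
left out because its role is not spelled out above, "word" is approximated as
a run of letters, and offsets are counted in characters, not bytes.

```python
import re

# Word characters minus decimal digits and underscore: an approximation
# of "letters" that works for any script Python's Unicode tables know.
_WORD = re.compile(r"[^\W\d_]+")


def get_word(charset, data, start=0):
    """Hypothetical counterpart of the proposed get_word().

    `data` is a byte string encoded in `charset`. Returns a tuple
    (word, begin, end) for the next word at or after character offset
    `start`, or None if no word is left. `end` is exclusive.
    """
    text = data.decode(charset)
    match = _WORD.search(text, start)
    if match is None:
        return None
    return match.group(0), match.start(), match.end()


if __name__ == "__main__":
    raw = "Kasper Skårhøj wrote".encode("utf-8")
    pos = 0
    while True:
        hit = get_word("utf-8", raw, pos)
        if hit is None:
            break
        print(hit)
        pos = hit[2]  # continue right after the word just found
```

Feeding the returned end offset back in as the next start walks through the
string word by word. The sketch decodes the input on every call, which a real
implementation would want to avoid.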
> I'm about to look at utf-8 support in indexed_search within not so long
> so I will come back to it but I really hope you do some good research
> till then.
I'll have a look at a possible implementation. Huge arrays may be slow,
but perhaps this can be resolved by using ranges instead (i.e. code
points x1 to y1 are letters, as are x2 to y2, etc.).
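
A rough sketch of the range idea, again in Python purely for illustration.
The table below is a deliberately tiny, incomplete assumption (basic Latin,
Latin-1 letters and Greek small letters); a real table would be derived from
the Unicode character database. The ranges are kept sorted so that a binary
search can replace a lookup in one huge per-character array.

```python
from bisect import bisect_right

# Inclusive (first, last) code point ranges, sorted and non-overlapping.
# Illustrative only, NOT a complete list of letters.
LETTER_RANGES = [
    (0x0041, 0x005A),  # A-Z
    (0x0061, 0x007A),  # a-z
    (0x00C0, 0x00D6),  # Latin-1 letters before the multiplication sign
    (0x00D8, 0x00F6),  # Latin-1 letters between the two math signs
    (0x00F8, 0x00FF),  # Latin-1 letters after the division sign
    (0x03B1, 0x03C9),  # Greek small letters alpha to omega
]
_STARTS = [first for first, _ in LETTER_RANGES]


def is_letter(ch):
    """Return True if the single character `ch` falls in one of the ranges."""
    cp = ord(ch)
    # Index of the last range whose first code point is <= cp, or -1.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= LETTER_RANGES[i][1]
```

With n ranges a lookup costs about log2(n) comparisons, and the table stays
small because long runs of letters collapse into a single pair.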
> Thanks for working so hard on this!!
You're welcome.
Masi