[Typo3-dev] letters, word split and indexed_search, et.al.
Kasper Skårhøj
kasper2004 at typo3.com
Sun Aug 8 21:10:45 CEST 2004
I think it is bloat but nevertheless a useful library. Certainly we need
something like it for indexed search. I'm just VERY afraid that it will
be ENORMOUSLY slow for indexing pages and hence it might be a fair
requirement to say that this feature is only available for people having
some native PHP stuff that does it on utf-8 anyway.
I'm about to look at utf-8 support in indexed_search before too long,
so I will come back to it, but I really hope you do some good research
till then. Thanks for working so hard on this!!
- kasper
On Sun, 2004-08-08 at 16:30, Martin T. Kutschker wrote:
> Martin T. Kutschker wrote:
> > Hi!
> >
> > AFAIK, Typo3 needs a way for indexed search and other extensions to find
> > out if a character (possibly a multi-byte utf-8!) is a letter. IMHO
> > this is needed because word splitting cannot be done by simple regexps
> > for arbitrary single-byte charsets easily and it's simply not possible
> > for utf-8.
> >
> > Because I need it in a private project I'm going to write a function
> > that will detect Latin, Cyrillic, Greek, Hebrew and Arabic letters.
>
> I'm thinking of something like this:
>
> function is_letter($charset,$string)
>
> It will return 0 if the first character of the string is not a letter.
>
> If it is a letter, it will return the number of bytes that make up the
> character, i.e. 1 for single-byte chars, 2 or 3 for utf-8 (depending on
> char), and perhaps 2 if I'll add support for Big5 and Shift-JIS.
>
> Kasper, I'd like to do this in a Typo3 context. Is this a candidate for
> t3lib_cs? Yes, because it will likely rely on the Unicode data table
> already parsed there and I can imagine it's of general use. No, because it
> might be considered as bloat.
>
> Masi
>
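
For illustration, the is_letter($charset, $string) contract described in the
quoted message could be sketched roughly as below. This is not Martin's
implementation and does not use the t3lib_cs Unicode table; it assumes a PHP
build whose PCRE supports the /u modifier and the \p{L} property, plus the
iconv extension. Multi-byte charsets other than utf-8 (Big5, Shift-JIS) are
deliberately left out and simply yield 0.

```php
<?php
// Hypothetical sketch of the proposed is_letter($charset, $string).
// Assumptions: PCRE with Unicode support (/u and \p{L}) and the iconv extension.
// Returns 0 if the first character is not a letter, otherwise the number of
// bytes that make up that character.
function is_letter($charset, $string)
{
    if ($string === '') {
        return 0;
    }

    if (strtolower($charset) === 'utf-8') {
        // \p{L} matches any Unicode letter; /u makes PCRE read the subject as UTF-8.
        // preg_match() returns false on invalid UTF-8, which also ends up as 0 here.
        if (preg_match('/^\p{L}/u', $string, $match) === 1) {
            return strlen($match[0]); // byte length of the first character
        }
        return 0;
    }

    // Single-byte charset: convert only the first byte to UTF-8, then reuse the same test.
    $utf8 = @iconv($charset, 'UTF-8', $string[0]);
    if ($utf8 !== false && preg_match('/^\p{L}$/u', $utf8) === 1) {
        return 1;
    }
    return 0;
}

var_dump(is_letter('utf-8', "\xC3\xA6ble"));      // int(2): "æ" takes two bytes in UTF-8
var_dump(is_letter('utf-8', '1abc'));             // int(0): a digit is not a letter
var_dump(is_letter('iso-8859-1', "\xE6ble"));     // int(1): "æ" is one byte in Latin-1
```

Whether a per-character call like this is fast enough for indexing whole pages
is exactly the open question raised in the reply above; the sketch only shows
the return-value convention, not a tuned word splitter.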
--
- kasper
--------
Please notice NEW EMAIL ADDRESS for 2004!! (due to SPAM-contamination)
"kasper2004 at typo3.com"