[Typo3-dev] letters, word split and indexed_search, et.al.

Kasper Skårhøj kasper2004 at typo3.com
Sun Aug 8 21:10:45 CEST 2004


I think it is bloat, but nevertheless a useful library. Certainly we need
something like it for indexed search. I'm just VERY afraid that it will
be ENORMOUSLY slow for indexing pages, and hence it might be a fair
requirement to say that this feature is only available to people who have
some native PHP functionality that does it on utf-8 anyway.

I'm going to look at utf-8 support in indexed_search before long,
so I will come back to it, but I really hope you do some good research
until then. Thanks for working so hard on this!!

- kasper

On Sun, 2004-08-08 at 16:30, Martin T. Kutschker wrote:
> Martin T. Kutschker wrote:
> > Hi!
> > 
> > AFAIK, Typo3 needs a way for indexed search and other extensions to find 
> > out if a character (possibly a multi-byte utf-8 one!) is a letter. IMHO 
> > this is needed because word splitting cannot easily be done by simple 
> > regexps for arbitrary single-byte charsets, and it's simply not possible 
> > for utf-8.
> > 
> > Because I need it in a private project I'm going to write a function 
> > that will detect Latin, Cyrillic, Greek, Hebrew and Arabic letters.
> 
> I'm thinking of something like this:
> 
> function is_letter($charset,$string)
> 
> It will return 0 if the first character of the string is not a letter.
> 
> If it is a letter, it will return the number of bytes that make up the 
> character, i.e. 1 for single-byte chars, 2 or 3 for utf-8 (depending on the 
> char), and perhaps 2 if I add support for Big5 and Shift-JIS.
> 
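
The contract described above (return 0 for a non-letter, otherwise the byte
length of the first character) can be sketched as follows. This is only a
minimal illustration, written in Python rather than PHP so that it is easy
to run. It assumes the input is either UTF-8 or a single-byte charset, and
it leans on Python's built-in Unicode data via str.isalpha() instead of the
t3lib_cs tables. That test accepts letters of any script, not just the five
mentioned, and UTF-8 lead bytes for 4-byte sequences are handled as well;
both choices are assumptions of this sketch, not part of the proposal.

```python
def is_letter(charset, data):
    """Return the byte length of the first character of `data` (bytes)
    if that character is a letter, otherwise 0."""
    if not data:
        return 0

    if charset.lower() in ("utf-8", "utf8"):
        lead = data[0]
        # Derive the sequence length from the UTF-8 lead byte.
        if lead < 0x80:
            length = 1
        elif 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            return 0  # continuation byte or invalid lead byte
    else:
        # Assumption: every other charset passed in is single-byte.
        length = 1

    try:
        char = data[:length].decode(charset)
    except (UnicodeDecodeError, LookupError):
        return 0  # truncated/invalid sequence or unknown charset

    if len(char) != 1:
        return 0
    return length if char.isalpha() else 0
```

A word splitter could call this repeatedly, advancing by the returned byte
count while it is non-zero and treating a 0 as a word boundary. Whether a
per-character call like this is fast enough for indexing is exactly the
open performance question.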
> Kasper, I'd like to do this in a Typo3 context. Is this a candidate for 
> t3lib_cs? Yes, because it will likely rely on the Unicode data table 
> already parsed there and I can imagine it's of general use. No, because it 
> might be considered bloat.
> 
> Masi
> 
-- 
- kasper

--------
Please notice NEW EMAIL ADDRESS for 2004!! (due to SPAM-contamination)
	
"kasper2004 at typo3.com"





