[Typo3-dev] letters, word split and indexed_search, et.al.

Martin T. Kutschker Martin-no5pam-Kutschker at blackbox.n0spam.net
Sun Aug 8 16:30:06 CEST 2004


Martin T. Kutschker wrote:
> Hi!
> 
> AFAIK, Typo3 needs a way for indexed search and other extensions to find 
> out if a character (possibly a multi-byte utf-8 one!) is a letter. IMHO 
> this is needed because word splitting cannot easily be done by simple 
> regexps for arbitrary single-byte charsets, and it's simply not possible 
> for utf-8.
> 
> Because I need it in a private project I'm going to write a function 
> that will detect Latin, Cyrillic, Greek, Hebrew and Arabic letters.

I'm thinking of something like this:

function is_letter($charset,$string)

It will return 0 if the first character of the string is not a letter.

If it is a letter, it will return the number of bytes that make up the 
character, i.e. 1 for single-byte chars, 2 or 3 for utf-8 (depending on 
the char), and perhaps 2 if I add support for Big5 and Shift-JIS.
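
A minimal sketch of how such a function could behave, not the final 
implementation. It assumes PHP's PCRE is built with Unicode property 
support (\p{L}) and that iconv is available; it does not use the Unicode 
data table from t3lib_cs. The 4-byte utf-8 branch and the handling of 
invalid input are additions for illustration only, and Big5/Shift-JIS are 
left out.

```php
<?php
// Sketch only: uses PCRE's \p{L} instead of the t3lib_cs Unicode table
// (an assumption of this example, not part of the proposal).
function is_letter($charset, $string)
{
    if ($string === '') {
        return 0;
    }
    $charset = strtolower($charset);

    if ($charset === 'utf-8') {
        // Derive the sequence length from the lead byte.
        $lead = ord($string[0]);
        if ($lead < 0x80) {
            $len = 1;
        } elseif (($lead & 0xE0) === 0xC0) {
            $len = 2;
        } elseif (($lead & 0xF0) === 0xE0) {
            $len = 3;
        } elseif (($lead & 0xF8) === 0xF0) {
            $len = 4; // beyond the 2 or 3 bytes mentioned above
        } else {
            return 0; // stray continuation byte or invalid lead byte
        }
        $char = substr($string, 0, $len);
    } else {
        // Treated as a single-byte charset: convert the first byte to utf-8.
        $len = 1;
        $char = @iconv($charset, 'UTF-8', $string[0]);
        if ($char === false) {
            return 0; // unknown charset or unmappable byte
        }
    }

    // preg_match() yields 1 on a match, 0 on no match, and false when
    // $char is not valid utf-8 (e.g. a truncated sequence).
    return preg_match('/^\p{L}$/u', $char) === 1 ? $len : 0;
}
```

A word splitter could then walk the string and advance by the return 
value for as long as it is non-zero.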

Kasper, I'd like to do this in a Typo3 context. Is this a candidate for 
t3lib_cs? Yes, because it will likely rely on the Unicode data table 
already parsed there and I can imagine it's of general use. No, because it 
might be considered bloat.

Masi




