[Typo3-dev] letters, word split and indexed_search, et.al.
Martin T. Kutschker
Martin-no5pam-Kutschker at blackbox.n0spam.net
Sun Aug 8 16:30:06 CEST 2004
Martin T. Kutschker wrote:
> Hi!
>
> AFAIK, Typo3 needs a way for indexed search and other extensions to find
> out if a character (possibly a multi-byte utf-8!) is a letter. IMHO
> this is needed because word splitting cannot be done by simple regexps
> for arbitrary single-byte charsets easily and it's simply not possible
> for utf-8.
>
> Because I need it in a private project I'm going to write a function
> that will detect Latin, Cyrillic, Greek, Hebrew and Arabic letters.
I'm thinking of something like this:
function is_letter($charset,$string)
It will return 0 if the first character of the string is not a letter.
If it is a letter, it will return the number of bytes that make up the
character, i.e. 1 for single-byte chars, 2 or 3 for utf-8 (depending on
the char), and perhaps 2 if I add support for Big5 and Shift-JIS.
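
A minimal sketch of how such a function might behave, for illustration
only. It is not the t3lib_cs code and it does not use the Unicode data
table mentioned in this post. Instead it assumes PCRE was built with
Unicode property support (\p{L}) and that the iconv extension is
available. Anything that is not utf-8 is assumed to be a single-byte
charset, so Big5 and Shift-JIS are not handled.

```php
<?php
/**
 * Hypothetical sketch of is_letter(): returns 0 if the first character of
 * $string is not a letter, otherwise the number of bytes that make up
 * that character.
 *
 * Assumptions: PCRE with Unicode properties, the iconv extension, and
 * every charset other than utf-8 being single-byte.
 */
function is_letter($charset, $string)
{
    if ($string === '') {
        return 0;
    }

    if (strtolower($charset) === 'utf-8') {
        // \p{L} matches any Unicode letter; the /u modifier makes PCRE
        // read the subject as UTF-8. On invalid UTF-8 preg_match()
        // returns false, which ends up as "not a letter" below.
        if (preg_match('/^\p{L}/u', $string, $m) === 1) {
            return strlen($m[0]); // strlen() counts bytes, not characters
        }
        return 0;
    }

    // Single-byte charset (assumed): convert only the first byte to
    // UTF-8 and apply the same letter test. Unknown charset names make
    // iconv() return false, which is treated as "not a letter".
    $utf8 = @iconv($charset, 'UTF-8', $string[0]);
    if ($utf8 !== false && preg_match('/^\p{L}/u', $utf8) === 1) {
        return 1;
    }
    return 0;
}

// Usage:
var_dump(is_letter('utf-8', 'abc'));              // int(1)
var_dump(is_letter('utf-8', "\xD0\x96"));         // int(2), Cyrillic Zhe
var_dump(is_letter('utf-8', "\xE4\xB8\xAD"));     // int(3), a CJK ideograph
var_dump(is_letter('utf-8', '1abc'));             // int(0)
var_dump(is_letter('iso-8859-1', "\xE9"));        // int(1), e with acute
var_dump(is_letter('iso-8859-1', "\xD7"));        // int(0), multiplication sign
```

The last call shows why a plain byte range is not enough even for a
single-byte charset: in ISO-8859-1 the byte 0xD7 sits in the middle of
the accented letters but is the multiplication sign.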
Kasper, I'd like to do this in a Typo3 context. Is this a candidate for
t3lib_cs? Yes, because it will likely rely on the Unicode data table
already parsed there and I can imagine it's of general use. No, because it
might be considered bloat.
Masi