[TYPO3-core] RFC: Fix indexing of files that contain special characters

Martin Kutschker Martin.Kutschker at blackbox.net
Fri Jan 27 13:46:01 CET 2006


Michael Stucki <michael at typo3.org> writes on 
Fri, 27 Jan 2006 10:49:51 +0100 (MET):

> This is NOT a CVS patch request but a request for information!
> 
> Problem:
> When Indexed Search indexes files from an ISO-8859-1 filesystem, it
> treats the filenames as if they were UTF-8. The problem shows up when
> looking at the Info module of indexed records, or when a file has no
> page title and thus the filename is used instead.
> 
> Possible solution:
> utf8_encode the filename (see attached patch). The problem with this is
> that I don't know if this still works if the filesystem was already
> UTF-8! How can one find this out?


I don't know if it's possible to detect whether the file system is UTF-8 (with PHP). I can imagine that via mounts you can have access to all kinds of file systems with different charsets (FAT, NTFS, ext3, NFS, etc).

You can try to detect if the file name in question is UTF-8, by testing if it only contains valid UTF-8 sequences.
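To make the idea concrete, here is a minimal sketch in Python (chosen only for illustration; in PHP, mb_check_encoding($name, 'UTF-8') should give an equivalent check). The function names and the ISO-8859-1 fallback are my assumptions, not part of the attached patch.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the byte string consists only of valid UTF-8 sequences."""
    try:
        raw.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


def filename_to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return the file name as UTF-8 bytes.

    Names that are already valid UTF-8 are passed through unchanged.
    Anything else is assumed (hypothetically) to be in `fallback`.
    """
    if looks_like_utf8(raw):
        return raw
    return raw.decode(fallback).encode("utf-8")


if __name__ == "__main__":
    latin1_name = "Müller.pdf".encode("iso-8859-1")
    utf8_name = "Müller.pdf".encode("utf-8")
    print(looks_like_utf8(latin1_name))   # a lone 0xFC byte is not valid UTF-8
    print(looks_like_utf8(utf8_name))
    print(filename_to_utf8(latin1_name) == utf8_name)
```

This is only a heuristic. Pure ASCII names are valid UTF-8 anyway, and a few 8-bit byte pairs happen to form valid UTF-8 sequences by accident, so a name can pass the test without really being UTF-8.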

But are iso-8859-1 (or rather windows-1252) and UTF-8 the only choices? I have never used a "Russian" computer, so I cannot say what's used there. I reckon the 8-bit area is filled with whatever charset is set up for the user - which can be anything.
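A small Python sketch of why this matters: the same byte from the 8-bit area decodes to a different character depending on which charset you assume. The list of charsets is just my pick for illustration, not a claim about what any particular system uses.

```python
# One byte above 0x7F, as it might appear in a file name on an 8-bit file system.
raw = b"\xe4"

# Illustrative candidates only; the real charset could be anything.
candidates = ["iso-8859-1", "cp1252", "cp1251", "koi8_r"]

# Decode the same byte under each assumed charset.
decoded = {name: raw.decode(name) for name in candidates}

for name, char in decoded.items():
    print(f"{name:>10}: {char} (U+{ord(char):04X})")
```

Since utf8_encode always treats its input as ISO-8859-1, it would only give the right result for the first of these cases (and for the windows-1252 characters that coincide with it).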

Masi 


