[TYPO3-core] RFC: Fix indexing of files that contain special characters
Martin Kutschker
Martin.Kutschker at blackbox.net
Fri Jan 27 20:46:45 CET 2006
Michael Stucki <michael at typo3.org> writes on
Fri, 27 Jan 2006 18:56:27 +0100 (MET):
> Martin Kutschker wrote:
>
>
> > I don't know if it's possible to detect if the file system is UTF-8
> > (with PHP). I can imagine that via mounts you can have access to all
> > kinds of file systems with different charsets (FAT, NTFS, ext3, NFS,
> > etc).
> >
> > You can try to detect if the file name in question is UTF-8, by
> > testing if it only contains valid UTF-8 sequences.
>
> What I noticed is that the fileadmin module displays the names always
> correctly. Since that module does no conversion, the solution is
> probably really to treat it as ISO-8859-1 and utf8_encode() it.
Well, it is the best guess for all Western European languages. Maybe it makes sense to create something like forceCharset: fileSystemCharset. The default would be windows-1252 (to make Windows users happy), but it could be set to anything.
fileSystemCharset would be used whenever a file is processed the way you describe. Still, I suggest testing for UTF-8 before doing the conversion, to avoid double encoding.
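A rough sketch of that check-then-convert order, written in Python rather than PHP purely for illustration. The names is_valid_utf8, to_utf8 and fs_charset (standing in for the proposed fileSystemCharset setting) are made up for this example and are not TYPO3 API.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw consists only of valid UTF-8 sequences."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw: bytes, fs_charset: str = "windows-1252") -> bytes:
    """Convert a raw file name to UTF-8 unless it already is UTF-8.

    fs_charset is a hypothetical stand-in for the proposed
    fileSystemCharset option; windows-1252 is the suggested default.
    """
    if is_valid_utf8(raw):
        # Already UTF-8 (pure ASCII also lands here): leave it alone,
        # otherwise the name would be encoded twice.
        return raw
    # Bytes that windows-1252 leaves undefined become U+FFFD
    # instead of raising an error.
    return raw.decode(fs_charset, errors="replace").encode("utf-8")
```

Note that the test is only a heuristic: some byte sequences are valid in both charsets (for example the two bytes 0xC3 0xA9), so a short Latin-1 name can occasionally be mistaken for UTF-8.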
I'm curious where you expect non-ASCII filenames in a TYPO3 context. TYPO3 ASCII-fies everything.
> When reading the PHP documentation, I've noticed that utf8_encode
> _always_ encodes the string from ISO-8859-1. Is this really true? If
> so, how are other countries (non-latin1) utf8-encoding their data?
With iconv, recode, or mbstring. In TYPO3, use t3lib_cs. But of course you can only do this if you know the charset.
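To illustrate why knowing the charset matters, here is a small Python sketch (not PHP, and not the t3lib_cs API). The helper convert_to_utf8 and the sample file name are invented for this example. It converts the same windows-1251 bytes once with the correct source charset and once assuming ISO-8859-1, which is what a Latin-1-only function effectively does.

```python
def convert_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode raw from source_charset to UTF-8 (hypothetical helper)."""
    return raw.decode(source_charset).encode("utf-8")


# A Cyrillic file name as it might be stored on a windows-1251 system.
name = "\u0444\u0430\u0439\u043b.txt"
raw = name.encode("windows-1251")

# Correct source charset: the original name survives the conversion.
right = convert_to_utf8(raw, "windows-1251").decode("utf-8")

# Wrong assumption (ISO-8859-1): the conversion still succeeds without
# any error, but the result is Latin-1 letters instead of Cyrillic ones.
wrong = convert_to_utf8(raw, "iso-8859-1").decode("utf-8")
```

The second call does not fail, which is the dangerous part: the output is valid UTF-8, just for the wrong characters.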
> I'm curious why PHP seems not to care about them?!
<rant>The Internet is America-centric; Western Europe comes second. It's only ignorance and laziness. Remember, PHP is essentially home-brewed. It has no concept, no design, not even a sensible naming scheme. How can you expect true and meaningful i18n?</rant>
Masi