[TYPO3-core] How to treat UTF BOM?
Steffen Gebert
steffen.gebert at typo3.org
Mon Dec 26 13:49:59 CET 2011
Hi,
while browsing through the bug tracker I stumbled over this one:
http://forge.typo3.org/issues/19708
As such things can cause serious headaches, I'm trying to find a solution
for it (i.e. skip the Byte Order Mark somewhere during processing).
As I think it's insufficient to remove it only while processing
INCLUDE_TYPOSCRIPT, I'm inclined to remove it directly in
t3lib_div::getUrl(). This would also resolve
http://forge.typo3.org/issues/20671 (xml2array doesn't respect BOM).
Before pushing, I want to ask for your feedback here.
* Only removing the UTF-8 BOM (EF BB BF), as in the proposed patch, is
not enough. There are different variants of UTF-16 (big/little endian)
and also UTF-1, -7, -32 and other encodings. Maybe mbstring could help.
Otherwise, search for each of the known BOMs? (See Wikipedia for some,
possibly all, of them.)
* Do you think getUrl() is the correct place for it?
IMHO people want the file contents and don't want to take care of such
meta information. I don't have a clue ATM what the result of reading a
UTF-16 file without a BOM would be.
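To make the "search for each of the known BOMs" idea from the first point more concrete, here is a rough sketch. It is written in Python rather than PHP only to keep it short, and it is not the proposed patch: the names strip_bom and KNOWN_BOMS are made up for illustration, and only UTF-8, UTF-16 and UTF-32 are covered (UTF-1 and UTF-7 are left out).

```python
import codecs

# Hypothetical lookup table, not part of t3lib_div.
# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the longer signatures are checked first.
KNOWN_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def strip_bom(data: bytes):
    """Return (content without a leading BOM, detected encoding or None)."""
    for bom, encoding in KNOWN_BOMS:
        if data.startswith(bom):
            return data[len(bom):], encoding
    # No known BOM found: hand the bytes back untouched.
    return data, None


content, encoding = strip_bom(b"\xef\xbb\xbfpage = PAGE")
print(content, encoding)
```

One caveat with this approach: a UTF-16 LE file whose first character after the BOM is U+0000 would be taken for UTF-32 LE. That should be rare in practice, but it shows that detection by BOM alone is a heuristic.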
I created a new issue, http://forge.typo3.org/issues/32834, as an
umbrella for the others.
I'm not a charset expert, so comments are welcome!
Kind regards
Steffen
--
Steffen Gebert
TYPO3 v4 Core Team Member
TYPO3 Server Administration Team Member
TYPO3 .... inspiring people to share!
Get involved: http://typo3.org