[TYPO3-core] How to treat UTF BOM?

Mon Dec 26 13:49:59 CET 2011

Hi,

while browsing through the bug tracker I stumbled over this one:
http://forge.typo3.org/issues/19708

As such things can cause serious headaches, I'm trying to find a solution 
for it (i.e. skip the Byte Order Mark somewhere during processing).
As I think it's insufficient to remove it only while processing 
INCLUDE_TYPOSCRIPT, I tend towards removing it directly in 
t3lib_div::getUrl().

This would also resolve http://forge.typo3.org/issues/20671 (xml2array 
doesn't respect BOM).

Before pushing, I want to ask for your feedback here.

* Removing only the UTF-8 BOM (EF BB BF), as in the proposed patch, is 
not enough. There are different variants of UTF-16 (big/little endian) 
and also UTF-1, -7, -32 and other charsets. Maybe mbstring could help. 
Otherwise, search for each of the known BOMs? (See Wikipedia for some, 
possibly all, of them.)

* Do you think that it is correct to place it in getUrl()?
IMHO people want the file contents and don't want to care about such 
meta information. I don't have a clue ATM what the result of reading a 
UTF-16 file without a BOM would be.
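
To make the "search for each of the known BOMs" idea from the first 
point concrete, here is a minimal sketch. It is written in Python purely 
for illustration; it is not the proposed patch and not TYPO3 code. The 
names strip_bom and KNOWN_BOMS are made up for this example. It only 
covers UTF-8, UTF-16 and UTF-32; UTF-1 and UTF-7 are left out here 
(UTF-7 in particular has several possible signatures).

```python
import codecs

# Hypothetical lookup table for this sketch. Longer signatures come first:
# the UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE),
# so checking UTF-16 first would misdetect UTF-32 LE input.
KNOWN_BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def strip_bom(data):
    """Return (payload, encoding): the bytes without a leading BOM and the
    encoding that BOM implies, or (data, None) if no known BOM is found."""
    for bom, encoding in KNOWN_BOMS:
        if data.startswith(bom):
            return data[len(bom):], encoding
    return data, None


if __name__ == "__main__":
    payload, encoding = strip_bom(b"\xef\xbb\xbfpage = PAGE")
    print(payload, encoding)
```

One caveat of this approach: UTF-16 LE content whose first character is 
U+0000 looks like a UTF-32 LE BOM, so the detection is a heuristic. 
Returning the detected encoding alongside the payload is a choice made 
for this sketch; a caller that only wants the contents could ignore it.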

I created a new issue, http://forge.typo3.org/issues/32834, as an 
umbrella for the others.

I'm not a charset expert, so comments are welcome!

Kind regards
Steffen

-- 
Steffen Gebert
TYPO3 v4 Core Team Member
TYPO3 Server Administration Team Member
