[TYPO3-core] RFC #8264 t3editor: "+" (plus) signs are replaced by spaces

Tobias Liebig mail_news at etobi.de
Mon May 5 10:44:53 CEST 2008


Hej,

Thanks for your hints.
After reading and researching, I agree with your points.
That must be the reason why prototype uses encodeURIComponent when
sending Ajax requests.

But:
encodeURIComponent *always* encodes as UTF-8, regardless of whether the
site uses e.g. UTF-8 or ISO-8859 as its charset.
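
A small sketch of the difference between the two functions. The sample
string is made up for illustration, and it assumes a JavaScript engine
that still ships the deprecated escape():

```javascript
// Made-up sample: "ä" (U+00E4), a plus sign and a plain letter.
const sample = "\u00e4+b";

// encodeURIComponent() percent-encodes the UTF-8 bytes of each character,
// no matter which charset the page itself was delivered in.
const viaEncodeURIComponent = encodeURIComponent(sample);

// The deprecated escape() uses the code point itself for characters below
// U+0100 (which matches ISO-8859-1) and leaves "+" untouched, so a
// form-decoding server will read that "+" as a space.
const viaEscape = escape(sample);

console.log(viaEncodeURIComponent); // %C3%A4%2Bb
console.log(viaEscape);             // %E4+b
```

The unencoded "+" in the second result is the likely source of the
"plus becomes space" symptom named in the subject line.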

Imagine the following situation:
The TYPO3 BE is delivered in ISO-8859 (because it does not use
forceCharset = utf-8).

Without the t3editor, the template code is sent to the PHP script by a
normal POST request, and it is ISO-8859 encoded.
The PHP script saves the code "as it is" to the database, and it is
delivered correctly after reloading the template module.

With the t3editor, the template code is sent to the server via an Ajax
request. It is UTF-8 encoded, because prototype uses encodeURIComponent.
The PHP script saves the code to the database "as it is".
When the code is later delivered as ISO-8859, all special chars are
broken.
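
The breakage can be reproduced in a few lines. This is only a sketch of
the effect, not of the TYPO3 code path; it relies on the deprecated
unescape(), which turns every %XX into the single character with code
point XX, i.e. it reads each byte as one ISO-8859-1 character:

```javascript
// What the Ajax request carries for "ä" (U+00E4): its two UTF-8 bytes.
const sent = encodeURIComponent("\u00e4");

// Reading those two bytes as two ISO-8859-1 characters, which is in effect
// what happens when the stored bytes are delivered on an ISO-8859 page.
const shown = unescape(sent);

console.log(sent);         // %C3%A4
console.log(shown);        // Ã¤
console.log(shown.length); // 2
```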

The only solution I can imagine:
Before saving the code to the database (/TCE processing), if the backend
does NOT use UTF-8 but the current POST request is UTF-8 encoded (header
Content-Type: ... charset=utf-8), I need to apply "utf8decode()" to all
POST values to transform them into the correct charset.

Is this correct?
Any other ideas on how to solve this?
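
A rough sketch of the check described above. It is written in JavaScript
(Node.js, for its Buffer type) purely as an illustration; in TYPO3 the
real thing would live in the PHP request handling. All function names
here are hypothetical and not part of any TYPO3 or prototype API, and
the conversion is lossy for characters that ISO-8859-1 cannot represent:

```javascript
// Hypothetical helper: true if the request says UTF-8 but the backend does not.
function needsUtf8Decode(contentTypeHeader, backendCharset) {
  const match = /charset=([^;\s]+)/i.exec(contentTypeHeader || "");
  const requestCharset = match ? match[1].replace(/"/g, "").toLowerCase() : "";
  return requestCharset === "utf-8" && backendCharset.toLowerCase() !== "utf-8";
}

// Hypothetical helper: re-encode UTF-8 bytes as ISO-8859-1 bytes.
// Characters outside ISO-8859-1 cannot be represented and come out wrong.
function utf8ToLatin1(bytes) {
  return Buffer.from(bytes.toString("utf8"), "latin1");
}

// Hypothetical helper: convert all POST values only when the check says so.
function normalizePostValues(values, contentTypeHeader, backendCharset) {
  if (!needsUtf8Decode(contentTypeHeader, backendCharset)) {
    return values;
  }
  const converted = {};
  for (const [key, bytes] of Object.entries(values)) {
    converted[key] = utf8ToLatin1(bytes);
  }
  return converted;
}

// Made-up request: "ä" arrives as the UTF-8 bytes C3 A4.
const post = { code: Buffer.from("\u00e4", "utf8") };
const fixed = normalizePostValues(
  post,
  "application/x-www-form-urlencoded; charset=UTF-8",
  "iso-8859-1"
);
console.log(fixed.code.toString("hex")); // e4
```

Whether the Content-Type header can be relied on for every Ajax request
is an open assumption in this sketch.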

regards
  tobias

On Thursday, 01.05.2008, at 11:22 +0300, Dmitry Dulepov [typo3] wrote:
> Hi!
> 
> Martin Kutschker wrote:
> > Tobias Liebig wrote:
> >>
> >> Problem:
> >> this issue only happens when not using UTF-8 in BE
> >> it's somehow related to 0006827 <http://bugs.typo3.org/view.php?id=6827>
> >> for this i substituted "encodeURIComponent" with "escape" when charset
> >> is not UTF-8.
> > 
> > This is bogus! While encodeURIComponent() operates on utf-8, escape()
> > uses iso-8859-1. Both functions convert the local character set of the
> > page to their "native" charsets. Their decode counterparts do likewise in
> > the other direction.
> 
> I searched the net and I agree with Masi. This is what I found (best description) at [1]:
> ---------------------
> You’ll often see escape() used to prepare the string for use in a URL; it escapes characters like the ampersand that would otherwise result in a malformed URL. However, escape() doesn’t handle characters outside the ASCII range correctly, so the receiving script won’t be able to interpret them. You simply can’t use escape() on Unicode text.
> 
> Luckily, all recent browsers support two new JavaScript functions, encodeURIComponent() and encodeURI(). These functions are safe for UTF-8 text, encoding them with the proper escape sequence, as well as everything escape() did to make sure the text is usable in a URL. The encodeURI() function encodes entire URIs — so it leaves characters such as :?& intact. encodeURIComponent() encodes strings to be individual parameters of a URI, so it encodes all characters except ~!*()’.
> 
> In short, if you’re using escape(), use encodeURIComponent() instead.
> ---------------------
> 
> So my question to Tobias is: what happens if we do not replace encodeURIComponent with escape (or escapeplus) at all? It seems to me that this replacement is not needed at all.
> 
> [1] http://www.dangrossman.info/2007/05/25/handling-utf-8-in-javascript-php-and-non-utf8-databases/


