[TYPO3-core] RFC: Bug #13972: cropHTML uses faulty reg exp for HTML entities

Fri Apr 16 16:38:27 CEST 2010

Hi Jigal.

On 15.04.10 23:11, Jochen Rau wrote:
>> I couldn't find in the thread whether it would be much slower to just
>> traverse the html_entity_decoded string alongside the original version
>> and skip past the next ';' whenever the characters at the pointer
>> position differ?
>> The code already relies on that function to determine the length. This
>> way you'll end up with a cropped version of the original string and all
>> entities supported by this function are recognized.
>>
>> normal chars &agrave;vry &euml; end.
>> .............x ....x .....
>> normal chars à>>>>>>>vry ë>>>>> end.
>
> That might work. But it costs you (or me) an additional 5 hours of
> implementation, discussion, testing and fixing regressions to squeeze the
> last 0.01% of "customer satisfaction" (not "developer satisfaction") out
> of this function. Do you want to allocate that time?
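For reference, here is a minimal sketch of the traversal Jigal proposes. It is written in Python rather than PHP, with html.unescape standing in for html_entity_decode. The function name crop_html_text is hypothetical and not part of TYPO3. It handles plain text only (no tags, unlike cropHTML), and it assumes every entity ends with ';' and decodes to a single character. One extra check turned out to be necessary: '&amp;' decodes to '&', which equals the raw '&', so a plain character mismatch would not detect that entity. At every '&' the sketch therefore decodes the candidate entity on its own.

```python
import html


def crop_html_text(original: str, max_chars: int) -> str:
    """Hypothetical helper: return the shortest prefix of `original` that
    renders as at most `max_chars` characters, without splitting an entity.

    Assumes plain text (no tags), ';'-terminated entities, and entities that
    decode to a single character.
    """
    decoded = html.unescape(original)  # stand-in for PHP's html_entity_decode
    i = 0  # pointer into the original (encoded) string
    j = 0  # pointer into the decoded string = visible characters consumed
    while i < len(original) and j < len(decoded) and j < max_chars:
        if original[i] == "&":
            end = original.find(";", i)
            # '&amp;' decodes to '&', which equals the raw '&', so a plain
            # mismatch test would miss it: decode the candidate entity alone.
            if end != -1 and html.unescape(original[i:end + 1]) == decoded[j]:
                i = end + 1  # skip the whole entity as one visible char
                j += 1
                continue
        if original[i] == decoded[j]:
            i += 1  # ordinary character, identical in both strings
        else:
            # Characters differ: the original sits on an entity, so skip
            # past the next ';' (or to the end if there is none).
            end = original.find(";", i)
            i = len(original) if end == -1 else end + 1
        j += 1
    return original[:i]
```

With the example string from the quote, crop_html_text("normal chars &agrave;vry &euml; end.", 14) returns "normal chars &agrave;", so the entity is kept whole and counted as one character.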

A suggestion on how to proceed here: The main part of the reported issue in 
cropHTML is resolved by the patch Ralf provided ("the current preg_match 
currently always crops after the first semicolon"). The second part 
isn't ("... and won't recognize entities reliably"). I'd like to move the 
second part into a separate issue and vote only on the first. In that case 
I'd vote +1 by reading and testing.

If we don't split this into separate chunks, another improvement to 
TYPO3 will get lost in the endless stream of the mailing list ...

What's your opinion?

Jochen