[TYPO3-core] RFC: Bug #13972: cropHTML uses faulty reg exp for HTML entities

Jochen Rau jochen.rau at typoplanet.de
Thu Apr 15 23:11:01 CEST 2010

Hi Jigal.

On 15.04.10 22:23, Jigal van Hemert wrote:
> Jochen Rau wrote:
>> On 15.04.10 20:30, Jigal van Hemert wrote:
> >>> - valid entities can be longer than 7 characters (e.g. &thetasym;) [1]
> >> When I implemented the first version of cropHTML, I read the spec linked
> >> above, too. I made a list of entities but must have overlooked the only
> >> entity name in the list having a length > 7 ;-)
> Seems I had a bright moment spotting that one in the list :-D

I'm relieved to hear that. I thought you were nitpicking. ;-)
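
To illustrate the length point above: a minimal sketch in Python (used here only as a runnable stand-in, not the PHP that cropHTML is written in). The CAPPED pattern is a hypothetical example of a 7-character cap, not the actual cropHTML pattern; LOOSE and is_known_entity() are likewise names invented for this illustration. The lookup uses Python's html.entities.name2codepoint table of HTML 4 entity names.

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern that caps entity names at 7 characters
# (an assumption for illustration, not the real cropHTML regex).
CAPPED = re.compile(r"&[A-Za-z0-9]{2,7};")

# Looser pattern: a decimal or hex numeric reference, or a name of any length.
LOOSE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def is_known_entity(token: str) -> bool:
    """Return True if token is a numeric reference or a named HTML 4 entity."""
    match = LOOSE.fullmatch(token)
    if match is None:
        return False
    body = match.group(1)
    # Numeric references are accepted as-is; names must be in the HTML 4 table.
    return body.startswith("#") or body in name2codepoint
```

With these definitions the capped pattern rejects &thetasym; (8-character name) while the table lookup accepts it, and something like &foo; matches the loose pattern but is rejected by the lookup.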

>>> - not everything between & and ; is a valid entity
> >> That's true. But the alternative is to build a list of entity names
> >> (at least the ones specified in [1]) and run preg_match against only
> >> those. But someone can add new entities, which is allowed in every
> >> SGML-compliant language. What's next?
> (...)
>> BTW If you are interested in the original thread, take some coffee and
>> crawl for "[TYPO3-core] RFC #7984: Bug: stdWrap.crop now closes opened
>> tags and counts chars correctly" (starting at 2008-04-03). It's funny
> >> to read but it will take its time ;-)
> I poured myself a new diet Coke and briefly scanned the thread.
> First there was resistance to using html_entity_decode() and later it
> was used to determine the length of a string with entities.
> I couldn't find in the thread if it was so much slower to just traverse
> the html_entity_decoded string along with the original version and skip
> past the next ';' if the characters in both are different on the
> position of the pointer?
> The code already relies on that function to determine the length. This
> way you'll end up with a cropped version of the original string and all
> entities supported by this function are recognized.
> normal chars àvry ë end.
> .............x ....x .....
> normal chars &agrave;>>>>>>>vry &euml;>>>>> end.
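
The traversal described above can be sketched roughly as follows. This is a minimal illustration in Python, not the cropHTML implementation: html.unescape() stands in for PHP's html_entity_decode() (the two do not recognize exactly the same set of entities), and crop_keeping_entities() is a hypothetical name. It assumes every entity decodes to a single character and ends with ';'.

```python
import html


def crop_keeping_entities(source: str, max_chars: int) -> str:
    """Crop source to max_chars decoded characters without splitting an entity."""
    decoded = html.unescape(source)  # stand-in for html_entity_decode()
    src_pos = 0  # pointer into the original string
    dec_pos = 0  # pointer into the decoded string
    while dec_pos < max_chars and dec_pos < len(decoded) and src_pos < len(source):
        if source[src_pos] == decoded[dec_pos]:
            # Same character in both strings: plain text, advance by one.
            src_pos += 1
        else:
            # Characters differ: the original holds an entity here,
            # so skip past the next ';' in the original.
            end = source.find(";", src_pos)
            src_pos = len(source) if end == -1 else end + 1
        dec_pos += 1
    return source[:src_pos]


# Cropping after 14 decoded characters keeps the whole entity.
print(crop_keeping_entities("normal chars &agrave;vry &euml; end.", 14))
# prints "normal chars &agrave;"
```

One limitation of the plain comparison: &amp; decodes to "&", so the first characters match and the entity is not detected. The sketch does not handle that case.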

That might work. But it costs you (or me) an additional 5 hrs of
implementation, discussion, testing, and fixing regressions to squeeze the
last 0.01% of "customer satisfaction" (not "developer satisfaction") out
of this function. Do you want to allocate that?

