[TYPO3-core] RFC: Bug #13972: cropHTML uses faulty reg exp for HTML entities
Jigal van Hemert
jigal at xs4all.nl
Thu Apr 15 22:23:39 CEST 2010
Jochen Rau wrote:
> On 15.04.10 20:30, Jigal van Hemert wrote:
>> - valid entities can be longer than 7 characters (e.g. &thetasym;) [1]
> As I implemented the first version of cropHTML, I read the spec linked
> above, too. I made a list of entities but must have overlooked the only
> entity name in the list having a length > 7 ;-)
Seems I had a bright moment spotting that one in the list :-D
>> - not everything between & and ; is a valid entity
> That's true. But the alternative is to build a list of entity names (at
> least the ones specified in [1]) and make a preg_match only with these
> ones. But anyone can add new entities, which is allowed in every
> SGML-compliant language. What's next?
(...)
> BTW If you are interested in the original thread, take some coffee and
> crawl for "[TYPO3-core] RFC #7984: Bug: stdWrap.crop now closes opened
> tags and counts chars correctly" (starting at 2008-04-03). It's funny to
> read but it will take its time ;-)
I poured myself a new diet Coke and briefly scanned the thread.
At first there was resistance to using html_entity_decode(); later it
was used after all to determine the length of a string with entities.
I couldn't find in the thread whether it would really be much slower to
traverse the html_entity_decode()d string alongside the original version
and, whenever the characters at the pointer position differ, skip past
the next ';' in the original.
The code already relies on that function to determine the length. This
way you end up with a cropped version of the original string, and all
entities supported by this function are recognized.
normal chars àvry ë end.
.............x       ....x     .....
normal chars à>>>>>>>vry ë>>>>> end.
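A minimal sketch of this idea, written in Python rather than PHP so it
can be run standalone. html.unescape() stands in for
html_entity_decode(), and crop_keeping_entities() is a hypothetical
name, not part of cropHTML. It assumes every entity ends in ';' and
decodes to exactly one character. One case the plain comparison would
miss is an encoded ampersand such as &amp;, which decodes to '&', so
both pointers see the same character; the sketch checks for that
separately.

```python
from html import unescape


def crop_keeping_entities(original: str, max_chars: int) -> str:
    """Crop `original` to `max_chars` visible characters, keeping entities whole.

    Hypothetical helper: walks the decoded string and the original in
    parallel and jumps past the next ';' wherever an entity was decoded.
    """
    decoded = unescape(original)  # stand-in for PHP's html_entity_decode()
    limit = min(max_chars, len(decoded))
    i = 0  # pointer into the original string
    j = 0  # pointer into the decoded string
    while i < len(original) and j < limit:
        step = 1  # default: a literal character, identical in both strings
        differs = original[i] != decoded[j]
        if differs or original[i] == "&":
            end = original.find(";", i)
            if end != -1:
                # Either the characters differ (an entity was decoded here),
                # or an encoded ampersand decoded to a literal '&'.
                if differs or unescape(original[i:end + 1]) == "&":
                    step = end + 1 - i
        i += step
        j += 1
    return original[:i]
```

With the example string above,
crop_keeping_entities("normal chars &agrave;vry &euml; end.", 14)
returns "normal chars &agrave;", i.e. 14 visible characters with the
entity left intact.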
--
Jigal van Hemert
skype:jigal.van.hemert
msn: jigal at xs4all.nl
http://twitter.com/jigalvh