[TYPO3-core] RFC: T3-speedup preg insted of ereg

Michael Stucki michael at typo3.org
Sat Nov 5 00:52:05 CET 2005


Hi Bernhard,

I just had a look at your patch. I did not test it and trust that you
accurately tested this.

> Description:
> preg's are faster than ereg's (Don't ask why :) test it).
> strpos is faster than strcspn in cases where you only look for one
> character. I modified t3lib_parsehtml.php in such a way that each
> ereg,split(i) and similar slower functions get replaced by an equally
> working preg statement. The regex were transformed almost without changes.
> Only to the get/substitute subpart methods bigger changes were made as the
> actual eregs didn't work properly for all cases (at leas in my opinion).

Short conclusion of this (and the rest of the thread that has been discussed
so far):

- We all agree on using preg_ instead of ereg_ functions in the future
  because they are much(!) faster.

- We don't do that in one step, instead we just change it whenever we find
  such a part.


> I wrote a test-suite for the new library to ensure that all methods behave
> exactly the same. It tests each method of the old and new class with the
> same input and option parameters and compares the outputs. Normally
> (except for the already mentioned "broken" get/substitute Subpart methods)
> those outputs must match completly.

Obviously your patches changes much more than only the ereg_ => preg_ calls.
To me it looks that you changed the logic of the functions by keeping the
same output. Afaik this is what Robert explained to me as the mystery of
"refactoring", and I think it's a very nice approach.


Here are my comments:

- I don't know if this has to do with refactoring, too, but I would like
very much that every function has only one return statement. This makes the
function much easier to debug!

- Kasper checks for empty strings using strcmp, you make a "===" check
instead. Is there any difference between them? Which of those methods is
faster?

- Sometimes, you write complicated expressions like this one:

if (preg_match('/^([^\<\>]*\-\-\>)?(.*?)(\<\!\-\-[^\<\>]*)?$/s', $content,
$matches)===1)  {

=> I think it would be very helpful if you add a short comment on top of
such hard-to-read lines...


- Somewhere you ask: "What shall get returned if no stop marker is given"
=> I think it should always return the same value like now!


Generally, I think the patch looks very good, even though I did not test it.
>From my side: Go ahead!

> At the end the time consumption get's printed out. Normally the new
> version just take between 8(!!!) and 93% of the time the orignial methods
> did take.

This is great to hear. I like performance boosts, if they are so easy! :-)

> Branches:
> HEAD, 3.7(?), 3.8(?)

I guess this change is much too big for maintenance releases. So it is for
HEAD only, right?

Regards, and thumbs up!
- michael
-- 
Use a newsreader! Check out
http://typo3.org/community/mailing-lists/use-a-news-reader/



More information about the TYPO3-team-core mailing list