[TYPO3-core] RFC #12232: Bug: md5_file() to check if a file has been changed is very expensive [performance]

Ernesto Baschny [cron IT] ernst at cron-it.de
Mon Jan 17 23:44:14 CET 2011


Vladimir Podkovanov wrote on 17.01.2011 22:17:
> Hi Ernesto!
> 
> On 17.01.2011 17:24, Ernesto Baschny [cron IT] wrote:
>> Please share this "small script"; it should be easy to create an Upgrade
>> Wizard out of it (it's just one tiny function required, right?).
>>
>> We should include that at the same time as adding the md5 change. Else
>> we'll never do it. Since RC1 is already scheduled for Wednesday, it
>> should be done in the next few days. Do you have the time to finish that
>> up?
> 
> I've attached the script I used; it is written as a scheduler task. I think
> some interface with a progress bar is needed to integrate it into the
> Install Tool, because the script can take some time: 1 hour in my case,
> with about a million rows.
> 
> Moreover, now I'm not sure that converting is really needed :)
> I have to check it again, but it seems that images are not regenerated
> if the hash changes.
> I thought before that a changed hash means a changed file and triggers an
> image rebuild, with the obvious performance issue. But it looks like a
> rebuild depends only on a change of the image sizes (width, height)!
> 
> So an improper hash leads only to deleting the row with the old hash and
> checking the image sizes again (and writing the new hash), and because the
> sizes are the same, a rebuild is not triggered! (The sizes are used in the
> setup array, which is used to generate the md5 hash filename for the temp
> image; if the array is the same, we get the same temp filename and so no
> rebuild.)
> 
> I think checking image sizes should not take much time, and we can
> suggest in the Upgrade Wizard to just empty the cache_imagesizes table;
> in fact this function already exists in the Install Tool under Clean up :).
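
If I follow the temp filename argument correctly, it boils down to
something like the sketch below. The function name, the directory and the
array keys are made up for illustration and are not the actual core code;
the only point is that the name is derived from the setup array alone.

```php
<?php
// Hypothetical helper, NOT the actual TYPO3 core code: the temp image
// name depends only on the processing setup (source file and sizes).
function tempImageName(array $setup): string
{
    // Directory and extension are illustrative assumptions.
    return 'temp/' . md5(serialize($setup)) . '.jpg';
}

$setup = ['file' => 'fileadmin/photo.jpg', 'w' => 200, 'h' => 150];

// The cache row was dropped and the sizes were read again, but they are
// unchanged: same setup, same name, so the existing temp file is reused.
$first  = tempImageName($setup);
$second = tempImageName(['file' => 'fileadmin/photo.jpg', 'w' => 200, 'h' => 150]);
var_dump($first === $second);

// Only a changed size yields a different name and therefore a rebuild.
$resized = tempImageName(['file' => 'fileadmin/photo.jpg', 'w' => 400, 'h' => 300]);
var_dump($first === $resized);
```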

It's true that image size caching (w / h) has nothing to do with the
decision to "regenerate" certain parts of it.

So your conclusion (to try to sum up) is that we don't need an
upgrade wizard because cleaning the cache_imagesizes table will happen
during normal operation anyway, is that right? Or could the upgrade
wizard simply TRUNCATE the mentioned cache table?

Now my fear is that this md5 hashing might indeed break some things
because it relies solely on filemtime and filesize. So if we have
1,000,000 files, the probability that we have different files with the
same mtime *and* the same size is pretty high, considering that we
might have lots of files with the same tstamp (because someone
batch-uploaded tons of files at the "same time", or because all files
got the same tstamp when they were restored from some backup of
"whatever"). You end up with crazy behaviour: wrong resizes being
generated.

Maybe adding the "file-path" to the hashing would help?
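
To make the concern concrete, here is a rough sketch. The two function
names are mine and only stand in for the proposed cheap hash, they are not
taken from the patch. Two different files with the same size and the same
mtime get the same hash, unless the path goes into the hash as well.

```php
<?php
// Hypothetical stand-ins for the proposed cheap hash; names are illustrative.
function cheapHash(string $file): string
{
    return md5(filemtime($file) . '|' . filesize($file));
}

function cheapHashWithPath(string $file): string
{
    return md5($file . '|' . filemtime($file) . '|' . filesize($file));
}

// Two different files with equal size and equal mtime, as could happen
// after a batch upload or a restore from backup.
$a = tempnam(sys_get_temp_dir(), 'img');
$b = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($a, 'AAAA');
file_put_contents($b, 'BBBB');
touch($a, 1295300000);
touch($b, 1295300000);
clearstatcache(); // drop PHP's cached stat data before reading mtime/size

$collides       = cheapHash($a) === cheapHash($b);                 // mtime + size only
$distinct       = cheapHashWithPath($a) !== cheapHashWithPath($b); // path included
$contentDiffers = md5_file($a) !== md5_file($b);                   // the expensive check

var_dump($collides, $distinct, $contentDiffers);

unlink($a);
unlink($b);
```

All three values should come out true here. The price of including the
path is that a moved or renamed file gets a new hash even though its
content is unchanged, which should only cost one extra size lookup.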

Cheers,
Ernesto


