[TYPO3-core] RFC #12232: Bug: md5_file() to check if a file has been changed is very expensive [performance]

Vladimir Podkovanov admin at sitesfactory.ru
Tue Jan 18 00:15:17 CET 2011


On 18.01.2011 1:44, Ernesto Baschny [cron IT] wrote:
> It's true that image size caching (w / h) has nothing to do with the
> decision to "regenerate" certain parts of it.
>
> So your conclusion (to try to sum up is) is that we don't need an
> upgrade wizard because cleaning the cache_imagesizes table will happen
> during normal operation anyway, is that right? Or the upgrade wizard
> could simply TRUNCATE the mentioned cache table?

We don't need an Upgrade Wizard, because cache_imagesizes will rebuild 
itself, and this rebuild doesn't affect performance as much as I thought 
before.
Cleaning cache_imagesizes could be suggested just to speed up the rebuild 
(no time is spent checking old hashes; they are simply created from 
scratch), but it is not a prerequisite, so we can skip the cleaning.

>
> Now my fear is that this md5 hashing might indeed break some things
> because it relies solely on filemtime and filesize. So if we have
> 1.000.000 files, the probability that we have different files with the
> same mtime *and* the same size is pretty big if we consider that we
> might have lots of files with the same tstamp (because someone might
> have batch-uploaded tons of files at the "same time" or because all
> files have the same tstamp because they were restored from some backup
> of "whatever"): You end up with crazy behaviour of wrong resizes being
> generated.
>
> Maybe adding the "file-path" to the hashing would help?
>

It is not a problem, as md5hash is used only to check whether a file has 
changed and not as an index. The table is indexed by md5filename, which 
is a hash of the file path.
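
To illustrate the point, here is a minimal sketch in Python (not the actual 
TYPO3 code). It assumes rows are looked up by a hash of the file path and 
that the mtime/size hash is only compared afterwards to detect a change. 
The class and function names, and the exact string that gets hashed, are 
hypothetical; only md5hash and md5filename come from this thread.

```python
import hashlib


def path_key(path: str) -> str:
    """Lookup key: hash of the file path (plays the role of md5filename)."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def change_fingerprint(mtime: int, size: int) -> str:
    """Change check: hash of mtime and size (plays the role of md5hash).

    The "mtime:size" input format is an assumption made for this sketch.
    """
    return hashlib.md5(f"{mtime}:{size}".encode("utf-8")).hexdigest()


class ImageSizeCache:
    """Hypothetical in-memory stand-in for the cache_imagesizes table."""

    def __init__(self):
        # path_key -> (change_fingerprint, width, height)
        self._rows = {}

    def put(self, path, mtime, size, width, height):
        self._rows[path_key(path)] = (change_fingerprint(mtime, size), width, height)

    def get(self, path, mtime, size):
        row = self._rows.get(path_key(path))
        # A miss, or a fingerprint mismatch, means the entry must be rebuilt.
        if row is None or row[0] != change_fingerprint(mtime, size):
            return None
        return row[1], row[2]


# Two different files with identical mtime and size do not interfere,
# because each row is found through its own path hash.
cache = ImageSizeCache()
cache.put("fileadmin/a.jpg", 1295305000, 20480, 800, 600)
cache.put("fileadmin/b.jpg", 1295305000, 20480, 1024, 768)

result_a = cache.get("fileadmin/a.jpg", 1295305000, 20480)
result_b = cache.get("fileadmin/b.jpg", 1295305000, 20480)
result_changed = cache.get("fileadmin/a.jpg", 1295305999, 20480)
```

In this sketch the two files share the same change fingerprint, yet each 
lookup returns its own dimensions, and a changed mtime simply produces a 
cache miss for that one path.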

If you are worried about deleting rows (this is currently done by md5hash 
and not by md5filename), then that is another bug (RFC #16685); I have 
already sent a patch for it.

-- 
-rgds-
Vladimir

