[Typo3-dev] Thoughts on the TYPO3 image function and short MD5 hash

Daniel Pötzinger operation-lan at gmx.de
Sun Jul 4 14:25:31 CEST 2004


Kasper Skårhøj wrote:

> I didn't follow you all the way, but never mind. I think your point is
> right. For instance, let's say we have a hash value between 1 and 1000.
> If we make two hashes from different sources, there is a 1/1000 chance
> that the two would be the same. But if we make 100 hashes,
> there is a much higher probability that we have collisions.
Yes, exactly. My English could be better, but I will try to explain:

What I mean is this part of the image conversion process:
$theOutputName = 
t3lib_div::shortMD5($command.$imagefile.filemtime($imagefile).$frame);

This builds a hash of a string; what the string contains does not matter here.
My concern is the nature of this hash. Every string maps to a hash value, but
two different strings can also map to the same hash value, right?
Practice shows that this really happens: I have seen TYPO3 sites with
many images where the wrong images were shown. Two input strings produced
the same hash, so one image was displayed in place of the other.
I hope that explanation is understandable.
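
To make the effect concrete, here is a minimal sketch in Python (not TYPO3 code). It assumes that shortMD5 simply keeps the first characters of the hexadecimal MD5 digest, 10 by default; that is an assumption, not something stated in this thread. The function names and the made-up input strings are hypothetical. With a deliberately tiny hash length a collision shows up almost immediately:

```python
import hashlib


def short_md5(text: str, length: int = 10) -> str:
    """Return the first `length` hex characters of the MD5 digest of `text`.

    Assumption: this mirrors what t3lib_div::shortMD5 does.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]


def find_collision(length: int, limit: int = 1_000_000):
    """Hash made-up input strings until two different ones share a short hash.

    Returns (first_input, second_input, shared_hash), or None if no collision
    was found within `limit` inputs.
    """
    seen = {}  # short hash -> input string that produced it
    for i in range(limit):
        key = f"convert-command/image{i}.jpg"  # hypothetical input string
        h = short_md5(key, length)
        if h in seen:
            return seen[h], key, h
        seen[h] = key
    return None


if __name__ == "__main__":
    # 3 hex characters give only 16**3 = 4096 slots, so a collision is
    # guaranteed after at most 4097 different inputs.
    print(find_collision(3))
```

With the full short hash length the same thing happens, only much later; the two inputs would then end up with the same $theOutputName.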

I also tried to work out the mathematics behind this:
- the hash table has about 3 quadrillion different slots
- the probability of a collision is greater than 50% once we hash 52
million images (that is, once that many slots are taken); this follows
from multiplying conditional probabilities
- in practice the probability is higher still, because the hash function
is not ideal
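
The estimate can be checked with the usual birthday-problem approximation, P ≈ 1 - exp(-n(n-1)/(2N)) for n hashes in N slots. The sketch below is Python for illustration only; the function names are hypothetical, and it assumes an ideal (uniform) hash, so real figures can only be worse:

```python
import math


def collision_probability(n_items: int, n_slots: int) -> float:
    """Approximate P(at least one collision) for an ideal hash.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)).
    """
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * n_slots))


def items_for_probability(p: float, n_slots: int) -> float:
    """Approximate number of items at which the collision probability reaches p."""
    return math.sqrt(2.0 * n_slots * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    # Kasper's example: 100 hashes in 1000 slots.
    print(collision_probability(100, 1000))
    # The slot count quoted above (about 3 quadrillion).
    print(items_for_probability(0.5, 3 * 10**15))
    # Assumption: a short hash of 10 hex characters, i.e. 16**10 slots.
    print(items_for_probability(0.5, 16**10))
    print(collision_probability(50_000, 16**10))
```

For about 3 quadrillion slots the 50% point lands in the tens of millions of images, the same order of magnitude as the 52 million above. If the short hash is really 10 hexadecimal characters (an assumption), there are only 16^10, about 1.1 trillion, slots; the 50% point then drops to roughly 1.2 million images, and 50000 hashes give a collision probability of roughly 0.1%.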

The solution may be to detect collisions, but that needs a new table.
Perhaps it could be done with the cache_imagesizes table
(for example, a hash function with an overflow table)?
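
A rough sketch of what such collision detection could look like, again in Python and only as an illustration. The class and method names are hypothetical, the in-memory dictionary stands in for a database table, and whether cache_imagesizes is suitable for this is left open. The idea is to remember which input string owns each output name and to fall back to a suffixed "overflow" name when a different input already owns it:

```python
import hashlib


class OutputNameRegistry:
    """Hands out short-hash output names and resolves collisions.

    Hypothetical sketch: the dict stands in for a lookup table in the database.
    """

    def __init__(self, length: int = 10):
        self.length = length
        self._owner = {}  # output name -> input string that owns it

    def name_for(self, source_key: str) -> str:
        """Return a name unique to `source_key`, stable across repeated calls."""
        digest = hashlib.md5(source_key.encode("utf-8")).hexdigest()
        base = digest[: self.length]
        name, attempt = base, 0
        # Probe until the name is free or already owned by this same input.
        while self._owner.get(name, source_key) != source_key:
            attempt += 1
            name = f"{base}_{attempt}"  # overflow entry for a collided hash
        self._owner[name] = source_key
        return name


if __name__ == "__main__":
    # length=1 leaves only 16 slots, so collisions are forced quickly.
    registry = OutputNameRegistry(length=1)
    for i in range(20):
        print(registry.name_for(f"convert-command/image{i}.jpg"))
```

The price is one extra lookup per image, but two different images can no longer silently share one output file.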

Greetings


> 
> So of course the probability must be balanced with the number of hashes
> you plan to do since the interesting thing is what the probability is of
> two hashes out of 50000 matching.

> 
> - kasper
> 



