[Typo3-dev] thoughts on TYPO3 image function and short MD5 hash
Daniel Pötzinger
operation-lan at gmx.de
Sun Jul 4 14:25:31 CEST 2004
Kasper Skårhøj wrote:
> I didn't follow you all the way, but never mind. I think your point is
> right. For instance, let's say we have a hash string between 1 and 1000.
> If we make two strings from various sources there is a 1/1000 chance
> that those two strings would be the same. But if you make 100 strings
> there is a much higher probability that we have collisions.
Let me put it this way. My English could be better, but I will try to explain.
What I mean is this part of the image conversion process:
$theOutputName =
t3lib_div::shortMD5($command.$imagefile.filemtime($imagefile).$frame);
This builds a hash of a string; what the string contains does not matter
here. My concern is the nature of this hash. Each string maps to a hash
string, but it is also possible that two different strings map to the
same hash string, right?
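
To make the question concrete, here is a minimal sketch. It assumes that t3lib_div::shortMD5() simply keeps the first characters of the MD5 digest; short_md5_sketch() and find_collision_sketch() are hypothetical stand-ins, not TYPO3 functions. With the length cut down to 2 hexadecimal characters there are only 256 possible values, so two different inputs must share a hash within the first 257 attempts.

```php
<?php
// Hypothetical stand-in for t3lib_div::shortMD5(): keep only the first
// $len hex characters of the MD5 digest (assumed behaviour).
function short_md5_sketch($input, $len = 10) {
    return substr(md5($input), 0, $len);
}

// Search for two different inputs that share the same truncated hash.
// With $len = 2 there are only 16^2 = 256 possible values, so a
// collision must appear within the first 257 inputs.
function find_collision_sketch($len, $maxTries = 100000) {
    $seen = array();
    for ($i = 0; $i < $maxTries; $i++) {
        $input = 'image' . $i . '.jpg';
        $hash = short_md5_sketch($input, $len);
        if (isset($seen[$hash])) {
            // Two different inputs, one hash string.
            return array($seen[$hash], $input, $hash);
        }
        $seen[$hash] = $input;
    }
    return null;
}

$collision = find_collision_sketch(2);
if ($collision !== null) {
    list($first, $second, $hash) = $collision;
    echo "$first and $second both map to $hash\n";
}
```

The same effect exists at the full length; it just needs many more inputs before it shows up.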
Reality shows that this is not impossible. (I have seen TYPO3 sites with
many images where the wrong images are shown. This happens because two
strings have the same hash string, so another image is displayed instead
of the correct one. I hope this explanation is understandable.)
I also tried to explain the mathematics behind this:
- the "hash table" has about 3 quadrillion different places
- the probability of a collision is greater than 50% once we hash 52
million images (that is, once that many hash places are taken). (conditional probability)
- in practice the probability is higher, because the "hash function" is not ideal.
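
The estimate can be reproduced with the usual birthday approximation, P(collision) ≈ 1 - exp(-n(n-1)/(2N)) for n hashes in a space of N values. The sketch below is only an illustration: the function names are hypothetical, and the hash-space size of 16^10 assumes that the short hash is 10 hexadecimal characters long. The result depends strongly on that assumption. With 16^10 values the 50% point comes out near 1.2 million hashes, while a larger space such as the roughly 3 quadrillion places mentioned above pushes it to tens of millions.

```php
<?php
// Approximate probability that at least two of $n uniformly distributed
// hashes collide in a space of $slots values (birthday approximation).
function collision_probability($n, $slots) {
    return 1.0 - exp(-($n * ($n - 1)) / (2.0 * $slots));
}

// Approximate number of hashes at which the collision probability
// reaches 50%: n = sqrt(2 * N * ln 2).
function fifty_percent_threshold($slots) {
    return sqrt(2.0 * $slots * log(2.0));
}

$slots = pow(16, 10); // assumption: 10 hexadecimal characters
printf("slots: %.0f\n", $slots);
printf("50%% threshold: %.0f hashes\n", fifty_percent_threshold($slots));
printf("P(collision) for 50000 hashes: %.6f\n",
    collision_probability(50000, $slots));
```

Both functions assume an ideal, uniformly distributed hash, so they give a lower bound on the real collision rate.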
One solution may be to detect collisions, but that requires a new table.
Perhaps it is possible to solve it with the cache_imagesizes
table? (For example, a hash function with an overflow table.)
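
A rough sketch of the overflow idea, under stated assumptions: the class HashRegistrySketch and its in-memory array are hypothetical stand-ins for a database table that stores the full source string next to each short hash. Nothing here is existing TYPO3 code, and the real cache_imagesizes table may not be suited for this.

```php
<?php
// Hypothetical in-memory stand-in for a database table that maps each
// short hash to the full source string it was generated from.
class HashRegistrySketch {
    private $table = array();

    // Returns a key that is unique to $source. If the short hash is
    // already taken by a different source string, a counter is
    // appended until a free (or matching) entry is found.
    public function keyFor($source, $len = 10) {
        $base = substr(md5($source), 0, $len);
        $key = $base;
        $attempt = 0;
        while (isset($this->table[$key]) && $this->table[$key] !== $source) {
            $attempt++;
            $key = $base . '_' . $attempt;
        }
        $this->table[$key] = $source;
        return $key;
    }
}

// Usage: with only one hex character there are 16 possible hashes,
// so 50 sources force many collisions, yet every key stays unique.
$registry = new HashRegistrySketch();
for ($i = 0; $i < 50; $i++) {
    echo $registry->keyFor('img' . $i, 1), "\n";
}
```

The price is one lookup per generated image name, and the table has to be kept as long as the generated files exist.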
Greetings
>
> So of course the probability must be balanced with the number of hashes
> you plan to do since the interesting thing is what the probability is of
> two hashes out of 50000 matching.
>
> - kasper
>