[TYPO3-dev] [TYPO3-core] RFC: #10201: Duplicate cHash Values
Dan Osipov
dosipov at phillyburbs.com
Wed Apr 29 18:50:29 CEST 2009
Since TYPO3 claims to be an enterprise-level CMS, something like this
really puts a dent in its reputation. Yes, small sites will be OK with a
short cHash, but any significantly large site will encounter problems.
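
To put a rough number on "significantly large", here is a minimal sketch of the standard birthday-bound approximation. It assumes cHash values behave like uniformly random hex strings and that the short cHash is the first 10 hex characters of the md5, as described in the quoted mail below. The function name and the sample sizes are only illustrative and not taken from the TYPO3 source.

```python
import math

def collision_probability(num_items: int, hex_chars: int) -> float:
    """Approximate chance that at least two of num_items hashes coincide.

    Assumes each hash is a uniformly random string of hex_chars hex digits
    (an idealisation of md5 output) and uses the birthday-bound formula
    p ~= 1 - exp(-n * (n - 1) / (2 * N)), where N is the number of values.
    """
    space = 16 ** hex_chars
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-num_items * (num_items - 1) / (2 * space))

if __name__ == "__main__":
    # Hypothetical numbers of distinct cached page variants.
    for n in (10_000, 100_000, 1_000_000):
        short = collision_probability(n, 10)  # 10 hex chars = 5 bytes
        full = collision_probability(n, 32)   # full md5 = 16 bytes
        print(f"{n:>9} entries: short={short:.6f} full={full:.3e}")
```

Under these assumptions the short hash only becomes risky once the number of distinct cached variants reaches the hundreds of thousands, while the full md5 stays negligible at any realistic size.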
Dan Osipov
Calkins Media
http://danosipov.com/blog/
Ries van Twisk wrote:
>
> On Apr 28, 2009, at 4:42 PM, Bernhard Kraft wrote:
>
>> Steffen Kamper wrote:
>>
>>> with realurl you don't see the cHash in url. md5 is unique, shortmd5
>>> not. So it's no real decision, it's a must imho.
>>
>> Hallo Steffen.
>>
>> I have to correct you. Mathematically, neither md5 nor the "short-md5" in
>> TYPO3 can be unique. If you have 16 bytes (32 hex chars), then there are
>> 256^16 == 16^32 possibilities for the md5 value (which is about 10^38).
>>
>> If you use only the first 10 characters (5 bytes) of the md5 sum, then
>> there are only 256^5 =~ 10^12 possibilities. As every byte multiplies
>> the number of possible values by 256, it's of course a drastic
>> difference.
>>
>> Let's think of all variations of a file of, let's say, 50 bytes. It's clear
>> that if you reduce the "amount of data" in this file to 16 bytes by
>> calculating the md5 sum (a kind of digit sum), some variations must result
>> in the same value.
>>
>> So although it is very improbable, it's still possible that even with the
>> full md5 value, the mentioned "md5 not unique" error occurs.
>>
>>
>> AFAIR the "duplicate cHash" SQL error message, when putting a page into
>> the cache, wasn't caused by two content pages resulting in the same cHash.
>> I do not remember exactly what the cause was (or whether I even did some
>> research when I stumbled upon the problem) ... of course it could be
>> different in your case with that many pages.
>>
>>
>
>
> Basically, what Bernhard is saying is that if you just add 8 characters to
> the short MD5,
> the probability of hitting the same value twice will be
> 256*256*256*256 = 4294967296 times lower; if you make it 4 chars longer,
> it's 65536 times lower.
>
> Maybe an Install Tool setting would be an option? You don't hear too
> often anyway
> that there are people with cHash clashes.
>
> Ries
>
>
>
>
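
The factors quoted above can be checked with a few lines. This is only a sketch: short_md5 and space_growth are hypothetical helper names, not TYPO3 functions, and the 10-character default simply mirrors the short hash length mentioned in the thread.

```python
import hashlib

def short_md5(data: bytes, hex_chars: int = 10) -> str:
    """Return the first hex_chars hex digits of the md5 of data."""
    return hashlib.md5(data).hexdigest()[:hex_chars]

def space_growth(extra_hex_chars: int) -> int:
    """Factor by which the number of possible values grows.

    Each extra hex character carries 4 bits, so it multiplies the
    number of possible hash values by 16.
    """
    return 16 ** extra_hex_chars

if __name__ == "__main__":
    print(space_growth(4))   # 4 more hex chars = 2 bytes
    print(space_growth(8))   # 8 more hex chars = 4 bytes
    print(16 ** 10 == 256 ** 5)  # 10 hex chars cover the same space as 5 bytes
    print(short_md5(b"id=42&L=1"))  # hypothetical parameter string
```

Growing the hash by 4 hex characters multiplies the number of possible values by 65536, and by 8 hex characters by 4294967296, which matches the figures in the mail.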