[TYPO3-v4] getUrl issues - linkvalidator

Ernesto Baschny [cron IT] ernst at cron-it.de
Tue Mar 15 09:27:18 CET 2011


Jigal van Hemert wrote on 15.03.2011 09:02:
> Hi,
> 
> On 15-3-2011 0:43, Philipp Gampe wrote:
>> During development of linkvalidator some issues popped up with getUrl.
> 
> There is a lot wrong with t3lib_div::getURL()
> 
> I once tried to add POST capabilities to it, but that RFC was voted down
> because getURL had become so complex and poorly structured through
> continuous extension over the years. It was close to feature freeze and
> I couldn't write a proper HTTP library (or adapt an existing one) in the
> time that was left.
> 
> Maybe this is a good opportunity to make one?
> 
>> Unfortunately we cannot use HEAD, because some servers like amazon.com
>> behave differently on HEAD compared to GET. This is a poor
>> implementation, but RFCs only say that HEAD _SHOULD_ behave as GET.
> 
> In RFCs there is a (sometimes subtle) difference between 'must',
> 'should' and 'may'. So if the RFC says that HEAD 'should' behave as GET,
> the amazon server is still compliant when it does not. It would be
> different if the RFC had said that it 'must' behave as GET.
> 
>> The question is now what to do in 4.5 branch and how to solve the big
>> file problem.
> 
> It could be a good solution to make a proper HTTP library (which can
> implement workarounds for poor HEAD behaviour), include this in 4.5 also
> and let getURL use this library in a backwards compatible way.

+1 on that!

But maybe we should just use cURL and make it a required PHP module
for TYPO3 4.6? In my opinion cURL has (almost) all we need (and more),
so there is no point in writing yet another solution in plain PHP. cURL
handles proxies, HTTPS, GET, POST, authentication, cookies, redirects,
content decoding, ... And it's shipped with all major distributions.

All we would need then is a t3lib wrapper for the more complex things
cURL offers that we currently cannot influence through
t3lib_div::getUrl (like disabling SSL verification, doing POST, ...).

The only drawback is that we cannot request just the headers with a
GET request, which is what the link validation needs (we've run into
the same trouble already). Unfortunately, a HEAD request is not enough
to check whether a URL really works. Maybe it could be done with cURL
somehow?
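
To make the "GET, but headers only" idea concrete, here is a minimal
sketch in Python (not PHP/cURL, and not a proposal for the actual
implementation). It sends a real GET, reads only the status line and
the headers, and then closes the connection without reading the body.
The function name fetch_headers_via_get is hypothetical, and the sketch
assumes a plain http:// or https:// URL with no redirects to follow.

```python
import http.client
from urllib.parse import urlsplit


def fetch_headers_via_get(url, timeout=10):
    """Send a GET, read only the status line and headers, then hang up.

    Hypothetical helper for illustration: redirects, proxies and
    authentication are deliberately left out.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        # getresponse() parses the status line and headers; the body is
        # only pulled from the socket when read() is called, which we skip.
        response = conn.getresponse()
        return response.status, dict(response.getheaders())
    finally:
        # Closing here drops the connection before the body is consumed.
        conn.close()
```

The server sees an ordinary GET, so it answers as it would for a
browser, while a big file is not downloaded in full (a few body bytes
may already sit in the socket buffer when the connection is closed).
Whether cURL can be made to stop at the same point is the open question
here; this only shows the request pattern.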

Cheers,
Ernesto
