[TYPO3-core] RFC #13732: External URL only indexes first page [indexed_search]

Xavier Perseguers typo3 at perseguers.ch
Tue Mar 23 08:59:23 CET 2010


Hi,

REMINDER #1



On 08.03.2010 15:42, Xavier Perseguers wrote:
> Hi,
>
> This is an SVN patch request.
>
> Type: Bugfix
>
> Branches: trunk, 4-3
>
> Bugtracker reference:
> http://bugs.typo3.org/view.php?id=13732
>
> Problem:
> When indexing an external URL/website, the first page is indexed but
> none of the external website's subpages are.
>
> The problem is related to relative vs. absolute (with scheme) URLs in
> hyperlinks. Today's websites often use relative links:
>
> <a href="some/relative/page.html">....
>
> instead of
>
> <a href="http://www.domain.tld/subsite/some/relative/page.html">
>
> The problem is that method indexExtUrl() in
> EXT:indexed_search/class.crawler.php cannot properly convert a relative
> link to an absolute one when dealing with external websites. In such
> cases, the URL above is converted to
>
> http://www.domain.tld/some/relative/page.html
>
> Note the missing "/subsite/" part in the computed full URL.
>
> Solution:
> According to [1], converting a relative URL to a full URL should first
> use the "base href" tag, if present, and otherwise resolve the relative
> URL against the enclosing path of the current document.
>
> The patch extracts the base href, if present. Otherwise it uses the
> same mechanism as before, but now appends the path after the domain
> name (making sure to strip any ".html" page given as the base URL in the
> indexing configuration, so that only the enclosing path, or "parent
> directory", is kept).
>
> Note:
> The patch is made against the latest version, even if its revision
> number suggests an older one ;)
>
> Cheers
>
> [1] http://www.w3.org/TR/html401/struct/links.html#h-12.4
>
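The resolution order described in the patch request can be illustrated with a short Python sketch (the actual patch is PHP and is not reproduced here). This is a minimal illustration under stated assumptions: the helper names `extract_base_href`, `naive_absolute_url` and `resolve_link` are hypothetical, and the standard library's `urllib.parse.urljoin` (RFC 3986 reference resolution) stands in for the patch's own path handling. `naive_absolute_url` mimics the reported bug by gluing the link onto the scheme and host only.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class _BaseHrefFinder(HTMLParser):
    """Records the href of the first <base> tag that carries one."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attribute values may be None.
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def extract_base_href(html: str) -> Optional[str]:
    """Hypothetical helper: return the <base href> value, or None."""
    finder = _BaseHrefFinder()
    finder.feed(html)
    finder.close()
    return finder.base_href


def naive_absolute_url(page_url: str, link: str) -> str:
    """Hypothetical reproduction of the bug: the directory path of
    page_url is lost because only scheme://host is kept."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/{link}"


def resolve_link(page_url: str, html: str, link: str) -> str:
    """Hypothetical helper following the order described above:
    1. use <base href> if present (itself resolved against page_url);
    2. otherwise resolve against page_url, where urljoin drops the last
       path segment (e.g. "index.html") and keeps the enclosing directory.
    """
    base = extract_base_href(html)
    base_url = urljoin(page_url, base) if base else page_url
    return urljoin(base_url, link)


if __name__ == "__main__":
    page = "http://www.domain.tld/subsite/index.html"
    link = "some/relative/page.html"
    # http://www.domain.tld/some/relative/page.html  ("/subsite/" lost)
    print(naive_absolute_url(page, link))
    # http://www.domain.tld/subsite/some/relative/page.html
    print(resolve_link(page, "<p>no base tag</p>", link))
    # http://www.domain.tld/other/some/relative/page.html
    print(resolve_link(page, '<head><base href="http://www.domain.tld/other/"></head>', link))
```

Dropping the last path segment of the base has the same effect as stripping the page name to obtain the parent directory. Note that a base URL without a trailing slash, such as http://www.domain.tld/subsite, is treated as a file, so its last segment is dropped as well.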


-- 
Xavier Perseguers
http://xavier.perseguers.ch/en

