[TYPO3-core] RFC #13858: IS cannot index files if absRefPrefix is set and indexExternalURLs is not

Dmitry Dulepov dmitry.dulepov at gmail.com
Thu Mar 18 14:53:39 CET 2010


Hi!

This is an SVN patch request.

Type: major bug

Branches: 4.3, 4.4

Problem:
------------
If config.absRefPrefix is set, indexed search will not index files.
There are several problems in the code:
- all non-relative URLs are treated as external. Thus if
indexExternalURLs is not set in the extension (!!!) properties, no files
will be indexed at all
- if config.absRefPrefix is a slash, indexed search will not find any
files to index. Files will be indexed only if config.absRefPrefix looks
like a baseURL

Solution:
------------
The attached patch solves the issue in the following way:
- it adds several ways to detect whether a file is local (absRefPrefix,
absolute URL without host, current host, etc). Currently files are
always fetched with HTTP, which is a waste of resources
- it adds an additional check for local files when a scheme is present
in the URL
- it skips hrefs that start with an anchor ('#' at position 0)
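
To make the checks above easier to discuss, here is a rough sketch of
the classification logic. It is written in Python for illustration only
and is not the PHP code from the attached patch; the function name
classify_href, its parameters and its return values are all
hypothetical, and it assumes local paths are expressed relative to the
site root.

```python
from urllib.parse import urlsplit


def classify_href(href, current_host, abs_ref_prefix=""):
    """Classify a link as 'skip', 'local' or 'external' (illustrative only).

    Returns a (kind, target) tuple. For local files the target is a path
    relative to the site root (an assumption of this sketch).
    """
    href = href.strip()
    # An anchor at position 0 points into the current page: nothing to index.
    if not href or href.startswith("#"):
        return ("skip", None)

    # Ignore a trailing fragment such as "doc.txt#page2".
    url = href.split("#", 1)[0]
    parts = urlsplit(url)

    # absRefPrefix that looks like a base URL or a sub-directory prefix.
    if abs_ref_prefix not in ("", "/") and url.startswith(abs_ref_prefix):
        return ("local", url[len(abs_ref_prefix):].lstrip("/"))

    # A scheme or host is present: local only if it is the current host.
    if parts.scheme or parts.netloc:
        same_host = (parts.hostname or "") == current_host.lower()
        if parts.scheme in ("", "http", "https") and same_host:
            return ("local", parts.path.lstrip("/"))
        return ("external", url)

    # Absolute URL without host ("/fileadmin/...") or a relative URL.
    return ("local", parts.path.lstrip("/"))
```

With absRefPrefix set to a slash, classify_href("/fileadmin/doc.txt",
"www.example.com", "/") yields a local path in this sketch, so the file
can be read from disk instead of being fetched over HTTP.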

After applying the patch, indexing of files works like a charm and it
does not cause any unnecessary HTTP requests.

Notes:
------------
- if you want to test it, I suggest using TXT files. I used PDF and ran
into a problem on my Mac (I had pdftotext but not pdfinfo and spent
nearly an hour trying to understand why PDFs were not indexed)
- it is best to use the crawler. Add a link to the file on one page and
crawl it using the BE. You will get entries in the queue. One should be
for the page, another for the file. Without this patch there is only
one entry (the page); with the patch there are both.

As it is not easy to test, feel free to ask questions, I will try to
answer them as soon as I can.

-- 
Dmitry Dulepov
TYPO3 expert / TYPO3 security team member Read more @
http://dmitry-dulepov.com/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 13858.diff
URL: <http://lists.typo3.org/pipermail/typo3-team-core/attachments/20100318/c3ba3d10/attachment.asc>

