[TYPO3-dev] An idea to further process ' page not found ' 404 handling

SwagmanInternet typo3 at swagmaninternet.com
Tue Apr 29 06:24:03 CEST 2008


Hi all,

Background.

On and off over the last 2-3 weeks I have been working to get my TYPO3 
installs to handle 'page not found' (404) requests properly. I have read 
the information on this page 
http://typo3.org/development/articles/improved-404-handling/ but note that 
requests for pages and resources that do not exist are redirected to the 
website's home page and are not answered with correct 404 HTTP headers.

Below are the comments and suggestions that came out of my research and 
testing. In case I had missed something when setting up 404 handling on my 
own T3 site(s), I also tried the example incorrect URLs listed below on a 
few random TYPO3 websites online and saw similar redirections to the index 
page of those TYPO3 installs.

If the code suggestions below are plausible, perhaps they could be 
considered for the core?

Your thoughts and comments appreciated.

Regards,

Matt
SwagmanInternet.co

--------------------------------------------------------------------

404 handling report & code suggestions in detail:


After following the setup guide at 
http://typo3.org/development/articles/improved-404-handling/ , using the 
default TYPO3 .htaccess file, with or without 'simulate static documents', 
it appears that pageNotFound_handling works only for requested files:

            - that have .html as the file suffix
            - and that are requested from the root of a TYPO3 site, 
              i.e. www.domain.com.au/file.html

If 'file.html' exists, the page is shown.

If 'wrongfile.html' does not exist, correct 404 handling takes place, due 
to function '$this->checkAndSetAlias()'.

Correct 404 error handling appears to fail for the following requested 
resources:

                        - www.domain.com.au/file.htm
                        - www.domain.com.au/file.pdf

                        - www.domain.com.au/folder/
                        - www.domain.com.au/folder/file.html
                        - www.domain.com.au/folder/file.pdf

                        - www.domain.com.au/file.htm?&tx_yourextension_pi1[showUid]=171
                        - www.domain.com.au/folder/file.html?&tx_yourextension_pi1[showUid]=171

If any of the above requested resources does not exist, the browser is 
redirected to the home page of the website (the root page). This happens 
because $this->id is 'false' and is then set to '0' in function 
'setIDfromArgV()'.

--------------------------------------------------------------------

Suggested short-term workaround that appears to work before potential patch 
to 'class.tslib_fe.php'

in .htaccess, modify to suit:

AFTER lines

            RewriteCond %{REQUEST_FILENAME} !-f
            RewriteCond %{REQUEST_FILENAME} !-d
            RewriteCond %{REQUEST_FILENAME} !-l

COMMENT OUT

            # RewriteRule .* index.php [L]

ADD/CHANGE TO

            # If any file/dir/symlink does not exist, set id=000; this 
            # value is not likely to exist in the database's page table
            RewriteRule .* index.php?id=000 [L]

------------------------

What happens in 'class.tslib_fe.php' , (after above .htaccess hack applied), 
when TYPO3 processes the page requested when you force an 'id=000' in the 
.htaccess file

            - firstly, the argument '000' arrives as type 'string'
            - when $this->id is then processed by function 
              checkAndSetAlias(), $this->pageNotFound is set to 4: the if 
              condition inside checkAndSetAlias(), 
              "  if ($this->id && !t3lib_div::testInt($this->id))  ", 
              holds for id=000, because '000' is a non-empty string that 
              testInt() does not accept as an integer, so it is looked up 
              as a page alias and no such alias is found

            - now that $this->pageNotFound has a value, the following 
              function is called: 
              $this->pageNotFoundAndExit($pNotFoundMsg[$this->pageNotFound])
              which in turn executes $this->pageNotFoundHandler() and 
              causes the script to exit

            - Note: if the following file is requested, 
              'www.domain.com.au/file.html', $this->id is changed from 000 
              to the alias value in variable $fI['file'] in function 
              checkAlternativeIdMethods(); aliases formed by simulate 
              static documents are therefore still checked for existence 
              in the database
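
A small stand-alone sketch of the alias check for a forced id of '000'. It 
does not use the TYPO3 classes: looksLikeInteger() is a hypothetical 
stand-in that ASSUMES t3lib_div::testInt() accepts a value only when it 
equals its own integer cast, and aliasNotFoundCode() with its alias list is 
purely illustrative. Under those assumptions the string '000' passes the if 
condition and is treated as an unknown alias, which yields code 4.

```php
<?php
// Stand-in for t3lib_div::testInt(). ASSUMPTION: a value counts as an
// integer only if it is identical to its own integer cast as a string.
function looksLikeInteger($value) {
    return (string)$value === (string)intval($value);
}

// Hypothetical model of the alias step: returns the not-found code 4
// when a non-empty, non-integer id matches no known alias, otherwise 0.
function aliasNotFoundCode($id, array $knownAliases) {
    if ($id && !looksLikeInteger($id)) {
        return in_array($id, $knownAliases, true) ? 0 : 4;
    }
    return 0;
}

$aliases = array('file'); // hypothetical page aliases

var_dump((bool)'000');                          // is '000' a truthy string?
var_dump(looksLikeInteger('000'));              // is '000' integer-like?
echo aliasNotFoundCode('000', $aliases), "\n";  // forced id from .htaccess
echo aliasNotFoundCode('file', $aliases), "\n"; // alias that exists
echo aliasNotFoundCode('0', $aliases), "\n";    // falsy id skips the check
```

This is also why the workaround uses '000' rather than '0': in PHP the 
string '0' is falsy, so it would skip the alias check entirely.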


Testing scenarios for the short-term workaround.

            - all the requested resource URLs above (both working and 
              failing) were entered in a browser to test the short-term 
              fix to the .htaccess file

            - tested using typo3 src v4.1.5 and no simulate static 
documents, i.e. www.domain.com.au/index.php?id=000
            - tested using typo3 src v4.1.5 and using simulate static 
documents, config.simulateStaticDocuments = 1

            - have not tested with realurl extension installed

            - this workaround works on typo3 src v4.1.5, potentially will 
work on previous versions of TYPO3 source, though this will depend on what 
changes, if any, exist between versions of  'class.tslib_fe.php'
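
The browser checks above could also be scripted. The following is only a 
sketch: the function names are hypothetical, the host and paths are the 
placeholder URLs from this report, and it relies on PHP's get_headers(), 
whose first array element is the status line of the first response. Looking 
at the first response matters here, because a redirect to the home page 
would otherwise hide behind the final 200.

```php
<?php
// Extracts the numeric status code from a status line such as
// "HTTP/1.1 404 Not Found"; returns null for any other header line.
function statusCodeFromLine($line) {
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
        return (int)$m[1];
    }
    return null;
}

// Returns the status code of the FIRST response for $url, or null if
// the request failed (get_headers() returns false in that case).
function firstStatusCode($url) {
    $headers = @get_headers($url);
    return ($headers === false) ? null : statusCodeFromLine($headers[0]);
}

// Prints one line per path; anything other than 404 for a missing
// resource points at the problem described in this report.
function reportStatuses($base, array $paths) {
    foreach ($paths as $path) {
        $code = firstStatusCode($base . $path);
        echo $path, ' -> ', ($code === null ? 'no response' : $code), "\n";
    }
}

// Example call (placeholder host, not executed automatically):
// reportStatuses('http://www.domain.com.au',
//     array('/file.htm', '/file.pdf', '/folder/', '/folder/file.html'));
```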

-----------------------------------------------------------

Long term suggested solution for fixing 404 pagenotfound_handling:

After reviewing code mainly in file 'class.tslib_fe.php',
I suggest the following 2 code fixes/modifications in file 
'class.tslib_fe.php'.

------------------------

NEW function call added inside function fetch_the_id(), and also a 5th 
key => value pair added to array $pNotFoundMsg


            // Checks if $this->id is false && pageNotFound_handling enabled,
            // if yes, then set $this->pageNotFound = 5
            $this->checkAndSetPageNotFound();

            if ($this->pageNotFound && $this->TYPO3_CONF_VARS['FE']['pageNotFound_handling'])   {
                        $pNotFoundMsg = array(
                                    1 => 'ID was not an accessible page',
                                    2 => 'Subsection was found and not accessible',
                                    3 => 'ID was outside the domain',
                                    4 => 'The requested page alias does not exist',
                                    5 => 'The requested page or file resource does not exist'
                        );

-----------------------------------------------------------

ALSO NEW function, inserted possibly below function 
ADMCMD_preview_postInit($previewConfig)


            /**
             * Checks if $this->id is false && pageNotFound_handling enabled;
             * if yes, sets $this->pageNotFound = 5.
             * When $this->pageNotFound is set to 5, TYPO3 correctly redirects
             * pageNotFound requests to the value set in config
             * $TYPO3_CONF_VARS['FE']['pageNotFound_handling'].
             * This should only run when a file/symlink/directory does not
             * exist and the request was redirected to index.php in the
             * .htaccess file.
             *
             * @return         void
             * @access private                     // should this be set private?
             * @see fetch_the_id()
             */
            function checkAndSetPageNotFound()     {
                        // Leave alone any reason code already set, e.g. 4 from checkAndSetAlias()
                        if (!$this->id && $this->TYPO3_CONF_VARS['FE']['pageNotFound_handling'] && !$this->pageNotFound) {
                                    $this->pageNotFound = 5;
                        }
            }
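
To try the proposed check outside TYPO3, here is a minimal stand-alone 
model. NotFoundModel and its properties are hypothetical stand-ins for the 
three values the function reads; it is not the real frontend class. One 
PHP detail worth knowing: '!' binds more tightly than '==', so a guard 
spelled '!$x == 4' is evaluated as '(!$x) == 4', which is true exactly when 
$x is falsy. The sketch therefore uses the plain '!$this->pageNotFound' 
form.

```php
<?php
// Hypothetical model of the state read by checkAndSetPageNotFound().
class NotFoundModel {
    public $id;               // resolved page id, or false if none
    public $handlingEnabled;  // stands in for pageNotFound_handling
    public $pageNotFound;     // reason code, 0 when nothing is set

    public function __construct($id, $handlingEnabled, $pageNotFound = 0) {
        $this->id = $id;
        $this->handlingEnabled = $handlingEnabled;
        $this->pageNotFound = $pageNotFound;
    }

    // Sets reason code 5 only when there is no id, handling is enabled
    // and no earlier step has already recorded a reason code.
    public function checkAndSetPageNotFound() {
        if (!$this->id && $this->handlingEnabled && !$this->pageNotFound) {
            $this->pageNotFound = 5;
        }
    }
}

// Runs the check for one combination of inputs and returns the code.
function reasonCodeFor($id, $handlingEnabled, $pageNotFound = 0) {
    $model = new NotFoundModel($id, $handlingEnabled, $pageNotFound);
    $model->checkAndSetPageNotFound();
    return $model->pageNotFound;
}

echo reasonCodeFor(false, true), "\n";    // missing resource, handling on
echo reasonCodeFor(false, true, 4), "\n"; // alias step already set a code
echo reasonCodeFor('12', true), "\n";     // a page id was resolved
echo reasonCodeFor(false, false), "\n";   // handling switched off
```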

 -----------------------------------------------------------
-----------------------------------------------------------

Note:

Don't forget to set the HTML tag '<base href=...>' via 'config.baseURL' in 
your main TS template's config. When a page is requested with a folder in 
its path and that page does not exist, the browser is redirected to your 
TYPO3 index page. If that page has neither a '<base href=...>' tag nor 
absolute paths for its resources, the relative paths of page resources, 
i.e. css files & images, can break and cause additional browser redirects 
to your TYPO3 site's index page, potentially resulting in an infinite loop.

When you set 'config.baseURL' you should also set 'prefixLocalAnchors'. 
This stops href anchors '#' from looking like www.domain.com.au/# ; instead 
the anchor is prefixed with the 'page name' of the currently requested 
page. The TS code below also demonstrates how to set 'config.baseURL' to 
work with https pages, i.e. when using the extension 'https_enforcer'.

Code to add to main ts template here, (inserted above 'page = PAGE' 
declaration).
------------------

# turn on simulate static documents
config.simulateStaticDocuments = 1

## Set <base href=...> , considers if website uses SSL/https pages
## remove the single '#' comment markers below if the website uses SSL/https pages
#[globalVar = TSFE:page|tx_httpsenforcer_force_secure = 1]
#config.baseURL = https://www.domain.com.au/
#[else]
config.baseURL = http://www.domain.com.au/
#[global]
config.prefixLocalAnchors = all

page = PAGE



-----------------------------------------------------------





