[TYPO3-Solr] Pages get indexed with wrong URL

Schwarzenberg schwarzenberg at uni-leipzig.de
Thu Feb 28 11:19:48 CET 2013


Thank you all for your constructive hints.

Before I "present" the solution, I think I should clarify the problem that occurred:

Currently, I have only one page that is translated to English; all other pages exist only in German.

When the current language is German (whether activated by L=0, by prepending "de/" to the URL, or by 
neither, because German is configured as the default language):
If I searched for a German word that is contained in a page with no English translation, I got the 
URL to the ENGLISH page (i.e. it starts with '/en/..').

If I searched for a German word that is contained in the page which also has translated content, 
I got the correct search result with the URL to the German page (i.e. the URL starts with '/de/..'). 
The same worked the other way around: if I was on the English site and searched for a word which 
exists on the translated (i.e. English) page, I got the correct result with the correct (i.e. 
English) URL. If I searched for the German word, I didn't get a result, as expected.

So in short, for pages with translated content, everything worked as it should, but for pages where 
no English translation exists, German pages somehow got indexed with the English URL.

I then checked the configuration and debug messages multiple times:

1. Reports module -> solr
Two entries for each Rootpage/Site:
  - Site: DOMAIN1 (pid: 2, language: default) - USER:PASSWORD at localhost:8080/solr-typo3/core_de
    [..]
    Path: /solr-typo3/core_de/
  - Site: DOMAIN1 (pid: 2, language: English) - USER:PASSWORD at localhost:8080/solr-typo3/core_en
    [..]
    Path: /solr-typo3/core_en/

So, the two languages are connected to the right cores.

2. I enabled plugin.tx_solr.logging.indexing and 
plugin.tx_solr.logging.indexing.indexQueuePageIndexerGetData, reindexed the site and checked the devlog:
- for each page, I got two entries (one for each language):
   "Adding 1 documents".	Called from: class.tx_solr_typo3pageindexer.php, line 415

   When I looked into the expandable "extra Data" tree, the url field of the first entry started with 
"en/" and the content/title fields contained English content. The url field of the second entry 
started with "de/" and the content/title fields contained German content. So the data Solr gets from 
TYPO3 for indexing seemed to be correct.
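
For anyone who wants to reproduce the check from step 2, the two logging options can be switched on in the 
TypoScript setup roughly as sketched below. The option names are the ones mentioned above; setting them to 1 
and placing them in the site's main setup template are assumptions on my side.

```typoscript
# Sketch only: enable EXT:solr indexing log output in the devlog.
# The value "1" (enabled) is an assumption; the option names are taken from step 2.
plugin.tx_solr.logging {
  indexing = 1
  indexing.indexQueuePageIndexerGetData = 1
}
```

After a reindex, the devlog should then contain the "Adding 1 documents" entries described above. It is 
probably a good idea to disable both options again afterwards, since they log data for every indexed page.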

3. I then looked in the "Solr Admin" module in TYPO3, chose the German core and looked at the 
stored items. There, suddenly the url fields of all entries started with "en/". I then switched to 
the English core and there was exactly one single and correct entry - namely the one for the 
translated page.

To make a long story short: in the meantime I figured out that the content fallback mechanism of 
TYPO3 was somehow responsible for this strange behaviour.
I had not explicitly configured config.sys_language_mode because in the frontend everything seemed to 
work fine: with German selected, German content was shown; with English activated (whether by L=1 or 
by prepending "en/"), the translated content was shown, or the German content was shown on 
untranslated pages.

But only when I explicitly set

config.sys_language_mode = content_fallback

did all pages get indexed with their correct URLs, regardless of whether a translation exists or not.
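
To show where that line sits in context, here is a minimal sketch of a two-language TypoScript setup. Only 
the sys_language_mode line is the actual fix; the rest (German as default with L=0, English as L=1, 
switching via a GP:L condition) is an assumed, typical setup and may differ from your installation.

```typoscript
# Sketch only: default language German (L=0), assumed to match the setup described above.
config {
  linkVars = L
  sys_language_uid = 0
  language = de
  # The actual fix: set the fallback mode explicitly instead of relying on the default.
  sys_language_mode = content_fallback
}

# English (L=1); the uid 1 is taken from the "L=1" mentioned above.
[globalVar = GP:L = 1]
config {
  sys_language_uid = 1
  language = en
}
[global]
```

Because sys_language_mode is set in the unconditional block, it applies to both languages. After changing 
it, the site most likely has to be reindexed before the URLs in the German core are corrected.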

Sorry for the long "prosaic essay" here - I hope this helps others who encounter the same problem.

Best regards, Heiko

