[TYPO3-Solr] Bug or no Bug? was: Pages get indexed with wrong URL

Hauke Meyer meyer at visionconnect.de
Thu Feb 28 12:40:01 CET 2013


Wow, the complete story.
Yesterday I wrote my "advice", and today I stumbled into exactly the
same kind of trouble o_0

I just analyzed the problem and came to the same conclusion, but so far
without a real solution. My "trick" until now was to set the
"hidePagesIfNotTranslatedByDefault" parameter in the config vars. Of
course that is not a real solution for every problem of this kind, but
for me it was helpful. Your solution (if it works ;-)) looks much more
practical!

I don't know whether this is a bug worth filing. I think yes, because
the trouble starts around the tx_solr_indexqueue_PageIndexer class. For
every item the Solr connections are fetched ($solrConnections =
$this->getSolrConnectionsByItem($item);). If the fallback mechanism
kicks in here, even an untranslated page gets all Solr connections and
the indexing starts. I stopped my debugging at this point because I read
Heiko's answer. His result is quoted below.
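
For anyone skimming, here is a rough sketch of what the explicit setting
from Heiko's mail could look like in a TypoScript setup. Only the
config.sys_language_mode line comes from his mail; the surrounding
language setup (German as uid 0, English as uid 1 selected via L=1) is
my assumption about a typical two-language site and may differ from his
actual configuration.

```typoscript
# Sketch only, not Heiko's actual setup.
# Assumption: default language (uid 0) is German, English is uid 1.
config.linkVars = L
config.sys_language_uid = 0
config.language = de

# The line from Heiko's mail: set the fallback mode explicitly
# instead of relying on the implicit default behaviour.
config.sys_language_mode = content_fallback

# Assumption: English is selected with the L=1 GET parameter.
[globalVar = GP:L = 1]
config.sys_language_uid = 1
config.language = en
[global]
```

After changing this, the index queue would need to be re-run so that
the German core gets fresh documents.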

Hauke


On 28.02.2013 11:19, Schwarzenberg wrote:
> Thank you all for your constructive hints.
>
> Before I "present" the solution, I think I should clarify the
> problem that occurred:
>
> Currently, I have only one page that is translated into English; all
> other pages exist only in German.
>
> When the current language is German (whether activated by L=0, by
> prefixing the URL with "de/", or by neither, because German is
> configured as the default language):
> If I searched for a German word that is contained in a page with no
> English translation, I got the URL of the ENGLISH page (i.e. it
> starts with '/en/..').
>
> If I searched for a German word that is contained in the page
> which also has translated content, I got the correct search result
> with the URL of the German page (i.e. the URL starts with '/de/..').
> The same worked the other way around: if I was on the English site and
> searched for a word which exists on the translated (i.e. English)
> page, I got the correct result with the correct (i.e. English) URL. If
> I searched for the German word, I got no result, as expected.
>
> So in short, for pages with translated content everything worked as
> it should, but for pages where no English translation exists,
> German pages somehow got indexed with the English URL.
>
> I then checked the configuration and debug messages multiple times:
>
> 1. Reports module -> solr
> Two entries for each Rootpage/Site:
>  - Site: DOMAIN1 (pid: 2, language: default) -
> USER:PASSWORD at localhost:8080/solr-typo3/core_de
>    [..]
>    Path: /solr-typo3/core_de/
>  - Site: DOMAIN1 (pid: 2, language: English) -
> USER:PASSWORD at localhost:8080/solr-typo3/core_en
>    [..]
>    Path: /solr-typo3/core_en/
>
> So, the two languages are connected to the right cores.
>
> 2. I enabled plugin.tx_solr.logging.indexing and
> plugin.tx_solr.logging.indexing.indexQueuePageIndexerGetData,
> reindexed the site and checked the devlog:
> - for each page, I got two entries (one for each language):
>   "Adding 1 documents".    Called from:
> class.tx_solr_typo3pageindexer.php, line 415
>
>   When I looked into the expandable "extra data" tree, the url field of
> the first page started with "en/" and its content/title fields
> contained English content. The url field of the second page started
> with "de/" and its content/title fields contained German content. So
> the data Solr gets from TYPO3 for indexing seemed to be correct.
>
> 3. I then looked in the "Solr Admin" module in TYPO3, chose the
> German core and looked at the stored items. There, suddenly, the
> url fields of all entries started with "en/". I then switched to the
> English core, and there was exactly one single, correct entry,
> namely the one for the translated page.
>
> To make a long story short: in the meantime I figured out that the
> content fallback mechanism of TYPO3 was somehow responsible for this
> strange behaviour.
> I had not explicitly configured config.sys_language_mode because in
> the frontend everything seemed to work fine: with German selected,
> German content was shown; with English activated (whether by L=1 or by
> prefixing "en/"), the translated content was shown, or the German
> content was shown on untranslated pages.
>
> But only when I explicitly set
>
> config.sys_language_mode = content_fallback
>
> did all pages get indexed with their correct URLs, regardless of
> whether a translation exists or not.
>
> Sorry for the long "prosaic essay" here - I hope this helps others who
> encounter the same problem.
>
> Best regards, Heiko
> _______________________________________________
> TYPO3-project-solr mailing list
> TYPO3-project-solr at lists.typo3.org
> http://lists.typo3.org/cgi-bin/mailman/listinfo/typo3-project-solr



