[TYPO3-Solr] Multi-core nutch

Lienhart Woitok Lienhart.Woitok at netlogix.de
Wed Jun 4 10:53:38 CEST 2014


Hi Jigal,

I use a script that loops over all languages I want to index, replaces some
config files, and then runs bin/crawl with the correct Solr core for each language.

This requires that the language of a URL can be determined with a regex,
but in my case that is fine (it is an old TYPO3 installation that uses the L= parameter).
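
To illustrate what such a per-language filter file could contain, here is a
minimal sketch. The helper name write_language_filter and the mapping of
language codes to L= values (for example de -> 1) are assumptions made for this
example; they are not taken from my actual setup. It relies on the Nutch
regex-urlfilter format, where each rule line starts with + (accept) or -
(reject) and the first matching rule wins.

```shell
#!/bin/sh
# Hypothetical helper: writes <conf_dir>/regex-urlfilter.txt.<language>
# so that only URLs carrying the given TYPO3 L= value are accepted.
#   $1  language code used as file suffix (e.g. de)
#   $2  value of the L= parameter for that language (assumed mapping)
#   $3  config directory (defaults to conf)
write_language_filter() {
        language="$1"
        l_value="$2"
        conf_dir="${3:-conf}"
        {
                printf '# accept only URLs whose query string contains L=%s\n' "$l_value"
                printf '+[?&]L=%s(&|$)\n' "$l_value"
                printf '# reject everything else\n'
                printf '%s\n' '-.'
        } > "${conf_dir}/regex-urlfilter.txt.${language}"
}
```

Calling write_language_filter de 1 would create conf/regex-urlfilter.txt.de.
A real filter will probably need more rules, for example for a default
language whose URLs have no L= parameter at all.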

The script looks somewhat like this:

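for LANGUAGE in $LANGUAGES ; do
        # one Solr core, seed directory and crawl directory per language
        SOLR_URL="${SOLR_BASE_URL}-${LANGUAGE}"
        SEEDDIR="urls-${LANGUAGE}"
        CRAWL_PATH="crawl-${LANGUAGE}"
        # swap in the URL filter that only accepts this language's URLs
        REGEX_URLFILTER="conf/regex-urlfilter.txt"
        REGEX_URLFILTER_LANGUAGE="${REGEX_URLFILTER}.${LANGUAGE}"
        cp "${REGEX_URLFILTER_LANGUAGE}" "${REGEX_URLFILTER}"
        # crawl and index into the language-specific core
        bin/crawl "${SEEDDIR}" "${CRAWL_PATH}" "${SOLR_URL}" "${LIMIT}"
done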
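
The loop assumes that LANGUAGES, SOLR_BASE_URL and LIMIT are already set and
that a seed directory and a filter file exist for every language. A small
pre-flight check could catch a missing file before any crawl starts. This is
only a sketch; the function name check_language_inputs is made up for this
example, and the paths simply follow the naming used in the loop.

```shell
#!/bin/sh
# Hypothetical pre-flight check: returns non-zero if any language given as
# an argument lacks its seed directory or its per-language URL filter file.
# Paths are relative to the current directory (the Nutch installation).
check_language_inputs() {
        missing=0
        for language in "$@" ; do
                if [ ! -d "urls-${language}" ] ; then
                        echo "missing seed directory: urls-${language}" >&2
                        missing=1
                fi
                if [ ! -f "conf/regex-urlfilter.txt.${language}" ] ; then
                        echo "missing filter: conf/regex-urlfilter.txt.${language}" >&2
                        missing=1
                fi
        done
        return "$missing"
}
```

It could be called as check_language_inputs $LANGUAGES || exit 1 right before
the loop, so that a misconfigured language does not leave the shared
conf/regex-urlfilter.txt in a half-replaced state.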

As far as I understand the Nutch source code, it is not possible to index into
different Solr cores for different languages within a single Nutch run.

I hope this helps.

Regards,


Lienhart Woitok
Web developer

Phone: +49 (911) 539909 - 0
Email: Lienhart.Woitok at netlogix.de
Website: media.netlogix.de









-----Original Message-----
From: typo3-project-solr-bounces at lists.typo3.org [mailto:typo3-project-solr-bounces at lists.typo3.org] On Behalf Of Jigal van Hemert
Sent: Wednesday, June 4, 2014 09:09
To: typo3-project-solr at lists.typo3.org
Subject: [TYPO3-Solr] Multi-core nutch

Hi,

The pre-compiled apache-nutch-for-typo3 works great! Currently I have no
idea how to use it with multiple cores (and multiple sites) on the same
solr/nutch server installation.

Is there a way to use multiple configurations on a single Nutch
installation (at least use different nutch-site.xml and regex-urlfilter.txt)?

--
Jigal van Hemert
TYPO3 CMS Active Contributor

TYPO3 .... inspiring people to share!
Get involved: typo3.org
_______________________________________________
TYPO3-project-solr mailing list
TYPO3-project-solr at lists.typo3.org
http://lists.typo3.org/cgi-bin/mailman/listinfo/typo3-project-solr

