[TYPO3-Solr] Multi-core nutch
Lienhart Woitok
Lienhart.Woitok at netlogix.de
Wed Jun 4 10:53:38 CEST 2014
Hi Jigal,
I use a script that loops over all languages I want to index, replaces some
config files and then runs bin/crawl with the correct Solr core for each language.
This requires that the language of a URL can be determined with a regex,
but in my case this is fine (it is an old TYPO3 installation that uses the L= parameter).
The script looks somewhat like this:
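# LANGUAGES, SOLR_BASE_URL and LIMIT must be set before the loop runs.
for LANGUAGE in $LANGUAGES ; do
    # One Solr core, seed directory and crawl directory per language
    SOLR_URL="${SOLR_BASE_URL}-${LANGUAGE}"
    SEEDDIR="urls-${LANGUAGE}"
    CRAWL_PATH="crawl-${LANGUAGE}"
    REGEX_URLFILTER="conf/regex-urlfilter.txt"
    REGEX_URLFILTER_LANGUAGE="${REGEX_URLFILTER}.${LANGUAGE}"
    # Put the URL filter for this language in place
    cp "${REGEX_URLFILTER_LANGUAGE}" "${REGEX_URLFILTER}"
    # Crawl and index into the Solr core of this language
    bin/crawl "${SEEDDIR}" "${CRAWL_PATH}" "${SOLR_URL}" "${LIMIT}"
done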
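The per-language filter files that the loop copies into place are not shown in this message. As a rough sketch only, assuming each language corresponds to exactly one value of the L= parameter, they could be generated with a small helper like the one below. The function name write_language_filter, the language code and the L= value are hypothetical; the lines starting with + and - follow the usual regex-urlfilter.txt format, in which the first matching rule decides whether a URL is accepted.

```shell
# Hypothetical helper: write conf/regex-urlfilter.txt.<language> for one L= value.
write_language_filter() {
    language="$1"   # suffix used in the file name, e.g. "de" (assumed)
    l_value="$2"    # value of the L= URL parameter for that language, e.g. "1" (assumed)
    mkdir -p conf
    # Unquoted heredoc: ${l_value} is expanded, \$ becomes a literal $.
    cat > "conf/regex-urlfilter.txt.${language}" <<EOF
# accept URLs whose query string contains L=${l_value}
+[?&]L=${l_value}(&|\$)
# reject everything else
-.
EOF
}
```

Called as `write_language_filter de 1`, this writes conf/regex-urlfilter.txt.de. A real filter would probably keep further rules from the stock regex-urlfilter.txt, and the seed URLs for that language have to match the accept rule themselves, otherwise nothing is crawled.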
As far as I understand the Nutch source code, it is not possible to index into different
Solr cores for different languages within a single Nutch run.
I hope this helps.
Regards,
Lienhart Woitok
Web developer
Phone: +49 (911) 539909 - 0
E-Mail: Lienhart.Woitok at netlogix.de
Website: media.netlogix.de
-----Original Message-----
From: typo3-project-solr-bounces at lists.typo3.org [mailto:typo3-project-solr-bounces at lists.typo3.org] On Behalf Of Jigal van Hemert
Sent: Wednesday, 4 June 2014 09:09
To: typo3-project-solr at lists.typo3.org
Subject: [TYPO3-Solr] Multi-core nutch
Hi,
The pre-compiled apache-nutch-for-typo3 works great! Currently I have no
idea how to use it with multiple cores (and multiple sites) on the same
solr/nutch server installation.
Is there a way to use multiple configurations on a single Nutch
installation (or at least different nutch-site.xml and regex-urlfilter.txt files)?
--
Jigal van Hemert
TYPO3 CMS Active Contributor
TYPO3 .... inspiring people to share!
Get involved: typo3.org
_______________________________________________
TYPO3-project-solr mailing list
TYPO3-project-solr at lists.typo3.org
http://lists.typo3.org/cgi-bin/mailman/listinfo/typo3-project-solr