[TYPO3-Solr] Code for Apache Nutch for TYPO3 CMS has been released
thomas macaigne
istreaming at gmail.com
Tue Apr 29 11:44:27 CEST 2014
I carefully followed the included README.
I downloaded the Nutch 1.8 source and compiled it.
I also tried the precompiled Nutch binaries.
In conf/nutch-site.xml I added the API key and baseURL.
The crawl seems to run fine, but right at the end I get this error:
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
Getting siteHash for domain: nutch.apache.org
Constructed TYPO3 API URL: http://wiki.mydomain.net/index.php?eID=tx_solr_api&api=siteHash&domain=nutch.apache.org&apiKey=f329b64134351933fbaa916fc05fcbc0d88258d2
TYPO3 Solr API Request sent.
TYPO3 Solr siteHash retrieved: f1c579b9422e0173b0799d5998bd65bc9210f1b2
Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
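
For reference, the siteHash request URL printed in the log above appears to be assembled from the configured baseURL, the crawled domain, and the API key. The following is only a rough sketch of that assembly, useful for reproducing the request by hand. The class name SiteHashUrl and the method build are hypothetical and are not taken from the plugin source; only the query parameters (eID, api, domain, apiKey) come from the logged URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the Nutch TYPO3 plugin: it only mirrors
// the shape of the "Constructed TYPO3 API URL" line from the log.
class SiteHashUrl {

    // Builds the siteHash request URL from the TYPO3 base URL,
    // the domain being indexed, and the API key.
    static String build(String baseUrl, String domain, String apiKey) {
        // Make sure exactly one slash separates the base URL and index.php.
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + "index.php?eID=tx_solr_api&api=siteHash"
                + "&domain=" + encode(domain)
                + "&apiKey=" + encode(apiKey);
    }

    // URL-encodes a query parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build(
                "http://wiki.mydomain.net",
                "nutch.apache.org",
                "f329b64134351933fbaa916fc05fcbc0d88258d2"));
    }
}
```

With the values from the log, this should print the same URL as the "Constructed TYPO3 API URL" line, which can then be opened in a browser or fetched with any HTTP client to check the API response independently of the indexing job.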