[TYPO3-Solr] Code for Apache Nutch for TYPO3 CMS has been released

thomas macaigne istreaming at gmail.com
Tue Apr 29 11:44:27 CEST 2014


I carefully followed the included README.
I downloaded the Nutch 1.8 source and compiled it.
I also tried the precompiled Nutch binaries.
I modified conf/nutch-site.xml, adding the API key and baseURL.
The crawl seems to run fine, but right at the end I get this error:


SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication


Getting siteHash for domain: nutch.apache.org
Constructed TYPO3 API URL: http://wiki.mydomain.net/index.php?eID=tx_solr_api&api=siteHash&domain=nutch.apache.org&apiKey=f329b64134351933fbaa916fc05fcbc0d88258d2
TYPO3 Solr API Request sent.
TYPO3 Solr siteHash retrieved: f1c579b9422e0173b0799d5998bd65bc9210f1b2
Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)


