
I have been following the instructions at http://wiki.apache.org/nutch/Nutch2Tutorial to see if I can get a nutch installation running with ElasticSearch. I have successfully done a crawl with no real issues, but then when I try and load the results into elasticsearch I run into trouble.

I issue the command:

bin/nutch elasticindex <cluster name> -all

And it waits around for a long time and then comes back with an error: Exception in thread "main" java.lang.RuntimeException: job failed: name=elastic-index [ocpnutch], jobid=job_local_0001

If I look in the logs at:

~/apache-nutch-2.1/runtime/local/logs/hadoop.log

I see several errors like this:

Exception caught on netty layer [[id: 0x569764bd, /192.168.17.39:52554 => /192.168.17.60:9300]] java.lang.OutOfMemoryError: Java heap space

There is nothing in the logs on the Elasticsearch side.

I have tried changing elastic.max.bulk.docs and elastic.max.bulk.size to small values and allocating several GB of heap to Nutch, but to no avail.
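For reference, these two settings are ordinary Nutch properties, so they would be overridden in conf/nutch-site.xml. The values below are hypothetical examples, not recommendations; only the property names come from the question:

```xml
<!-- nutch-site.xml: example overrides for the bulk-indexing limits.
     Values here are illustrative only. -->
<property>
  <name>elastic.max.bulk.docs</name>
  <value>100</value>       <!-- flush the bulk request after this many docs -->
</property>
<property>
  <name>elastic.max.bulk.size</name>
  <value>1048576</value>   <!-- flush after ~1 MB of accumulated documents -->
</property>
```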

The jvm is: Java(TM) SE Runtime Environment (build 1.7.0_21-b11)

Does anyone have any idea what I am doing wrong? What other diagnostic information would help to solve this problem?

If that's your Nutch log, the OutOfMemoryError is on the Nutch side. That error is thrown by the Elasticsearch client while trying to communicate with the running Elasticsearch instance. Are you really sure you gave enough memory to Nutch? – javanna

This is a good question, and it is where I have been concentrating my efforts. I have edited the line in the bin/nutch file that says "JAVA_HEAP_MAX=-Xmx2G", trying everything from 1G to 12G, and I can see the process using the memory. To be honest I am not sure how much should be needed, and the stack trace doesn't seem to be that helpful. – nwaltham

I should also add that it works fine with Solr: bin/nutch solrindex 127.0.0.1:8983/solr -reindex – nwaltham

It might be a problem of Elasticsearch versions then. Using the Java API, you use the binary protocol that nodes use to communicate with each other. Could it be that the version integrated with Nutch and the one on your cluster don't match? – javanna

2 Answers

1 vote

I have exactly the same problem. I was using Elasticsearch 0.90.2. I found a solution: with Elasticsearch 0.19.4 it works!

1 vote

I had a similar problem caused by incompatible versions of HBase and Elasticsearch. Using HBase 0.90.4 and Elasticsearch 0.90.9 worked for me.

I made two configuration changes. In ~/apache-nutch-2.2.1/ivy/ivy.xml, the revision of the elasticsearch dependency must be set to 0.90.9.
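The edited dependency line in ivy.xml would look roughly like this; the org/name attributes are the standard Maven coordinates for the Elasticsearch client, but the exact attributes in your copy of ivy.xml may differ, so only the rev value is the point here:

```xml
<!-- ivy.xml: pin the Elasticsearch client to the version running on the cluster.
     Attributes other than rev are assumed from the usual coordinates. -->
<dependency org="org.elasticsearch" name="elasticsearch" rev="0.90.9"
            conf="*->default"/>
```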

In ElasticWriter.java, at line 104, the statement:

if (item.failed())

had to be changed to:

if (item.isFailed())

Then it worked for me.