I'm experimenting with Apache Nutch 1.7 and Solr on Ubuntu 14.04 LTS x64 (AMD), and when I try to run Nutch it gives me this error message:
Error: JAVA_HOME is not set.
But when I run echo $JAVA_HOME in the terminal, it prints this path: /usr/lib/jvm/java-7-openjdk-amd64
Below you can see what I've done, step by step. How can I fix this?
PS: Ubuntu is a virtual machine running on a Mac with Oracle VirtualBox.
- Installing Java from the terminal with sudo apt-get -y install openjdk-7-jdk
- Checking the Java installation with the java -version command
Setting JAVA_HOME with:
sudo nano /etc/environment
Then adding the following line at the bottom of the file: JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
Ctrl+X shortcut to save the changes and exit.
Then running this command: source /etc/environment
Now JAVA_HOME should be set. I checked it with echo $JAVA_HOME, and the output is the same path as above.
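One thing that might be worth checking at this point is whether the variable is also visible to programs started from the shell (such as bin/nutch), not only to the shell itself. The sketch below assumes that sourcing a file of plain NAME="value" lines acts like an ordinary assignment without export; DEMO_HOME is a made-up name used only for illustration, so the real JAVA_HOME is left alone.

```shell
# DEMO_HOME is a hypothetical variable name, used only for this demonstration.
unset DEMO_HOME

# Plain assignment, as a sourced NAME="value" line would perform.
DEMO_HOME="/usr/lib/jvm/java-7-openjdk-amd64"

# The current shell sees the value...
echo "current shell: $DEMO_HOME"

# ...but a child process only sees exported variables.
before=$(sh -c 'if [ -n "$DEMO_HOME" ]; then echo set; else echo "not set"; fi')
echo "child before export: $before"

# After export, the same check in a child process succeeds.
export DEMO_HOME
after=$(sh -c 'if [ -n "$DEMO_HOME" ]; then echo set; else echo "not set"; fi')
echo "child after export: $after"
```

If the same check run against JAVA_HOME reports "not set" in the child shell, that could explain why echo $JAVA_HOME prints a path while a script started from the same terminal does not see it.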
Then I installed Solr with sudo apt-get -y install solr-tomcat
I checked the installation by opening this address in a browser:
http://localhost:8080/solr
and it shows me the initial page of Solr.
I downloaded Apache Nutch 1.7 from http://nutch.apache.org and the file was named apache-nutch-1.7-bin.tar.gz
Then I extracted it: tar -zxvf apache-nutch-1.7-bin.tar.gz
I verified the Nutch installation simply like this: cd apache-nutch-1.7 then bin/nutch. The output looks like Usage: nutch COMMAND where......
Then I edited my conf/nutch-site.xml file as described here: Link (look under the title "3) Set Up Your Nutch-Site.Xml"). The only things I did differently from that reference are the MyBot and MyBot,* fields: instead of MyBot I wrote mySpider.
Then I went into Nutch's conf directory in the terminal. Here's what I did next: mkdir -p urls, cd urls, touch seed.txt, nano seed.txt
I only wrote this URL in the file, as suggested in the official Nutch tutorial: http://nutch.apache.org
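For anyone reproducing this step, the same seed file can be created without opening an editor. This is only a sketch of an equivalent, and it assumes it is run from the directory where the urls folder should live.

```shell
# Assumption: the current directory is where the urls folder should be created.
mkdir -p urls

# Write the single seed URL, one URL per line.
printf '%s\n' "http://nutch.apache.org" > urls/seed.txt

# Show the result to confirm the file content.
cat urls/seed.txt
```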
After I saved my changes to the seed.txt file, I edited the conf/regex-urlfilter.txt file. I deleted these two lines:
# accept anything else
+.
Then I wrote this in their place:
+^http://([a-z0-9]*\.)*nutch.apache.org/
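To sanity-check which URLs that rule would let through, the pattern can be tried with grep. This is only an approximation: it assumes grep -E treats this particular expression the same way Nutch's Java regex engine does, and the leading + is dropped because it only marks the rule as "accept". The sub.nutch.apache.org URL is hypothetical and used only as a test input.

```shell
# Pattern from the "+" rule, without the leading "+".
pattern='^http://([a-z0-9]*\.)*nutch.apache.org/'

# Prints "accept" if the URL matches the pattern, otherwise "reject".
matches() {
    if printf '%s\n' "$1" | grep -Eq "$pattern"; then
        echo accept
    else
        echo reject
    fi
}

matches "http://nutch.apache.org/"
matches "http://sub.nutch.apache.org/page"   # hypothetical subdomain
matches "http://example.com/"
```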
After that, I used this command, as suggested in the tutorial: bin/nutch crawl urls -dir crawl -depth 3 -topN 5
After this command I see this error message: Error: JAVA_HOME is not set.
I also found this article but it didn't solve my problem either: Nutch - Getting Error: JAVA_HOME is not set. when trying to crawl