2 votes

Have tried googling the issue but can't find anything useful.

Following the tutorial at https://wiki.apache.org/nutch/NutchTutorial

Verified Nutch with bin/nutch and it works fine

Have Java 8 installed

java -version returns
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

And included it in the PATH using export

export JAVA_HOME="/cygdrive/c/program files/java/jre8"
export PATH="$JAVA_HOME/bin:$PATH"
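A note on the exports above: because the Windows path contains a space, JAVA_HOME has to stay quoted wherever it is expanded, or PATH splits at "program". A quick sanity check (using the same path as above) is:

```shell
# Re-export with quotes, as in the question, then confirm the value
# survives intact, including the space in "program files".
export JAVA_HOME="/cygdrive/c/program files/java/jre8"
export PATH="$JAVA_HOME/bin:$PATH"
echo "$JAVA_HOME"
```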

Note: I'm on Windows, hence also using Cygwin64.

Created a directory urls and added a file seed.txt with one URL

Then ran

bin/nutch inject crawl/crawldb urls/seed.txt

and got the following error:

Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls/seed.txt
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: lock file crawl/crawldb/.locked already exists.


2 Answers

2 votes

There are two parts to this problem:

1. There is already a .locked file present in the crawldb folder. Just delete the .locked file.

2. Set the system environment variable Path to include both %JAVA_HOME%\bin and %HADOOP_HOME%\bin, and also set the user environment variables %JAVA_HOME% and %HADOOP_HOME% (without \bin).
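Both steps can be sketched for a Cygwin shell like the asker's (the HADOOP_HOME path is an example, not from the question; point it at your actual install):

```shell
# Step 1: delete the stale lock so the injector can run again.
# -f makes this a no-op if the lock is already gone.
rm -f crawl/crawldb/.locked

# Step 2: put both bin directories on PATH.
# JAVA_HOME matches the question; HADOOP_HOME is illustrative.
export JAVA_HOME="/cygdrive/c/program files/java/jre8"
export HADOOP_HOME="/cygdrive/c/hadoop"
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH"
```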

0 votes

The error message is quite clear: another Nutch job holds a lock on the CrawlDb, or a previous job crashed or was killed before the lock file was removed. Deleting the lock file crawl/crawldb/.locked should solve the problem. But it's also good practice to look into the log files (especially hadoop.log) to find out why the lock file wasn't removed.
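Checking the log could look like this. In a real setup the file is hadoop.log under the Nutch runtime's logs directory; here a one-line stand-in (reusing the exact error from the question) is written first so the commands can be tried anywhere:

```shell
# Stand-in log: just the error line reported in the question.
mkdir -p logs
echo 'java.io.IOException: lock file crawl/crawldb/.locked already exists.' > logs/hadoop.log

# Look at the most recent lines, then filter for failures:
tail -n 50 logs/hadoop.log
grep -iE 'error|exception' logs/hadoop.log | tail -n 5
```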