2 votes

Have tried googling the issue but can't find anything useful.

Following the tutorial at https://wiki.apache.org/nutch/NutchTutorial

Verified Nutch with bin/nutch and it works fine

Have Java 8 installed

java -version returns
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

And included it in the PATH using export

export JAVA_HOME="/cygdrive/c/program files/java/jre8"
export PATH="$JAVA_HOME/bin:$PATH"
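A note on the exports above: because the Windows path contains a space, JAVA_HOME has to stay quoted wherever it is expanded, or PATH splits at "program". A quick sanity check (using the same path as above) is:

```shell
# Re-export with quotes, as in the question, then confirm the value
# survives intact, including the space in "program files".
export JAVA_HOME="/cygdrive/c/program files/java/jre8"
export PATH="$JAVA_HOME/bin:$PATH"
echo "$JAVA_HOME"
```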

Note: I'm on Windows, hence also using Cygwin64.

Created a directory urls and added a file seed.txt with one URL

Then ran

bin/nutch inject crawl/crawldb urls/seed.txt

and got the following error:

Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls/seed.txt
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: lock file crawl/crawldb/.locked already exists.


2 Answers

2 votes

There are two parts to this problem:

1. There is already a .locked file present in the crawldb folder. Just delete the .locked file.

2. Set the system environment variable Path to include both %JAVA_HOME%\bin and %HADOOP_HOME%\bin, and also set the user environment variables %JAVA_HOME% and %HADOOP_HOME% (without \bin).
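Both steps can be sketched for a Cygwin shell like the asker's (the HADOOP_HOME path is an example, not from the question; point it at your actual install):

```shell
# Step 1: delete the stale lock so the injector can run again.
# -f makes this a no-op if the lock is already gone.
rm -f crawl/crawldb/.locked

# Step 2: put both bin directories on PATH.
# JAVA_HOME matches the question; HADOOP_HOME is illustrative.
export JAVA_HOME="/cygdrive/c/program files/java/jre8"
export HADOOP_HOME="/cygdrive/c/hadoop"
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH"
```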

0 votes

The error message is quite clear: another Nutch job holds a lock on the CrawlDb, or a previous job crashed or was killed before the lock file was removed. Deleting the lock file crawl/crawldb/.locked should solve the problem. But it's also good practice to look into the log files (especially hadoop.log) to find out why the lock file wasn't removed.
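Checking the log could look like this. In a real setup the file is hadoop.log under the Nutch runtime's logs directory; here a one-line stand-in (reusing the exact error from the question) is written first so the commands can be tried anywhere:

```shell
# Stand-in log: just the error line reported in the question.
mkdir -p logs
echo 'java.io.IOException: lock file crawl/crawldb/.locked already exists.' > logs/hadoop.log

# Look at the most recent lines, then filter for failures:
tail -n 50 logs/hadoop.log
grep -iE 'error|exception' logs/hadoop.log | tail -n 5
```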