4
votes

I'm trying to deploy Nutch 2.1 on Ubuntu 12.04 by following that tutorial. Everything goes well until I try to inject URLs into the database. When I run bin/nutch inject urls, I get

    InjectorJob: starting
    InjectorJob: urlDir: urls

and it stays there (for hours) until I cancel the execution. urls is a directory that contains a file with URLs. I added the proxy and port details to nutch-site.xml as suggested here, but that doesn't solve it. I also tried Apache Nutch 2.2.1 and the issue persists.

If you know how to fix this issue, please help me!

Thanks in advance.


1 Answer

2
votes

By default, Ubuntu maps your machine's hostname to 127.0.1.1 in /etc/hosts. HBase (according to this page) requires the hostname to resolve to the loopback address 127.0.0.1.

The Ubuntu /etc/hosts file by default contains (with myComputerName being your computer name):

    127.0.0.1   localhost
    127.0.1.1   myComputerName

Use sudo gedit /etc/hosts to update your hosts file as follows:

    127.0.0.1   localhost
    127.0.0.1   myComputerName

Reboot Ubuntu. Nutch should no longer have trouble injecting URLs into HBase.
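If you would rather script the change than edit the file by hand, here is a minimal sketch. The helper names fix_loopback and resolved_address are made up for illustration. It assumes a sed that accepts -i.bak and, for the second helper, that getent is available (as it is on a stock Ubuntu install).

```shell
# Rewrite a leading 127.0.1.1 in a hosts-format file to 127.0.0.1.
# A backup of the original is kept at FILE.bak.
# fix_loopback is a hypothetical helper name, not part of Nutch or HBase.
fix_loopback() {
    sed -i.bak 's/^127\.0\.1\.1\([[:space:]]\)/127.0.0.1\1/' "$1"
}

# Print the address the local hostname currently resolves to.
# resolved_address is likewise a hypothetical helper name.
resolved_address() {
    getent hosts "$(hostname)" | awk '{ print $1 }'
}
```

It is safer to try fix_loopback on a copy of /etc/hosts first and compare the result with the original. Applying it to the real file requires root. Afterwards, resolved_address should print 127.0.0.1 if the change took effect.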