I’m trying to set up a local MongoDB crawler for my Watson Discovery service. MongoDB is up and running. I downloaded the JDBC connector (mongodb-driver-3.4.2.jar) and placed it in /opt/ibm/crawler/connectorFramework/crawler-connector-framework-0.1.18/lib/java/database/
Let me show you how I modified the configuration files:
In crawler.conf, under the main “input_adapter” section, I changed the following values:
crawl_config_file = "connectors/database.conf",
crawl_seed_file = "seeds/database-seed.conf",
extra_jars_dir = "database",
In seeds/database-seed.conf, in the seed > attribute section, the url attribute looks like this:
{
name ="url",
value="mongo://localhost:27017/local/tweets?per=1000"
},
(I also tried using mongodb instead of mongo)
In connectors/database.conf, the first portion of the file looks like this:
crawl_extender {
attribute = [
{
name="protocol",
value="mongo"
},
{
name="collection",
value="SomeCollection"
}
],
(I also tried using mongodb instead of mongo here)
When I run the crawler command, this is my output:
pish@ubuntu-crawler:~$ crawler crawl --config ./crawler-config/config/crawler.conf
2017-08-02 04:29:10,206 INFO: Connector Framework service will start and connect to crawler on port 35775
2017-08-02 04:29:10,460 INFO: This crawl is running in CrawlRun mode
2017-08-02 04:29:10,460 INFO: Running a crawl...
2017-08-02 04:29:10,465 INFO: URLs matching these patterns will be not be processed: (?i)\.(xlsx?|pptx?|jpe?g|gif|png|mp3|tiff)$
2017-08-02 04:29:10,500 INFO: HikariPool-1 - Starting...
2017-08-02 04:29:10,685 INFO: HikariPool-1 - Start completed.
2017-08-02 04:29:12,161 ERROR: There was a problem processing URL mongo://localhost:27017/local/tweets?per=1000: Couldn't load JDBC driver :
2017-08-02 04:29:17,184 INFO: HikariPool-1 - Shutdown initiated...
2017-08-02 04:29:17,196 INFO: HikariPool-1 - Shutdown completed.
2017-08-02 04:29:17,198 INFO: The service for the Connector Framework Input Adapter was signaled to halt.
Attempting to shutdown the crawler cleanly.
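Since the error says the JDBC driver could not be loaded, one diagnostic is to check whether the jar in the extra_jars_dir registers a JDBC driver at all. JDBC 4 drivers normally announce themselves through a META-INF/services/java.sql.Driver entry inside the jar. The following is only a minimal sketch of that check, assuming Python 3 is available on the crawler host; the function name jdbc_driver_classes is hypothetical, and an empty result does not prove the jar is unusable, because older drivers may omit the service file and need the class name configured explicitly.

```python
import sys
import zipfile

# Standard location where JDBC 4 drivers register their driver class.
SERVICE_ENTRY = "META-INF/services/java.sql.Driver"


def jdbc_driver_classes(jar_path):
    """Return the driver class names a jar registers via the JDBC service file.

    Hypothetical helper for illustration; returns an empty list when the
    jar has no java.sql.Driver service entry.
    """
    with zipfile.ZipFile(jar_path) as jar:
        if SERVICE_ENTRY not in jar.namelist():
            return []
        text = jar.read(SERVICE_ENTRY).decode("utf-8")

    classes = []
    for line in text.splitlines():
        # Service files allow '#' comments and blank lines.
        line = line.split("#", 1)[0].strip()
        if line:
            classes.append(line)
    return classes


if __name__ == "__main__":
    # Usage: python check_jdbc.py /path/to/driver.jar [more jars...]
    for path in sys.argv[1:]:
        found = jdbc_driver_classes(path)
        print(path, found if found else "no JDBC driver registered")
```

Running it against the jar placed in lib/java/database/ would show whether that file exposes a java.sql.Driver implementation for the crawler to load.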
What am I missing or doing wrong in my crawler configuration?