
I'm trying to set up a local MongoDB crawler for my Watson Discovery service. MongoDB is up and running. I downloaded the MongoDB driver jar (mongodb-driver-3.4.2.jar) to use as the JDBC connector and placed it in /opt/ibm/crawler/connectorFramework/crawler-connector-framework-0.1.18/lib/java/database/

Here is how I modified the configuration files:

In crawler.conf, under the main section "input_adapter", I changed the following values:

crawl_config_file = "connectors/database.conf",
crawl_seed_file = "seeds/database-seed.conf",
extra_jars_dir = "database",

In seeds/database-seed.conf, in the seed > attribute section, the url entry looks like this:

{
  name ="url",
  value="mongo://localhost:27017/local/tweets?per=1000"
},

(I also tried mongodb instead of mongo.)

In connectors/database.conf, the first portion of the file looks like this:

crawl_extender {
  attribute = [
    {
      name="protocol",
      value="mongo"
    },
    {
      name="collection",
      value="SomeCollection"
    }
  ],

(I also tried mongodb instead of mongo.)

When I run the crawler command, this is my output:

pish@ubuntu-crawler:~$ crawler crawl --config ./crawler-config/config/crawler.conf 
2017-08-02 04:29:10,206 INFO: Connector Framework service will start and connect to crawler on port 35775
2017-08-02 04:29:10,460 INFO: This crawl is running in CrawlRun mode
2017-08-02 04:29:10,460 INFO: Running a crawl...
2017-08-02 04:29:10,465 INFO: URLs matching these patterns will be not be processed: (?i)\.(xlsx?|pptx?|jpe?g|gif|png|mp3|tiff)$
2017-08-02 04:29:10,500 INFO: HikariPool-1 - Starting...
2017-08-02 04:29:10,685 INFO: HikariPool-1 - Start completed.
2017-08-02 04:29:12,161 ERROR: There was a problem processing URL mongo://localhost:27017/local/tweets?per=1000: Couldn't load JDBC driver : 
2017-08-02 04:29:17,184 INFO: HikariPool-1 - Shutdown initiated...
2017-08-02 04:29:17,196 INFO: HikariPool-1 - Shutdown completed.
2017-08-02 04:29:17,198 INFO: The service for the Connector Framework Input Adapter was signaled to halt.
Attempting to shutdown the crawler cleanly.

What am I missing or doing wrong in my crawler?

It looks like, per IBM, you need a JDBC 3.0-compliant driver for MongoDB in order to crawl that data repository. You can try the commercial MongoDB JDBC driver from DataDirect, which was certified by MongoDB. Note that I am employed by Progress. - Sumit Sarkar
As a test, I tried MySQL with its JDBC 3.0-compliant connector, and the same error occurs. I don't know why the crawler can't load a JDBC driver other than the one it bundles, and I can't find enough documentation on how to solve this issue. - Roland Pish
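For anyone hitting the same "Couldn't load JDBC driver" error: a JDBC driver is a class that implements java.sql.Driver, and mongodb-driver-3.4.2.jar is MongoDB's native Java driver, not a JDBC driver. The sketch below is a standalone check, not part of the crawler, and it does not reproduce the crawler's own loading logic. It reports whether a given class name resolves on the classpath and whether it implements java.sql.Driver. The class name JdbcDriverCheck and the paths in the usage line are hypothetical.

```java
import java.sql.Driver;

public class JdbcDriverCheck {

    // Returns a short verdict for the given class name:
    //   "missing"      - the class (or one of its dependencies) cannot be loaded
    //   "not-a-driver" - the class loads but does not implement java.sql.Driver
    //   "jdbc-driver"  - the class loads and implements java.sql.Driver
    static String check(String className) {
        try {
            // initialize=false: resolve the class without running its static initializer
            Class<?> cls = Class.forName(className, false, JdbcDriverCheck.class.getClassLoader());
            return Driver.class.isAssignableFrom(cls) ? "jdbc-driver" : "not-a-driver";
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: class not on the classpath;
            // LinkageError (e.g. NoClassDefFoundError): a dependency is missing
            return "missing";
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java -cp <driver.jar>:. JdbcDriverCheck <driver-class-name>");
            System.exit(2);
        }
        System.out.println(args[0] + ": " + check(args[0]));
    }
}
```

Compile it with javac JdbcDriverCheck.java, then run something like java -cp /path/to/driver.jar:. JdbcDriverCheck com.mysql.jdbc.Driver (the driver class for MySQL Connector/J 5.x). Only a "jdbc-driver" verdict means the jar provides something a JDBC-based crawler could load.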

1 Answer


In the end, it turned out that I also had to specify the connection string in one of the configuration files. It works now.