This is about establishing connectivity between Hive and MongoDB using the MongoDB driver. I have been through the links online and noticed that many people just breeze through it.
However, I have been facing major issues just getting the connectivity established.
Firstly, I am using the Cloudera Quickstart VM 5.5 on my 64-bit Windows system. This VM hosts the Hadoop sandbox.
I have MongoDB installed on my desktop (the same machine that hosts the Cloudera VM). There is no authentication on the MongoDB database.
I downloaded the three jars needed to connect the two environments.
Here is the list of steps:
Created a collection in MongoDB and populated it with data. The MongoDB server is running on port 27017.
Started the Hive shell and added the following jars to the classpath:
mongo-hadoop-core-2.0.2.jar mongo-hadoop-hive-2.0.2.jar mongo-java-driver-3.5.0.jar
(The last one, mongo-java-driver, was downloaded from a site that MongoDB itself links to.)
I uploaded the above jars to the HDFS paths shown below, then launched the Hive shell and added the jars from within the shell:
> ADD JAR hdfs:///tmp/hive/mongo/mongo-hadoop-core-2.0.2.jar;
> ADD JAR hdfs:///tmp/hive/mongo/mongo-hadoop-hive-2.0.2.jar;
> ADD JAR hdfs:///tmp/hive/mongo/mongo-java-driver-3.5.0.jar;
Then I ran this command from the Hive shell, which is supposed to work:
CREATE TABLE RAVINE ( id INT, h_name STRING, h_age INT )
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
WITH SERDEPROPERTIES ('mongo.columns.mapping'='{"id":"_id","h_name":"name","h_age":"age"}')
TBLPROPERTIES("mongo.uri"="mongodb://100.96.237.185:27017/test.beehive");
The error I get is the same one Arvind mentioned above:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/util/JSON
I went through the Cloudera logs to see what the error was and stumbled upon this log:
(screenshot: log file from Cloudera)
The error log above says
"failed to read jar file", java.util.zip.ZipException: invalid END header (bad central directory offset)
This is quite confusing because I obtained this jar file, the "mongo-java-driver", from https://mongodb.github.io/mongo-java-driver/
Why would a jar downloaded from that site result in this type of error? I assumed the jar file might be corrupt, so I tried downloading various driver versions, from 3.5.0 all the way down to 3.0.4. No change: the same error.
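One way to test the corrupt-jar assumption before handing a file to Hive is to check a local copy of it directly. The following is only a minimal Python sketch: `inspect_jar` is a hypothetical helper name, not part of any MongoDB or Hive tooling, and it assumes the jar is available on the local filesystem (for example, the downloaded file before it is uploaded to HDFS). It relies on the fact that a jar is a zip archive, so the same END header that Java complains about can be checked with the standard `zipfile` module. It also reports whether `com/mongodb/util/JSON.class`, the class named in the Hive error, is present.

```python
import zipfile
from pathlib import Path


def inspect_jar(jar_path, required_entry="com/mongodb/util/JSON.class"):
    """Return (is_valid_zip, has_required_entry) for a local jar file.

    Hypothetical helper for illustration; the default entry name is the
    class mentioned in the Hive error message.
    """
    path = Path(jar_path)
    # A jar is a zip archive. is_zipfile() looks for the end-of-central-directory
    # record, the same structure the "invalid END header" error refers to.
    if not path.is_file() or not zipfile.is_zipfile(str(path)):
        return (False, False)
    try:
        with zipfile.ZipFile(str(path)) as jar:
            # testzip() reads every member and returns the name of the first
            # corrupt one, or None if all members check out.
            if jar.testzip() is not None:
                return (False, False)
            return (True, required_entry in jar.namelist())
    except zipfile.BadZipFile:
        return (False, False)


if __name__ == "__main__":
    # The file name below is the one from the post; adjust the path as needed.
    print(inspect_jar("mongo-java-driver-3.5.0.jar"))
```

If the first value is False, the file on disk is not a readable zip archive at all (for example, a truncated or otherwise damaged download), which would point at the download or copy step rather than at the driver version. If the first value is True but the second is False, the archive is intact but does not contain the class Hive is looking for.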
Now here is my question. On the Java driver download site, MongoDB displays a Maven dependency:
So my question is quite simple: how can a jar downloaded from the MongoDB website give me errors such as a bad offset?
Does anyone have any suggestions on what I should do next? Most of the links on the web, such as https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage, make this sound like a very straightforward, easy 30-minute process, but it has taken me days to get to this state.
Any help or even suggestions are appreciated.

