Integration testing Hive jobs

Question

I'm trying to write a non-trivial Hive job using the Hive Thrift and JDBC interfaces, and I'm having trouble setting up a decent JUnit test. By non-trivial, I mean that the job results in at least one MapReduce stage, as opposed to only dealing with the metastore.

The test should fire up a Hive server, load some data into a table, run some non-trivial query on that table, and check the results.

I've wired up a Spring context according to the Spring reference. However, the job fails on the MapReduce phase, complaining that no Hadoop binary exists:

java.io.IOException: Cannot run program "/usr/bin/hadoop" (in directory "/Users/yoni/opower/workspace/intellij_project_root"): error=2, No such file or directory

The problem is that the Hive server runs in-memory but still relies on a local installation of Hive in order to run. For my project to be self-contained, I need the Hive services to be embedded, including the HDFS and MapReduce clusters. I've tried starting up a Hive server using the same Spring method and pointing it at MiniDFSCluster and MiniMRCluster, similar to the pattern used in the Hive QTestUtil source and in HBaseTestingUtility. However, I haven't been able to get that to work.

After three days of trying to wrangle Hive integration testing, I thought I'd ask the community:

How do you recommend I integration test Hive jobs?
Do you have a working JUnit example for integration testing Hive jobs using in-memory HDFS, MR, and Hive instances?

Edit: I am fully aware that working against a Hadoop cluster - whether local or remote - makes it possible to run integration tests against a full-stack Hive instance. The problem, as stated, is that this is not a viable solution for effectively testing Hive workflows.

Since it's looking for an installation, why not create a RAM disk that you can point it to? Other than that, you'll have to start examining the source to see how it uses the configuration you provide it. Then you can write your own glue to bypass the config, and run the features directly. — WeaponsGrade
@oby1 should have a patch that adds support, but I don't have access to it. — yoni
I'll open source our JUnit test rule for this as soon as I can. — oby1
@yoni Can you post the complete solution that you ended up with here please? I'm in the exact same situation as you were, and while I have Hive JDBC client working, and the MiniDFSCluster code from below working, when I try to run both together (using "jdbc:hive2:///" URL) for a "CREATE TABLE..." query, I get this: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask — Nishant Kelkar

oby1 · Accepted Answer · 2014-02-16T21:48:23

Ideally, one would be able to test Hive queries with LocalJobRunner rather than resorting to mini-cluster testing. However, due to HIVE-3816, running Hive with mapred.job.tracker=local results in a call to the Hive CLI executable installed on the system (as described in your question).

Until HIVE-3816 is resolved, mini-cluster testing is the only option. Below is a minimal mini-cluster setup for Hive tests that I have tested against CDH 4.4.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;

Configuration conf = new Configuration();

/* Build MiniDFSCluster (in-process HDFS) */
MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

/* Build MiniMR Cluster (in-process JobTracker and TaskTracker) */
System.setProperty("hadoop.log.dir", "/path/to/hadoop/log/dir"); // MAPREDUCE-2785
int numTaskTrackers = 1;
int numTaskTrackerDirectories = 1;
String[] racks = null;
String[] hosts = null;
MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers, miniDFS.getFileSystem().getUri().toString(),
                                         numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

/* Set JobTracker URI so that Hive submits jobs to the mini cluster */
System.setProperty("mapred.job.tracker", miniMR.createJobConf(new JobConf(conf)).get("mapred.job.tracker"));

There is no need to run a separate HiveServer or HiveServer2 process for testing. You can test with an embedded HiveServer2 instance by setting your JDBC connection URL to jdbc:hive2:/// (no host or port).
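As a rough illustration of the embedded JDBC approach, here is a minimal sketch of what the test body might look like once the mini clusters are up. It assumes the hive-jdbc driver and its dependencies are on the test classpath; the class name EmbeddedHiveSketch, the table name words, and the input path are hypothetical placeholders, not part of the setup I tested.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class EmbeddedHiveSketch {
    // A URL with no host or port asks the Hive JDBC driver for an embedded HiveServer2.
    static final String EMBEDDED_URL = "jdbc:hive2:///";

    // Runs a query and collects the first column of every row as a string.
    static List<String> firstColumn(Connection conn, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: the mini clusters are already running and
        // mapred.job.tracker has been set as a system property.
        try (Connection conn = DriverManager.getConnection(EMBEDDED_URL, "", "");
             Statement stmt = conn.createStatement()) {
            // Hypothetical table and input file, for illustration only.
            stmt.execute("CREATE TABLE IF NOT EXISTS words (word STRING)");
            stmt.execute("LOAD DATA LOCAL INPATH '/path/to/words.txt' INTO TABLE words");

            // COUNT(*) should force at least one MapReduce stage.
            List<String> counts = firstColumn(conn, "SELECT COUNT(*) FROM words");
            System.out.println("row count: " + counts.get(0));
        }
    }
}
```

In a JUnit test, the body of main would move into a test method and the println would become an assertion on the expected row count.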
