I am about to embark on a project using Apache Hadoop/Hive which will involve a collection of hive query scripts to produce data feeds for various down stream applications. These scripts seem like ideal candidates for some unit testing - they represent the fulfillment of an API contract between my data store and client applications, and as such, it's trivial to write what the expected results should be for a given set of starting data. My issue is how to run these tests.
If I was working with SQL queries, I could use something like SQLlite or Derby to quickly bring up test databases, load test data and run a collection of query tests against them. Unfortunately, I am unaware of any such tools for Hive. At the moment, my best thought is to have the test framework bring up a hadoop local instance and run Hive against that, but I've never done that before and I'm not sure it will work, or be the right path.
Also, I'm not interested in a pedantic discussion about if what I am doing is unit testing or integration testing - I just need to be able to prove my code works.