
When doing web development, we can test our apps with tools and methodologies such as unit testing (JUnit, RSpec, ...), TDD, BDD, Cucumber, end-to-end/regression/integration tests, H2 (as an in-process database), and so on.

But in the Hadoop and Big Data world, how do you test Hadoop/Hive/Pig code? By that I mean automating this scenario: given a sample input, when I trigger a Hive or Pig script, then I verify that the output is as expected.
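A minimal sketch of that given/when/then loop in plain Java, with no Hadoop dependencies. `ScriptOutputTest`, `runScript`, and `runOnSample` are hypothetical names. `runScript` is a local word count standing in for the real Hive or Pig script, so the example shows only the shape of such a test (write a sample input, run the job, compare the output). It does not show how to launch Hive or Pig.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ScriptOutputTest {

    // Hypothetical stand-in for the Hive or Pig script under test: a word
    // count that writes "word<TAB>count" lines, sorted by word. In a real
    // suite this method would launch the actual script instead.
    static void runScript(Path input, Path output) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : Files.readAllLines(input)) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> rows = counts.entrySet().stream()
                .map(entry -> entry.getKey() + "\t" + entry.getValue())
                .collect(Collectors.toList());
        Files.write(output, rows);
    }

    // Given a sample input, when the script runs, return its output rows so
    // the caller can verify them. Files live in a throwaway temp directory.
    static List<String> runOnSample(List<String> sampleInput) {
        try {
            Path dir = Files.createTempDirectory("script-test");
            Path input = dir.resolve("input.txt");
            Path output = dir.resolve("output.txt");
            try {
                Files.write(input, sampleInput);
                runScript(input, output);
                return Files.readAllLines(output);
            } finally {
                // Clean up the temp files even if the script fails.
                Files.deleteIfExists(output);
                Files.deleteIfExists(input);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Given/when: two lines of sample input go through the script.
        List<String> actual = runOnSample(List.of("b a", "a"));
        // Then: the output must match the expected rows exactly.
        List<String> expected = List.of("a\t2", "b\t1");
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println("ok");
    }
}
```

If `runScript` were replaced with a call that actually executes the script, for example Pig in local mode (`pig -x local`), the same assertions would exercise the real logic.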

More specifically, is there a way to get quick feedback from these automated tests? For example, how can I run HDFS in memory? In Java with SQL databases, we use H2 to get this kind of quick feedback.

Or, more broadly, what testing strategies do people use on the Hadoop platform?


Answer


I'm part of a team that supports a big data and analytics platform, and we have run into the same issue.

After searching for a while, we found two promising tools:

- HiveRunner: https://github.com/klarna/HiveRunner
- HadoopMiniCluster: https://github.com/bobfreitas/HadoopMiniCluster

Hope this helps =)