Testing in the Hadoop platform

Question

In web development we can test our apps with tools and methodologies such as unit testing (JUnit, RSpec, ...), TDD, BDD, Cucumber, end-to-end/regression/integration tests, H2 (as an in-process database), ...

But in the Hadoop and Big Data world, how do you test Hadoop/Hive/Pig code? By that I mean automating the following scenario: given a sample input, when I trigger a Hive or Pig script, then I verify that the output is as expected.

In more detail: is there a way to get quick feedback from these automated tests? More specifically, how can I run an in-memory HDFS? In Java with SQL databases, we use H2 to get this kind of quick feedback.

Or, more broadly, what testing strategies do people use on the Hadoop platform?

Julio Farah · Accepted Answer · 2014-06-10T21:37:12

I work on a team that supports a big data and analytics platform, and we have run into this kind of issue as well.

We've been searching for a while and found two pretty promising tools:

https://github.com/klarna/HiveRunner
https://github.com/bobfreitas/HadoopMiniCluster
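To give an idea of what the given/when/then flow from the question could look like with HiveRunner, here is a minimal sketch. It assumes HiveRunner and JUnit 4 are on the test classpath and that the HiveRunner version in use ships the `StandaloneHiveRunner` JUnit 4 runner. The database, table, and column names (`source_db`, `test_table`, `year`, `value`) are made up for illustration, and the query is inlined here, whereas a real test would more likely run the Hive script under test from a file.

```java
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;

import java.util.List;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;

import com.klarna.hiverunner.HiveShell;
import com.klarna.hiverunner.StandaloneHiveRunner;
import com.klarna.hiverunner.annotations.HiveSQL;

// Sketch only: all table and column names below are hypothetical.
@RunWith(StandaloneHiveRunner.class)
public class MaxValueByYearTest {

    // HiveRunner injects a shell backed by a local Hive instance.
    // No setup scripts are preloaded here (files = {}).
    @HiveSQL(files = {})
    private HiveShell shell;

    @Before
    public void givenSampleInput() {
        // Given: a small source table with known rows.
        shell.execute("CREATE DATABASE source_db");
        shell.execute("CREATE TABLE source_db.test_table (year STRING, value INT)");
        shell.insertInto("source_db", "test_table")
                .withColumns("year", "value")
                .addRow("2014", 3)
                .addRow("2014", 5)
                .addRow("2015", 7)
                .commit();
    }

    @Test
    public void returnsMaxValuePerYear() {
        // When: the query under test runs against the sample input.
        List<Object[]> result = shell.executeStatement(
                "SELECT year, MAX(value) FROM source_db.test_table "
                        + "GROUP BY year ORDER BY year");

        // Then: the output matches what we expect for that input.
        assertEquals(2, result.size());
        assertArrayEquals(new Object[] {"2014", 5}, result.get(0));
        assertArrayEquals(new Object[] {"2015", 7}, result.get(1));
    }
}
```

A test like this runs from the IDE or the build like any other JUnit test, without needing a Hadoop installation, which is the kind of feedback loop H2 gives you for SQL databases.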

Hope it helps you =)
