What can I use instead of MapReduce in Hadoop, and is Hadoop good for a small cluster?

Question

In MapReduce we need to write bash scripts and run jobs just to get data. I want to get data as easily as querying with SQL. We can use Hive, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, and Hue for such purposes.

But which one is best to use here?
And do all of these frameworks use MapReduce in the background?

Yeah, and now what? Yahoo is using it as well as thousands of other companies. — Thomas Jungblut

alexlod · Accepted Answer · 2011-12-05T13:41:17

As far as data analysis goes, MapReduce is your only native option for querying data in HDFS or any of Hadoop's other supported file systems. That said, tools such as Hive and Pig provide an abstraction on top of Hadoop, letting you write Pig Latin or HiveQL instead of Java. Both Pig and Hive compile down to MapReduce jobs.
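To make "compile down to MapReduce" a bit more concrete, here is a local, in-memory Python sketch of how a query like `SELECT word, COUNT(*) FROM words GROUP BY word` can be expressed as a map phase, a shuffle, and a reduce phase. This is only a conceptual illustration, not Hive's or Pig's actual query planner, and the table and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map side: emit (word, 1) for every word in one input record (line)."""
    for word in record.split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key; Hadoop does this (with a sort) between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce side: aggregate one group, the equivalent of COUNT(*) for that key."""
    return key, sum(values)


def group_by_count(records):
    """Run map -> shuffle -> reduce over hypothetical text records, keys in sorted order."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in sorted(shuffle(pairs).items()))


if __name__ == "__main__":
    print(group_by_count(["hive pig hive", "pig hbase"]))
```

On a real cluster each phase runs in parallel across many nodes, which is exactly the plumbing that Hive and Pig generate for you.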

Another alternative is Hadoop Streaming, which lets you write MapReduce jobs in any language that can read from stdin and write to stdout, including Python, Ruby, bash, etc.
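As a sketch of what a Streaming job can look like, here is a word count in Python, assuming a single hypothetical script `wordcount.py` that acts as the mapper or the reducer depending on its first argument. Streaming feeds input lines on stdin, expects tab-separated `key<TAB>value` lines on stdout, and sorts the map output by key before it reaches the reducer, which is what lets the reducer group consecutive lines.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word; assumes the input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical invocation: "wordcount.py map" or "wordcount.py reduce"
    steps = {"map": mapper, "reduce": reducer}
    step = steps.get(sys.argv[1] if len(sys.argv) > 1 else "")
    if step is None:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
    else:
        for out in step(sys.stdin):
            print(out)
```

You can try it locally with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, then submit it through the hadoop-streaming jar with `-mapper "wordcount.py map"` and `-reducer "wordcount.py reduce"`; the jar's location depends on your Hadoop version and distribution.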

As for which option is better, that's your decision to make. MapReduce in Java will generally be the fastest, because it's native and gives you the controls to fine-tune your jobs. But Hive and Pig are significantly faster to develop with and easier to maintain. Streaming is great for people who don't like or know Java but still want more control than Hive and Pig offer, though these days Hive and Pig are quite mature and very flexible.
