0 votes

At the moment we can pass folders/files as input to a MapReduce job, but what I wanted to know is: can we 'cat' data from HDFS (hdfs dfs -cat file.txt) and pass that as the MapReduce job's input?
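For reference, this is roughly how we hand input to the job today: a minimal driver sketch, where the class name and the HDFS paths are made up and the Mapper/Reducer are left at their identity defaults:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PassThroughJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "pass-through");
            job.setJarByClass(PassThroughJob.class);
            // Input is given as a path in HDFS (a single file or a whole
            // directory); the paths below are hypothetical.
            FileInputFormat.addInputPath(job, new Path("/user/me/input/file.txt"));
            FileOutputFormat.setOutputPath(job, new Path("/user/me/output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }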

1
I'm not sure if you can do it with Hadoop, but you can do it with Spark (Spark can/does use HDFS). Check out: spark.apache.org/docs/1.2.0/streaming-programming-guide.html - TravisJ

1 Answer

0 votes

No, you cannot do that with MapReduce. A job's input has to be specified as paths in the filesystem (files or directories); there is no way to feed it the stream that hdfs dfs -cat writes to stdout.

Alternatives that can consume data as a stream are Spark Streaming and Storm.
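As a rough sketch of the Spark Streaming route (the class name, the watched directory, and the 10-second batch interval are all assumptions, not anything from your setup), a job that monitors an HDFS directory and processes each new file as it arrives could look like this:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class HdfsDirectoryStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("hdfs-directory-stream");
            // Micro-batches: process whatever has arrived every 10 seconds
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));
            // Watch an HDFS directory; each file moved into it becomes part
            // of the next batch (directory name here is hypothetical)
            JavaDStream<String> lines = ssc.textFileStream("hdfs:///user/me/incoming");
            lines.print(); // print the first few lines of each batch
            ssc.start();
            ssc.awaitTermination();
        }
    }

Note that even Spark Streaming reads files that land in a directory rather than the stdout of hdfs dfs -cat, so if you really have a shell pipeline, you would write its output into the watched directory first.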