0 votes

At the moment we can pass folders/files as input to a MapReduce job, but what I wanted to know is: can we 'cat' data from HDFS (hdfs dfs -cat file.txt) and pass that as the MapReduce job's input?
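For reference, this is roughly how we hand input to the job today: a minimal driver sketch, where the class name and the HDFS paths are made up and the Mapper/Reducer are left at their identity defaults:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PassThroughJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "pass-through");
            job.setJarByClass(PassThroughJob.class);
            // Input is given as a path in HDFS (a single file or a whole
            // directory); the paths below are hypothetical.
            FileInputFormat.addInputPath(job, new Path("/user/me/input/file.txt"));
            FileOutputFormat.setOutputPath(job, new Path("/user/me/output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }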

1
I'm not sure if you can do it with Hadoop, but you can do it with Spark (Spark can/does use HDFS). Check out: spark.apache.org/docs/1.2.0/streaming-programming-guide.html - TravisJ

1 Answer

0 votes

No, you cannot do that with MapReduce. A job's input has to be specified as paths in the filesystem (files or directories); there is no way to feed it the stream that hdfs dfs -cat writes to stdout.

Alternatives that can consume data as a stream are Spark Streaming and Storm.
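As a rough sketch of the Spark Streaming route (the class name, the watched directory, and the 10-second batch interval are all assumptions, not anything from your setup), a job that monitors an HDFS directory and processes each new file as it arrives could look like this:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class HdfsDirectoryStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("hdfs-directory-stream");
            // Micro-batches: process whatever has arrived every 10 seconds
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));
            // Watch an HDFS directory; each file moved into it becomes part
            // of the next batch (directory name here is hypothetical)
            JavaDStream<String> lines = ssc.textFileStream("hdfs:///user/me/incoming");
            lines.print(); // print the first few lines of each batch
            ssc.start();
            ssc.awaitTermination();
        }
    }

Note that even Spark Streaming reads files that land in a directory rather than the stdout of hdfs dfs -cat, so if you really have a shell pipeline, you would write its output into the watched directory first.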