I'm writing Hadoop streaming jobs in R and have run into a situation I can't find documented anywhere. I'd like to run a reduce job (no mapper required) whose output feeds directly into another mapper. Is it possible to stack a map job directly after a reduce job without an initial mapper? And if I write an identity mapper to feed my reduce job, can I then pass the reduce output on to another mapper, and if so, how? My current code is:
$HADOOP_HOME/bin/hadoop jar /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar \
-reducer myreducer.r \
-input myinput/ \
-output myoutputdir \
-file myreducer.r \
-file file1.r \
-file file2.Rdata
This is not working.
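To make the data flow I'm after concrete, here is a minimal local simulation of the pipeline using plain shell stand-ins (all hypothetical): `cat` plays the identity mapper, `sort` plays the shuffle, `uniq -c` stands in for myreducer.r, and an `awk` one-liner stands in for the follow-on mapper:

```shell
# Local sketch of: identity map -> shuffle -> reduce -> second map
printf 'apple\nbanana\napple\n' |
  cat |                      # identity mapper: pass records through unchanged
  sort |                     # shuffle/sort: group identical keys together
  uniq -c |                  # reducer stand-in: count occurrences per key
  awk '{print $2 "\t" $1}'   # second-mapper stand-in: emit key<TAB>count
```

On the cluster, my understanding is this would correspond to two chained streaming jobs: the first with an identity mapper and myreducer.r, and a second, map-only job whose `-input` is the first job's `-output`. I'm unsure whether that chaining is the intended approach.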