1
votes

When we run the below commands as part of hadoop mapreduce streaming

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streamingxxxx.jar -input cities.txt -output streamout -mapper /bin/cat -reducer 'grep -i CA'

1)Does Java based mapreduce job is working in background?

2

2 Answers

2
votes

You can find internals of your command at this article

Both the mapper and the reducer are executables that read the input from stdin (line by line) and emit the output to stdout. The utility will create a Map/Reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.

When an executable is specified for mappers, each mapper task will launch the executable as a separate process when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feed the lines to the stdin of the process. In the meantime, the mapper collects the line oriented outputs from the stdout of the process and converts each line into a key/value pair, which is collected as the output of the mapper.

When an executable is specified for reducers, each reducer task will launch the executable as a separate process then the reducer is initialized. As the reducer task runs, it converts its input key/values pairs into lines and feeds the lines to the stdin of the process. In the meantime, the reducer collects the line oriented outputs from the stdout of the process, converts each line into a key/value pair, which is collected as the output of the reducer.

0
votes

You are correct, it is java code running behind the scene.. The MapReduce job is triggered by StreamJob and the Mapper is just a wrapper for the command specified if it is not a java class.