hadoop: tracking MapReduce tasks

Question

I'm new to Hadoop, and this is probably a stupid question, but I've been searching for hours and cannot find how to do it.

I'm running Hadoop MapReduce with different numbers of mappers and reducers to see the difference in performance (e.g. execution time). I want to check whether the specified number of mappers and reducers was actually used, but I can't figure out how to do it.

Hadoop 1.2.1 is installed on a quad-core machine with hyper-threading, which I access over SSH, and it is running in pseudo-distributed mode.

My MapReduce program is written in Python, so I'm using hadoop-streaming. This is how I ran it:

$ hadoop jar /Users/hadoop/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar \
    -file /Users/hadoop/map.py \
    -mapper /Users/hadoop/map.py \
    -file /Users/hadoop/reduce.py \
    -reducer /Users/hadoop/reduce.py \
    -input file:///Users/hadoop/inputfile \
    -output file:///Users/hadoop/outputfile

I want to see log information that looks like this, or anything that provides this kind of information.

@zsxwing I added how I ran the program in the question. Thank you. — kabichan

user3816822 · Accepted Answer · 2017-02-25T20:53:11

You're looking for the JobTracker web interface (in Hadoop 2 and later, which use YARN, the equivalent service is the Resource Manager). It lists the map and reduce tasks of each job and includes links to logs like the one you've linked to in your question. This Stack Overflow post has some answers about how to reach it. Given your version of Hadoop, you should be able to open localhost:50030 from the machine running Hadoop to see the JobTracker.
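As a quick cross-check that doesn't need the web interface, the reducer count can also be read off the job output: each reduce task writes one part-NNNNN file into the output directory. The sketch below assumes the output sits on the local filesystem, as in the question's file:///Users/hadoop/outputfile; count_part_files is a hypothetical helper name, not part of Hadoop. It says nothing about the number of mappers, which is still easiest to read from the JobTracker page.

```shell
# Hypothetical helper (not part of Hadoop): count the part-* files in a
# local job output directory. Each reduce task writes exactly one such
# file, so the count should equal the number of reducers that ran.
count_part_files() {
    # Matches only names that start with "part-", so _SUCCESS and hidden
    # checksum files such as .part-00000.crc are not counted.
    find "$1" -maxdepth 1 -type f -name 'part-*' | wc -l | tr -d ' '
}

# Usage (path taken from the question):
#   count_part_files /Users/hadoop/outputfile
```

If the job was submitted with, say, -D mapred.reduce.tasks=4 placed before the streaming options, this should print 4 once the job has finished.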
