I'm running a MapReduce (MR) program with different numbers of mappers and reducers to test how the execution time changes. I've gotten to the point where I can change the number of mappers by setting the split size, and I do see the execution time change. I'm using a remote machine (quad-core with hyper-threading). Hadoop version: 1.2.1. Input file size: 1GB.
So, what I want to do now is verify that the MR job really ran the way I configured it.
For example, I set the split size to about 250MB so that I get four mappers. In the output file (_logs/history/job....), I see the following:
TOTAL MAP TASKS = 4
LAUNCHED MAP TASKS = 4
FINISHED MAP TASKS = 4
DATA-LOCAL MAP TASKS = 1
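To make this check repeatable across runs, the counters quoted above could be parsed and compared against the expected mapper count. This is only a minimal sketch: it assumes the counters are available as plain text lines in the `NAME = N` form shown above (the raw history file may be laid out differently), and `parse_counters` and `check_map_tasks` are hypothetical helper names. Note that these counters only say how many map tasks ran in total, not how many ran at the same time.

```python
import re

# Counter lines exactly as quoted in the post.
SAMPLE = """TOTAL MAP TASKS = 4
LAUNCHED MAP TASKS = 4
FINISHED MAP TASKS = 4
DATA-LOCAL MAP TASKS = 1"""


def parse_counters(text):
    """Parse lines like 'TOTAL MAP TASKS = 4' into a {name: int} dict.

    Assumption: one 'NAME = N' pair per line; other lines are ignored.
    """
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s*=\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def check_map_tasks(counters, expected):
    """Return True if total, launched and finished map tasks all equal `expected`."""
    names = ("TOTAL MAP TASKS", "LAUNCHED MAP TASKS", "FINISHED MAP TASKS")
    return all(counters.get(name) == expected for name in names)


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    print(counters)
    print(check_map_tasks(counters, 4))
```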
(1) In this case, can I say that four cores (four mappers) were used?
(2) When I run top, I only see two Java processes and two Python processes (the MR program is written in Python). Whether I expect 4 mappers or 8 mappers, I always see only two Java processes. Does that mean I'm not utilizing the other cores?
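For reference, here is a rough sketch of how I estimate the expected number of mappers from the split size mentioned earlier. It rests on several assumptions: a single splittable input file, "1GB" meaning 1024 MB, and an input format that lets the last split be up to 10% larger than the split size before cutting a new one (as I understand FileInputFormat to do). `estimate_num_splits` is a hypothetical helper, not a Hadoop API.

```python
# Assumed slack factor: the final split may be up to 10% larger than split_size.
SPLIT_SLOP = 1.1

MB = 1024 * 1024


def estimate_num_splits(file_size, split_size):
    """Estimate how many input splits (map tasks) one splittable file yields.

    file_size and split_size are in bytes. This is an approximation of the
    splitting loop, not the actual Hadoop code.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = 0
    remaining = file_size
    # Cut full-size splits while more than SPLIT_SLOP split sizes remain.
    while float(remaining) / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final (possibly slightly larger) split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    print(estimate_num_splits(1024 * MB, 250 * MB))  # 4 under these assumptions
    print(estimate_num_splits(1024 * MB, 128 * MB))  # 8 under these assumptions
```

Under these assumptions a 250MB split size on a 1GB file gives four splits, which matches the TOTAL MAP TASKS = 4 counter, but it says nothing about how many of those tasks run concurrently.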
