0
votes

So I ran WordCount on 50 MB of data on my Hadoop cluster. I ran the test on 5 different cluster sizes, from a single-node cluster up to a 5-node cluster. The thing is, the execution time isn't changing much; it only differs by 1-2 minutes between runs. Shouldn't adding nodes to a cluster result in more resources being available, making the job run faster?

I expected the execution time to be much faster with each node added, but the results show otherwise.

The nodes I use have 2 GB of RAM and 2 cores each. I didn't change anything regarding containers in yarn-site.xml or the map/reduce allocation.mb settings in mapred-site.xml.


1 Answer

2
votes

You need to test with a larger amount of data. YARN allocates one map container per HDFS block of input. The default HDFS block size is usually 64 MB (128 MB on newer Hadoop versions), so your 50 MB test file likely occupies only one HDFS block. A container is the smallest unit of computation that YARN assigns to a node. In the worst case for your test, the job needs only one container for the map phase and another for the reduce phase. Two containers usually fit on a single node, so adding more nodes doesn't give you more speed.
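As a quick sanity check (a sketch, not something from your job logs), you can estimate the number of map containers from the input size and the block size; the 50 MB figure is from your question, and the 64 MB block size is an assumption based on the default:

```shell
# Estimate how many map containers YARN will launch for the job:
# number of map tasks ≈ number of HDFS blocks = ceil(file_size / block_size)
file_size_mb=50    # size of the WordCount input from the question
block_size_mb=64   # assumed default HDFS block size (128 on newer Hadoop)

# Integer ceiling division in POSIX shell arithmetic
maps=$(( (file_size_mb + block_size_mb - 1) / block_size_mb ))
echo "map containers: $maps"   # -> map containers: 1
```

To confirm how many blocks the file actually occupies on your cluster, you can run `hdfs fsck <path-to-input> -files -blocks` against your input file.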