I ran a Hadoop MapReduce job on a 1.1 GB file multiple times, each time with a different number of mappers and reducers (e.g. 1 mapper and 1 reducer, 1 mapper and 2 reducers, 1 mapper and 4 reducers, ...).
Hadoop is installed on a quad-core machine with hyper-threading.
These are the top 5 results, sorted by shortest execution time:
+----------+----------+----------+
| time     | mappers  | reducers |
+----------+----------+----------+
| 7m 50s   | 8        | 2        |
| 8m 13s   | 8        | 4        |
| 8m 16s   | 8        | 8        |
| 8m 28s   | 4        | 8        |
| 8m 37s   | 4        | 4        |
+----------+----------+----------+
Edit
The results for 1 to 8 mappers and 1 to 8 reducers (columns = number of mappers, rows = number of reducers, times in mm:ss):
+---------+---------+---------+---------+---------+
|         | 1       | 2       | 4       | 8       |
+---------+---------+---------+---------+---------+
| 1       | 16:23   | 13:17   | 11:27   | 10:19   |
+---------+---------+---------+---------+---------+
| 2       | 13:56   | 10:24   | 08:41   | 07:52   |
+---------+---------+---------+---------+---------+
| 4       | 14:12   | 10:21   | 08:37   | 08:13   |
+---------+---------+---------+---------+---------+
| 8       | 14:09   | 09:46   | 08:28   | 08:16   |
+---------+---------+---------+---------+---------+
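To make the numbers easier to compare, here is a rough sketch that converts the timings in the grid above to seconds and computes the speedup of each run relative to the 1 mapper / 1 reducer run. The names `TIMES`, `to_seconds` and `speedups` are just ones I made up for this illustration; the values are copied from the table, nothing is newly measured.

```python
# Timings copied from the table above: TIMES[reducers][mappers] = "mm:ss"
TIMES = {
    1: {1: "16:23", 2: "13:17", 4: "11:27", 8: "10:19"},
    2: {1: "13:56", 2: "10:24", 4: "08:41", 8: "07:52"},
    4: {1: "14:12", 2: "10:21", 4: "08:37", 8: "08:13"},
    8: {1: "14:09", 2: "09:46", 4: "08:28", 8: "08:16"},
}


def to_seconds(mmss):
    """Convert an "mm:ss" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedups(times):
    """Return {(mappers, reducers): speedup} relative to the 1 mapper / 1 reducer run."""
    baseline = to_seconds(times[1][1])
    return {
        (m, r): baseline / to_seconds(t)
        for r, row in times.items()
        for m, t in row.items()
    }


if __name__ == "__main__":
    # Print all runs, fastest (largest speedup) first.
    ranked = sorted(speedups(TIMES).items(), key=lambda kv: -kv[1])
    for (m, r), s in ranked:
        print(f"{m} mappers, {r} reducers: {TIMES[r][m]} ({s:.2f}x)")
```

For scale, the gap between 8 mappers / 2 reducers (472 s) and 8 mappers / 8 reducers (496 s) is 24 seconds, roughly 5%.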
(1) It looks like the program runs slightly faster when I have 8 mappers, but why does it slow down as I increase the number of reducers? (e.g. 8 mappers/2 reducers is faster than 8 mappers/8 reducers)
(2) When I use only 4 mappers, it's a bit slower simply because I'm not utilizing the other 4 cores, right?