Can somebody please help me understand the questions below, related to Hadoop 1.x?

  1. Say I have just a single node with 8 GB of RAM, a 40 TB hard disk, and a quad-core processor. The block size is 64 MB and we need to process 4 TB of data. How do we decide the number of Mappers and Reducers?

    Can someone please explain in detail? Please let me know if I need to consider any other parameters for the calculation.

  2. Say I have 10 DataNodes in a cluster, and each node has 8 GB of RAM, a 40 TB hard disk, and a quad-core processor. The block size is 64 MB and we need to process 40 TB of data. How do we decide the number of Mappers and Reducers?

  3. What is the default number of mapper and reducer slots on a DataNode with a quad-core processor?

Many Thanks, Manish

1 Answer

Number of mappers = number of input splits. The input file is divided into splits, and each split contains a set of records. On average each split is one block in size (64 MB here), so in your case you would have roughly 65,536 splits, and hence mappers (4 TB / 64 MB = 4 × 1024 × 1024 / 64). You can also configure the input split size; this is typically done when you want each mapper to read a larger chunk of the file at once, or to control how records are grouped for processing.
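To make that concrete, below is a minimal driver sketch using the Hadoop 1.x "new" mapreduce API (the class name, paths, and the 256 MB figure are only placeholders for illustration). Raising the split size is the usual way to reduce the number of map tasks when the block size itself cannot be changed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SplitSizeDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "split-size-demo");   // Hadoop 1.x constructor
            job.setJarByClass(SplitSizeDemo.class);
            job.setInputFormatClass(TextInputFormat.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));    // input dir
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir

            // With the default 64 MB block size, 4 TB of input gives roughly
            // 4 * 1024 * 1024 MB / 64 MB = 65,536 splits, i.e. ~65,536 map tasks.
            // Forcing 256 MB splits cuts that to about 16,384.
            long splitSize = 256L * 1024 * 1024;
            FileInputFormat.setMinInputSplitSize(job, splitSize);
            FileInputFormat.setMaxInputSplitSize(job, splitSize);

            // No mapper/reducer classes are set, so the identity Mapper and Reducer
            // run, which is enough to observe how many map tasks get launched.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }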

Number of reducers: each unique key in the mapper output is sent to exactly one reducer, but the number of reducers is not derived from the data; you choose it yourself, either in the job class or on the job submission command. The assignment of keys to reducers described above is what the default hash partitioner does. You can also write your own partitioner to control which reducer each key goes to.
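As a sketch of that last point (the class name and the first-letter routing scheme are made up for illustration), a custom partitioner in the Hadoop 1.x mapreduce API looks roughly like this; note that it only decides which reducer a key goes to, while the reducer count is still whatever you configure:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Routes each key to a reducer based on its first character instead of the
    // default hash of the whole key. All values for a given key still end up
    // at a single reducer.
    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            if (numReduceTasks == 0) {
                return 0;                              // map-only job, nothing to route
            }
            String k = key.toString();
            char first = k.isEmpty() ? '_' : Character.toLowerCase(k.charAt(0));
            return (first & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

It would be wired up with something like job.setNumReduceTasks(10) and job.setPartitionerClass(FirstLetterPartitioner.class) in the driver, or the reducer count can be set at submission time with -D mapred.reduce.tasks=10 (the Hadoop 1.x property name).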