I am a little confused about the number of mappers spawned by a MapReduce job.
I have read in a lot of places that the number of mappers depends not on the number of blocks but on the number of splits, i.e. the number of maps is determined by the InputFormat:

Number of mappers = (total data size) / (input split size)
Example: the data size is 1 TB and the input split size is 128 MB.
Number of mappers = (1 * 1024 * 1024 MB) / 128 MB = 8192
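For context on how the split size enters that formula, this is how I understand it can be pinned in the driver (a minimal sketch using the new mapreduce API; the input path and job name are placeholders of mine, and 128 MB is just the example figure above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-demo");

        // Placeholder input path.
        FileInputFormat.addInputPath(job, new Path("/data/one-terabyte-input"));

        // Pin the split size to 128 MB; FileInputFormat should then produce
        // roughly (total data size) / (128 MB) splits, one mapper per split.
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
    }
}
```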
The above seems right when my input format is FileInputFormat. But what if my input format is TextInputFormat?
Suppose I have a file of size 1 GB with the default block size of 128 MB (in Hadoop 2.x); the number of blocks will be 8.
The file is a text file with each line occupying 1 MB.
Total number of lines: 1024
Number of lines in each block: 128
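To make the scenario concrete, here is a throwaway sketch of how such a file could be generated locally before uploading to HDFS (the file name and filler character are arbitrary choices of mine):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.Arrays;

public class MakeTestFile {
    public static void main(String[] args) throws Exception {
        // Each line is ~1 MB of 'a' characters plus a newline; 1024 such
        // lines give a file of roughly 1 GB overall.
        char[] filler = new char[1024 * 1024 - 1];
        Arrays.fill(filler, 'a');
        String line = new String(filler);

        try (BufferedWriter out = new BufferedWriter(new FileWriter("test-1gb.txt"))) {
            for (int i = 0; i < 1024; i++) {
                out.write(line);
                out.newLine();
            }
        }
    }
}
```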
Now, when I set the input format to TextInputFormat, how many mappers will Hadoop spawn?
Will it be 1024 (one for each line) or 8 (one for each block)?
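For reference, this is roughly the driver setup I have in mind (a sketch; the paths, class names, and the pass-through mapper are all placeholders of mine):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TextInputFormatDriver {

    // Trivial placeholder mapper: emits each line unchanged. With
    // TextInputFormat the key is the line's byte offset in the file
    // and the value is the line itself.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws java.io.IOException, InterruptedException {
            ctx.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "textinputformat-demo");
        job.setJarByClass(TextInputFormatDriver.class);

        // The point of the question: each map() call receives one line,
        // but how many mapper tasks does this input format actually spawn?
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(PassThroughMapper.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/data/test-1gb.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/data/out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```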