My question is: does the MapReduce framework (for example, the Hadoop implementation) assign inputs to mappers before the mapper jobs start, or is this done at runtime?
That is, assume I have some input i and machines m_1, m_2, ..., m_k. The machines need not be equally powerful; some may have better performance (CPU, memory) than others. If the master node splits the input and assigns it to particular mapper nodes before the mapper tasks begin, then some machines (the stronger ones) could finish their work early and sit idle. However, if the splitting is done at runtime, this problem does not arise.
I would also appreciate an explanation of the overall split mechanism MapReduce uses in the pre-mapper phase.
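To make the question concrete, here is a sketch of my current mental model of the pre-mapper split step. The function names are my own invention, not Hadoop's API; the size formula follows Hadoop's documented FileInputFormat rule max(minSize, min(maxSize, blockSize)). My question is whether anything like this happens entirely up front:

```python
def compute_split_size(block_size, min_size, max_size):
    # Hadoop's documented split-size rule for FileInputFormat:
    # max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def make_splits(file_length, split_size):
    # Carve the file into contiguous (offset, length) ranges up front,
    # independent of how fast each worker machine turns out to be.
    splits = []
    offset = 0
    while offset < file_length:
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits

# Example: a 300 MB file with a 128 MB block size yields three splits.
MB = 1024 * 1024
size = compute_split_size(block_size=128 * MB, min_size=1, max_size=2**63 - 1)
print(make_splits(300 * MB, size))
```

If the splits are fixed like this before any mapper runs, the assignment of splits to machines is what decides whether strong machines end up waiting.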