I'm trying to implement a MapReduce algorithm for a specific problem. Suppose my Mapper has to handle a large Text object. My question is summarised in the following example. I have the Text object "Today is a lovely day" and I need to do some processing on its words. I have two options:
1. I can send the Reducer one key-value pair per word:
   <1,Today> <1,is> <1,a> <1,lovely> <1,day>
2. I can send the single key-value pair
   <1,Today is a lovely day>
   to the Reducer and process it there, e.g. tokenise the String object.
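To make the two options concrete, here is a minimal sketch in plain Java, deliberately without the Hadoop API. The class and method names (EmitStrategies, emitPerWord, emitWholeLine, tokeniseInReducer) are made up for illustration, and the returned lists only stand in for what a real Mapper would write to its context. It ignores serialisation and shuffle costs, so it shows the shape of the data rather than measuring anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class: models the two emit strategies without Hadoop.
class EmitStrategies {

    // Option 1: the Mapper tokenises and emits one (key, word) pair per token.
    static List<String[]> emitPerWord(String key, String line) {
        List<String[]> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line); // splits on whitespace
        while (tokens.hasMoreTokens()) {
            pairs.add(new String[] { key, tokens.nextToken() });
        }
        return pairs;
    }

    // Option 2: the Mapper emits a single (key, whole line) pair.
    static List<String[]> emitWholeLine(String key, String line) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] { key, line });
        return pairs;
    }

    // Option 2, Reducer side: the value still has to be tokenised here.
    static List<String> tokeniseInReducer(String value) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(value);
        while (tokens.hasMoreTokens()) {
            words.add(tokens.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "Today is a lovely day";
        List<String[]> perWord = emitPerWord("1", line);
        List<String[]> whole = emitWholeLine("1", line);
        System.out.println("Option 1 records: " + perWord.size());
        System.out.println("Option 2 records: " + whole.size());
        System.out.println("Option 2 words after tokenising: "
                + tokeniseInReducer(whole.get(0)[1]));
    }
}
```

Either way the Reducer ends up with the same words. What differs is the number of records emitted by the Mapper and where the tokenising work happens.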
Which approach is better in this case? In the first case I have to send more data to the Reducer, but the Reducer has no String object to tokenise. In the second case the Mapper sends a smaller amount of data, but the Reducer has to tokenise the String itself.