0
votes

I have a basic understanding of how Hadoop orders the data passed from the Mapper to the Reducer. My Mapper writes the following data to the context. Each line is a key/value pair:

abc 1234
cde 2394
dec 8273
abc 2348
cde 8780
dec 6590

The keys abc, cde, dec repeat n times, with the same or different values. The Reducer reads each key together with its group of values, i.e.

abc {1234, 2348, ...}, and so on for the other keys.
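
To make the grouping concrete, here is a minimal sketch in plain Java that imitates what the shuffle does to the sample records. It does not use the Hadoop API; the class name `ShuffleGroupingSketch` and the helper `groupByKey` are hypothetical names chosen for illustration, and a `TreeMap` stands in for Hadoop's sort-by-key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a stand-in for the shuffle phase, not Hadoop code.
class ShuffleGroupingSketch {

    // Collects "key value" lines into key -> list of values.
    // TreeMap keeps the keys sorted, as the reducer would see them.
    static Map<String, List<Integer>> groupByKey(List<String> mapperOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : mapperOutput) {
            String[] parts = line.split(" ");
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(parts[1]));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> mapperOutput = Arrays.asList(
            "abc 1234", "cde 2394", "dec 8273",
            "abc 2348", "cde 8780", "dec 6590");
        // prints {abc=[1234, 2348], cde=[2394, 8780], dec=[8273, 6590]}
        System.out.println(groupByKey(mapperOutput));
    }
}
```

The interleaved order abc, cde, dec, abc, ... from the Mapper is lost at this point, which is what the question is about.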

Question: Is it possible to read the data into the Reducer in the same order as the Mapper output, instead of as unique keys grouped with their values?

1
Are you using a combiner in the middle? You won't get abc(1234,2348) without a combiner in place. - Ravindra babu
What is your requirement? If you need the same order, you can skip the reducer and just have the mapper in place. - madhu
@madhu, you are right. But my file contains a header at the beginning, which would be processed by a mapper. The data after it is related to the header, and I need to process the data based on the header. - srikanth
After reading the header, why can't you use a Partitioner, Combiner and Sorter? Sorting the values before the Reducer receives its input is more efficient than sorting in the Reducer. - Ravindra babu

1 Answer

0
votes

If you need to process the data based on the header, then I think you can use the approach below:

Mapper :-

Cut the header out and use it as your key, with the remaining data as your value. All of the data for that particular header will then go to the same reducer.

Reducer :-

The reducer will then receive these values under the header key, without grouping by abc, cde, dec:

abc 1234
cde 2394
dec 8273
abc 2348
cde 8780
dec 6590

We can then process each record individually.
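
A rough sketch of this idea in plain Java follows. It only simulates the map and reduce steps with `java.util` collections and is not written against the Hadoop API. The class `HeaderKeySketch`, its methods, and the header value `HEADER-1` are hypothetical. It also assumes the mapper already knows which header each record belongs to. One addition beyond the answer: Hadoop does not guarantee the order of values within a key unless you set up a secondary sort, so the sketch prefixes each value with a sequence number and sorts on it in the reduce step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: simulates "header as key" without the Hadoop API.
class HeaderKeySketch {

    // "Map" step: emit (header, "seq<TAB>record") for each record under the header.
    // The sequence number is an assumption added here to preserve input order.
    static List<String[]> map(String header, List<String> records) {
        List<String[]> out = new ArrayList<>();
        for (int seq = 0; seq < records.size(); seq++) {
            out.add(new String[] { header, seq + "\t" + records.get(seq) });
        }
        return out;
    }

    // Extracts the sequence number from a "seq<TAB>record" value.
    static int seqOf(String value) {
        return Integer.parseInt(value.substring(0, value.indexOf('\t')));
    }

    // "Reduce" step for one header: sort by sequence number, then strip it,
    // returning the records in their original order.
    static List<String> reduce(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((a, b) -> Integer.compare(seqOf(a), seqOf(b)));
        List<String> records = new ArrayList<>();
        for (String v : sorted) {
            records.add(v.substring(v.indexOf('\t') + 1));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "abc 1234", "cde 2394", "dec 8273",
            "abc 2348", "cde 8780", "dec 6590");

        // Stand-in for the shuffle: all values share one key, so one reduce call.
        List<String> values = new ArrayList<>();
        for (String[] kv : map("HEADER-1", records)) {
            values.add(kv[1]);
        }
        for (String record : reduce(values)) {
            System.out.println(record); // process each record individually
        }
    }
}
```

Because every record under a header goes to a single reduce call, this trades parallelism for ordering, which may matter for large inputs.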