0
votes

I have a sample input file, shown below, which contains a sequence number, name, medicine, gender, and amount spent. My requirement is to get the total amount spent on each medicine. I have written a MapReduce program and run it on my local machine under a single-node cluster with Hadoop and the other necessary packages installed.

  1. Irma Ellison,avil,female,872
  2. Hilary Bush,avil,male,999
  3. Ahmed Mejia,paracetamol,female,654
  4. Grace Boone,metacin,female,918
  5. Hayes Ortiz,paracetamol,male,734
  6. Lani Matthews,paracetamol,female,836
  7. Cathleen Stewart,paracetamol,male,178
  8. Jonas Boone,metacin,female,649
  9. Desiree Pearson,avil,male,439
  10. Britanney Sullivan,metacin,female,659

For the above input I am expecting the output below.

avil    2310  
metacin 2226  
paracetamol 2402

When I declare my reducer class as
public class VisReducer extends Reducer<Text, IntWritable, Text, IntWritable>
I get the expected output and everything looks good.
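For reference, a reducer matching this declaration would typically look something like the sketch below (illustrative only, not the original VisReducer; it assumes the mapper emits the medicine name as a Text key and the amount spent as an IntWritable value):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative sketch: sums the amounts emitted by the mapper per medicine.
public class VisReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text medicine, Iterable<IntWritable> amounts, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable amount : amounts) {
            total += amount.get();               // add up every amount for this medicine
        }
        context.write(medicine, new IntWritable(total));
    }
}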

But I mistakenly changed my reducer class declaration to
public class VisReducer extends Reducer<Text, Iterable<IntWritable>, Text, IntWritable>
and now the output looks like plain mapper output; for some reason the reduce method in the Reducer class does not seem to run. I added a System.out.println() in the reduce method and checked the logs, and I could not see what I printed, whereas in the first case I can. I am not able to understand what is causing the issue.

Can someone help me understand what exactly is happening?

Output in the second case:

avil    439  
avil    999  
avil    872  
metacin 659  
metacin 649  
metacin 918  
paracetamol 178  
paracetamol 836  
paracetamol 734  
paracetamol 654  

It might be a very basic question, as I am just starting to learn Hadoop and could not find any relevant questions online.

2
Post your driver code. - Ravindra babu

2 Answers

0
votes

You will get the desired output when you declare the Reducer as per the specification.

Visit the Apache documentation page on Reducer; the Reducer class takes four type parameters:

org.apache.hadoop.mapreduce

Class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

KEYIN    - the input key type
VALUEIN  - the input value type
KEYOUT   - the output key type
VALUEOUT - the output value type

From your example:

public class VisReducer extends Reducer<Text, IntWritable, Text, IntWritable>


KEYIN     - Text  
VALUEIN   - IntWritable
KEYOUT    - Text
VALUEOUT  - IntWritable 

If you pass the input key as Text and the input value as IntWritable to the Reducer, it will generate the output key as Text and the output value as IntWritable.
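To see why the other declaration falls back to mapper-style output, compare the declaration against the reduce method the framework actually calls. The following is a rough paraphrase of the relevant part of org.apache.hadoop.mapreduce.Reducer (a sketch, not the exact source):

import java.io.IOException;

// Sketch of the base class: the DEFAULT reduce is an identity pass-through
// that writes every incoming (key, value) pair straight back out, unless a
// subclass overrides the exact signature the framework calls.
public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

    public abstract class Context {   // exposes write(KEYOUT, VALUEOUT), etc.
        public void write(KEYOUT key, VALUEOUT value)
                throws IOException, InterruptedException { }
    }

    // Called once per key with that key's grouped values.
    @SuppressWarnings("unchecked")
    protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
            throws IOException, InterruptedException {
        for (VALUEIN value : values) {
            context.write((KEYOUT) key, (VALUEOUT) value);   // identity: one line per value
        }
    }
}

With Reducer<Text, IntWritable, Text, IntWritable>, that method expands to reduce(Text, Iterable<IntWritable>, Context), which is the method VisReducer defines, so the override runs and sums the values. With Reducer<Text, Iterable<IntWritable>, Text, IntWritable>, it expands to reduce(Text, Iterable<Iterable<IntWritable>>, Context), which a reduce method taking Iterable<IntWritable> no longer overrides, so the inherited identity reduce runs and simply writes every value back out, one per line, which matches the mapper-style output shown in the question.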

0
votes

After all mappers complete their work, they emit their output as key,value pairs. For example, let's assume 2 mappers; in your case the mapper output would be:

Mapper 1 output:

key1,value1 key2,value1

Mapper 2 output:

key1,value2 key3,value1 key2,value2
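In the question's job, each such pair would be (medicine, amount). A mapper emitting them could look roughly like the sketch below (the field positions are an assumption based on the sample input's name,medicine,gender,amount layout):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of a mapper for the question's data: emits (medicine, amount)
// for each comma-separated input line ending in ...,medicine,gender,amount.
public class VisMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");
        String medicine = fields[fields.length - 3];                      // e.g. "avil"
        int amount = Integer.parseInt(fields[fields.length - 1].trim());  // e.g. 872
        context.write(new Text(medicine), new IntWritable(amount));
    }
}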

Then the Reducer class is called. The Reducer has 3 phases.

1. Shuffle: The Reducer copies the sorted output from each Mapper across the network using HTTP. Here the shuffled temporary output is:

key1,value1 key2,value1 key1,value2 key3,value1 key2,value2

2. Sort: The framework merge-sorts the Reducer inputs by key (since different Mappers may have output the same key).

Here the sorted temporary output is:

key1,value1 key1,value2 key2,value1 key2,value2 key3,value1

3. Reduce: In this phase the reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context) method is called for each <key, (collection of values)> in the sorted inputs. This is the actual reduce method that works on the mapper output; it takes its input as:

key1,<value1,value2> key2,<value1,value2> key3,<value1>
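Applied to the question's data, that grouped reduce input would be (the order of values within a key is not guaranteed):

avil        <872, 999, 439>
metacin     <918, 649, 659>
paracetamol <654, 734, 836, 178>

Each reduce call sums one of those lists, giving avil 2310, metacin 2226, and paracetamol 2402.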

The Reducer class declaration and the reduce method inside the Reducer class are different: the input type parameters of the Reducer class are the output types of the Mapper class (in most cases), while the reduce method parameters are (Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context).