I'm trying to emit two sets of keys from my dataset, which has two tab-separated columns of numbers. I know how to emit one key/value pair, but I'm not sure how to emit a second one. In essence, I want a key/value pair for each column. Then, in the reducer, I want to take the difference between the two counts for each key (occurrences in column 1 minus occurrences in column 2).
Here's what I have for the mapper part:
    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private IntWritable one = new IntWritable(1);
        private Text nodeX = new Text();
        private Text nodeY = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is "col1<TAB>col2".
            String[] data = value.toString().split("\\t");
            String node0 = data[0];
            String node1 = data[1];
            // Emit one count for the col1 value and one for the col2 value.
            nodeX.set(node0);
            context.write(nodeX, one);
            nodeY.set(node1);
            context.write(nodeY, one);
        }
    }
Here's the reducer:
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Meant to count the occurrences of the key in col1.
            int sum0 = 0;
            for (IntWritable val : values) {
                sum0 += val.get();
            }
            // Meant to count the occurrences of the key in col2, but values
            // holds the 1s from both columns mixed together.
            int sum1 = 0;
            for (IntWritable val : values) {
                sum1 += val.get();
            }
            int diff = sum0 - sum1;
            result.set(diff);
            context.write(key, result);
        }
    }
I think I did something wrong in the part where the data is passed from the mapper to the reducer; I might need 2 keys. I'm new to Java and not sure how to get this right.
My input data looks like this:
123 543
123 234
543 135
135 123
I would like the output below, where each value is the number of occurrences of the key in col1 minus the number of occurrences in col2:
123 1
543 0
135 0
234 -1
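
One possible way to get that output without a second key is to keep a single key per number and put the column information into the value: +1 for a col1 occurrence, -1 for a col2 occurrence, so that a plain per-key sum is already the difference. The sketch below only simulates that idea in plain Java, with no Hadoop involved, so it can run on its own. The class name `ColumnDiffSketch` and the method `diffCounts` are made up for illustration and are not part of any Hadoop API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch (no Hadoop): simulates map + reduce in memory.
class ColumnDiffSketch {

    // "Map" step: for each "col1<TAB>col2" line, contribute +1 under the col1
    // key and -1 under the col2 key.
    // "Reduce" step: merge() sums all contributions that share a key.
    static Map<String, Integer> diffCounts(List<String> lines) {
        Map<String, Integer> diff = new LinkedHashMap<>();
        for (String line : lines) {
            String[] data = line.split("\\t");
            if (data.length < 2) {
                continue; // assumption: silently skip malformed lines
            }
            diff.merge(data[0], 1, Integer::sum);
            diff.merge(data[1], -1, Integer::sum);
        }
        return diff;
    }

    public static void main(String[] args) {
        // The sample input from the question, tab-separated.
        List<String> input = Arrays.asList(
                "123\t543", "123\t234", "543\t135", "135\t123");
        for (Map.Entry<String, Integer> e : diffCounts(input).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Carried over to the job above, this would presumably mean writing an `IntWritable(1)` for the col1 value and an `IntWritable(-1)` for the col2 value in the mapper, and having the reducer sum `values` in a single loop, since all values for a key arrive in one `Iterable`.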