
I'm trying to emit two kinds of keys from my dataset, which has two tab-separated columns of numbers. I know how to emit one key/value pair, but I'm not sure how to emit a second one. In essence, I want a key/value pair for each of the columns. Then, in the reducer, I want to take the difference of the counts for each key.

Here's what I have for the mapper part:

public static class MyMapper extends Mapper<Object, Text, Text, IntWritable>{

        private IntWritable one = new IntWritable(1);
        private Text nodeX = new Text();
        private Text nodeY = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            // Each input line holds two tab-separated node ids.
            String[] data = value.toString().split("\\t");
            String node0 = data[0];
            String node1 = data[1];
            // Emit one (node, 1) pair per column.
            nodeX.set(node0);
            context.write(nodeX, one);
            nodeY.set(node1);
            context.write(nodeY, one);
        }
    }

Here's the reducer:

public static class IntSumReducer
        extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {

            // Intended: sum0 = occurrences in column 1, sum1 = occurrences in column 2.
            // As written, the values carry no information about which column they came from.
            int sum0 = 0;
            for (IntWritable val : values) {
                sum0 += val.get();
            }
            int sum1 = 0;
            for (IntWritable val : values) {
                sum1 += val.get();
            }
            int diff = sum0 - sum1;
            result.set(diff);
            context.write(key, result);
        }
    }

I think I did something wrong in the part where the data is passed from the mapper to the reducer; it might need 2 keys. I'm new to Java and not sure how to do this correctly.

My input data looks like this:

123 543
123 234
543 135
135 123

I would like the output to be the following, where each value is the number of occurrences of the key in col1 minus the number of occurrences of the key in col2:

123 1
543 0
135 0
234 -1

1 Answer


I think you want to split each line into its fields, parse each field as a number, and then calculate the difference. You can use NLineInputFormat and take the key as the row identifier (note that this key is the line's byte offset in the file rather than a sequential row number), then split the value and calculate. Otherwise, you can define a static long field to keep track of the row number.

public static class TokenizerMapper extends
        Mapper<LongWritable, Text, LongWritable, IntWritable> {

    private IntWritable diffen = new IntWritable();
    // Row counter; each map task JVM keeps its own copy, so it is not global to the job.
    private static long row_num = 0;

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into its two tab-separated columns.
        String[] data = value.toString().split("\\t");
        String node0 = data[0];
        String node1 = data[1];
        // Per-row difference: column 2 minus column 1.
        int dif = Integer.parseInt(node1) - Integer.parseInt(node0);
        diffen.set(dif);
        row_num++;
        context.write(new LongWritable(row_num), diffen);
    }
}

Alternatively, you can pass the whole line as the value to the reducer, split it into its two parts there, and calculate the difference. Either approach works.
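
If the goal is the per-key output shown in the question (occurrences in col1 minus occurrences in col2), one possible approach is to keep a single key type and encode the column in the sign of the value: emit +1 for the first column and -1 for the second, then let the reducer simply sum. The sketch below simulates that idea in plain Java without any Hadoop classes, so it is only an illustration of the logic. The class name `ColumnCountDiff`, its method names, and the whitespace-based split are assumptions made for this example, not part of the original code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map and reduce steps; no Hadoop classes are used.
// All names here are hypothetical and chosen only for this sketch.
class ColumnCountDiff {

    // "Map" step: emit (col1, +1) and (col2, -1) for one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        // Assumption: fields are separated by a tab or other whitespace.
        String[] data = line.trim().split("\\s+");
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        if (data.length < 2) {
            return out; // skip malformed lines
        }
        out.add(new SimpleEntry<>(data[0], 1));
        out.add(new SimpleEntry<>(data[1], -1));
        return out;
    }

    // "Shuffle + reduce" step: sum the signed values per key.
    // A TreeMap is used only to get a stable, sorted key order.
    static Map<String, Integer> diff(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> e : map(line)) {
                result.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input from the question.
        List<String> input = Arrays.asList(
                "123\t543", "123\t234", "543\t135", "135\t123");
        diff(input).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

In an actual job, the same idea would presumably amount to writing an `IntWritable` of 1 for `data[0]` and an `IntWritable` of -1 for `data[1]` in the mapper, and keeping a reducer with a single summing loop, since the sign already says which column each count came from.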