0
votes

I have 2 files of the form

File 1:

key1 value1

key2 value2

...

File 2:

key1 value3

key2 value4

...

I would like to produce a reduce output of the form

key1 (value1-value3)/value1

key2 (value2-value4)/value2

My map stage writes the key, with the value prefixed by a character indicating whether it came from file 1 or file 2, but I am not sure how to write the reduce stage.

My map method is:

public void map(LongWritable key, Text val, Context context) throws IOException, InterruptedException
    {
        Text outputKey = new Text();
        Text outputValue = new Text();
        outputKey.set(key.toString());
        if ("A")
        {               
            outputValue.set("A,"+val);
        }
        else
        {
            outputValue.set("B," + val);
        }
        context.write(outputKey,  outputValue);
    }
}

2 Answers

1
votes

This should be simple enough since you have already tagged the values, although it can be a bit confusing at first. I assume the emitted values look like A23 (from file 1) and B139 (from file 2). Snippet:

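// Requires org.apache.hadoop.io.DoubleWritable; the reducer's output value type must be DoubleWritable.
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {

    // Use doubles: integer division would truncate most ratios to 0.
    double diff = 0;
    double denominator = 0;
    for (Text val : values) {
        String s = val.toString();
        if (s.startsWith("A")) {
            // Value from file 1: it is both the minuend and the denominator.
            denominator = Double.parseDouble(s.substring(1));
            diff += denominator;
        } else if (s.startsWith("B")) {
            // Value from file 2: subtract it.
            diff -= Double.parseDouble(s.substring(1));
        } else {
            // This block shouldn't be reached unless malformed values are emitted
            // Throw an exception or log it
        }
    }
    // Skip keys with no file 1 value (or a zero one) to avoid dividing by zero.
    if (denominator != 0) {
        context.write(key, new DoubleWritable(diff / denominator));
    }
}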

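Hope this helps. Note that this approach will give wrong results if the same key appears more than once within a file (for example, if key1 and key2 are equal), because all of those values are folded into a single reduce call.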
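The reducer arithmetic can be checked without a cluster. The following is a minimal sketch in plain Java, assuming the A/B tagging described above and using doubles so the ratio is not truncated. The class and method names (RelativeDiffSketch, relativeDiff) are made up for illustration and are not part of Hadoop. It also shows what happens when a key occurs twice in the same file.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: the reducer arithmetic on plain strings.
final class RelativeDiffSketch {

    // Mirrors the reducer: adds every A value, subtracts every B value,
    // and divides by the last A value seen.
    static double relativeDiff(List<String> taggedValues) {
        double diff = 0;
        double denominator = 0;
        for (String v : taggedValues) {
            if (v.startsWith("A")) {
                denominator = Double.parseDouble(v.substring(1));
                diff += denominator;
            } else if (v.startsWith("B")) {
                diff -= Double.parseDouble(v.substring(1));
            } else {
                throw new IllegalArgumentException("Untagged value: " + v);
            }
        }
        return diff / denominator;
    }

    public static void main(String[] args) {
        // One value per file: (200 - 150) / 200
        System.out.println(relativeDiff(Arrays.asList("A200", "B150"))); // 0.25
        // Arrival order does not matter.
        System.out.println(relativeDiff(Arrays.asList("B150", "A200"))); // 0.25
        // Same key twice in each file: (10 - 4 + 20 - 5) / 20,
        // which matches neither (10 - 4) / 10 nor (20 - 5) / 20.
        System.out.println(relativeDiff(Arrays.asList("A10", "B4", "A20", "B5"))); // 1.05
    }
}
```

The last call is the failure case: with duplicate keys the values from both records are mixed into one result, so the input needs unique keys per file for this scheme to be meaningful.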

UPDATE
The map should look like the following to work with the reducer above:

// FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit.
public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {
    // Name of the input file this record came from.
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    // Each line is "key value", separated by whitespace.
    String[] keyVal = val.toString().split("\\s+");
    // Emit the key from the line, not the byte offset passed in as `key`.
    Text outputKey = new Text(keyVal[0]);
    Text outputValue = new Text();
    if ("fileA".equals(fileName)) {
        outputValue.set("A" + keyVal[1]);
    } else {
        outputValue.set("B" + keyVal[1]);
    }
    context.write(outputKey, outputValue);
}
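
The parsing and tagging step of this mapper can also be tried outside Hadoop. This is a rough sketch only: TagLineSketch and tagLine are hypothetical names, "fileA" is the file name assumed in the mapper above, and returning null for a malformed line is a choice made here for illustration rather than something the original mapper does.

```java
// Hypothetical helper, not part of Hadoop: the mapper's parsing and tagging on plain strings.
final class TagLineSketch {

    // Returns {key, taggedValue} for a "key value" line, or null if the line is malformed.
    // Lines from "fileA" are tagged "A"; lines from any other file are tagged "B".
    static String[] tagLine(String fileName, String line) {
        String[] keyVal = line.trim().split("\\s+");
        if (keyVal.length < 2) {
            return null;
        }
        String tag = "fileA".equals(fileName) ? "A" : "B";
        return new String[] { keyVal[0], tag + keyVal[1] };
    }

    public static void main(String[] args) {
        String[] fromA = tagLine("fileA", "key1 200");
        String[] fromB = tagLine("fileB", "key1   150");
        System.out.println(fromA[0] + " -> " + fromA[1]); // key1 -> A200
        System.out.println(fromB[0] + " -> " + fromB[1]); // key1 -> B150
    }
}
```

Both records end up under the same key, key1, which is what lets the reducer see the file 1 and file 2 values together.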
0
votes

I have found NamedVector (from Apache Mahout) very helpful in such circumstances. It attaches a name to the value, so you can perform the required operations on the values based on that name.