Writable Classes in MapReduce

Question

How can I pass the values from the HashSet (the docid and offset) to the reducer's Writable, so that the map Writable connects to the reduce Writable? The mapper (LineIndexMapper) works fine, but in the reducer (LineIndexReducer) I get an error saying that a String can't be used as the argument when I type context.write(key, new IndexRecordWritable("some string")); even though I have public String toString() in the reduce Writable too.
I suspect the HashSet in the reducer's Writable (IndexRecordWritable.java) isn't receiving the values correctly. My code is below.

IndexMapRecordWritable.java
    
    

    
        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.io.Writable;
    
        public class IndexMapRecordWritable implements Writable {
    
            private LongWritable offset;
            private Text docid;
    
            public LongWritable getOffsetWritable() {
                return offset;
            }
    
            public Text getDocidWritable() {
                return docid;
            }
    
            public long getOffset() {
                return offset.get();
            }
    
            public String getDocid() {
                return docid.toString();
            }
    
            public IndexMapRecordWritable() {
                this.offset = new LongWritable();
                this.docid = new Text();
            }
          
            public IndexMapRecordWritable(long offset, String docid) {
                this.offset = new LongWritable(offset);
                this.docid = new Text(docid);
            }
            public IndexMapRecordWritable(IndexMapRecordWritable indexMapRecordWritable) {
                this.offset = indexMapRecordWritable.getOffsetWritable();
                this.docid = indexMapRecordWritable.getDocidWritable();
            }
            @Override
            public String toString() {
    
                StringBuilder output = new StringBuilder();
                output.append(docid);
                output.append(offset);
                
                return output.toString();
    
            }
    
            @Override
            public void write(DataOutput out) throws IOException {
 

            }
    
            @Override
            public void readFields(DataInput in) throws IOException {


            }
    
        }
    
    
    



    
IndexRecordWritable.java
    
    

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.HashSet;
    import org.apache.hadoop.io.Writable;
    
    public class IndexRecordWritable implements Writable {
    
        // Save each index record from maps
        private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();
    
        public IndexRecordWritable() {
        }
    
        public IndexRecordWritable(
                Iterable<IndexMapRecordWritable> indexMapRecordWritables) {
  
        }
    
        @Override
        public String toString() {
    
            StringBuilder output = new StringBuilder();

            return output.toString();
    
        }
    
        @Override
        public void write(DataOutput out) throws IOException {

        }
   
        @Override
        public void readFields(DataInput in) throws IOException {

        }
    
    }

Where do you have context.write in your code? Please post the error message maybe as a screenshot — Prateek
By the looks of it, you've set the output class to Text in your driver class using job.setOutputKeyClass(Text.class);. So, in your reducer class, the types are basically 'extends Reducer<input key, input value, output key, output value>'. — Prateek
It looks like the problem is indeed in IndexRecordWritable. Can you tell me what values your reducer outputs? Give me an example if you can. — Prateek
Check this: in your context.write, you are passing the value as an IndexRecordWritable object built from a string. But your IndexRecordWritable constructor doesn't accept a string; it needs an Iterable. Can you tell me what happens in the constructor of IndexRecordWritable? That is, in public IndexRecordWritable(Iterable<IndexMapRecordWritable> indexMapRecordWritables) { /***/ }, what happens in /***/? — Prateek

Prateek Prateek · Accepted Answer · 2020-11-10T02:34:37

Alright, here is my answer, based on a few assumptions. Going by the pre-condition and post-condition comments in your reducer class, I assume the final output is a text file containing each key followed by its file names, separated by commas.

In that case, you don't really need the IndexRecordWritable class. You can simply write to your context using

context.write(key, new Text(valueBuilder.substring(0, valueBuilder.length() - 1)));

with the class declaration line as

public class LineIndexReducer extends Reducer<Text, IndexMapRecordWritable, Text, Text>

Don't forget to set the correct output class in the driver.

That should satisfy the post-condition in your reducer class. But if you really want to write a (Text, IndexRecordWritable) pair to your context, there are two ways to approach it:

1. with a String as the argument (based on your attempt to pass a string, even though your IndexRecordWritable constructor is not designed to accept one), and
2. with a HashSet as the argument (based on the HashSet initialised in the IndexRecordWritable class).

Your IndexRecordWritable class has no constructor that takes a String, so you cannot pass one. That is the error you are getting. PS: if you want to pass a String, you must add another constructor to your IndexRecordWritable class, as below:

    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
    }

    // to accept a string
    public IndexRecordWritable(String value) {
        this.value = value;
    }

But that doesn't help if you want to use the HashSet, so approach #1 is out: you can't pass a string.

That leaves approach #2: passing a HashSet as the argument, since you want to make use of the HashSet. In this case, you must build the HashSet in your reducer before passing it to IndexRecordWritable in context.write.

To do this, your reducer should look like this:

    @Override
    protected void reduce(Text key, Iterable<IndexMapRecordWritable> values, Context context) throws IOException, InterruptedException {
        //StringBuilder valueBuilder = new StringBuilder();

        HashSet<IndexMapRecordWritable> set = new HashSet<>();

        for (IndexMapRecordWritable val : values) {
            // Hadoop reuses the same value object on every iteration,
            // so store a fresh copy rather than val itself
            set.add(new IndexMapRecordWritable(val.getOffset(), val.getDocid()));
            //valueBuilder.append(val);
            //valueBuilder.append(",");
        }

        //write the key and the adjusted value (removing the last comma)
        //context.write(key, new IndexRecordWritable(valueBuilder.substring(0, valueBuilder.length() - 1)));
        context.write(key, new IndexRecordWritable(set));
        //valueBuilder.setLength(0);
    }
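
One thing to watch when collecting values in the reducer: Hadoop reuses the same value object for every element of the values iterable, so adding val itself to a set stores one reference over and over, and each element has to be copied first. The sketch below reproduces the effect without Hadoop. Value and the reusing iterable are hypothetical stand-ins written for this illustration, not Hadoop classes.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class ReuseDemo {
    // Hypothetical mutable value standing in for a Writable.
    static final class Value {
        long offset;

        Value(long offset) {
            this.offset = offset;
        }
    }

    // Mimics the reuse: every next() returns the SAME instance with new contents.
    static Iterable<Value> reusing(long... offsets) {
        Value shared = new Value(0);
        return () -> new Iterator<Value>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < offsets.length;
            }

            @Override
            public Value next() {
                shared.offset = offsets[i++];
                return shared;
            }
        };
    }

    // Adds the iterated object itself: the set ends up holding one instance.
    static int collectByReference(Iterable<Value> values) {
        Set<Value> set = new HashSet<>();
        for (Value v : values) {
            set.add(v);
        }
        return set.size();
    }

    // Adds a fresh copy per element: the set keeps every value.
    static int collectByCopy(Iterable<Value> values) {
        Set<Value> set = new HashSet<>();
        for (Value v : values) {
            set.add(new Value(v.offset));
        }
        return set.size();
    }
}
```

With three offsets, collectByReference returns 1 and collectByCopy returns 3. In the question's class, new IndexMapRecordWritable(val.getOffset(), val.getDocid()) would make such a copy. The copy constructor as posted shares the underlying LongWritable and Text objects, so it would not.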

Your IndexRecordWritable.java must then contain:

    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    //private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
        tokens.addAll(indexMapRecordWritables);
    }
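
Hadoop serializes map output values through write and readFields, so IndexMapRecordWritable needs both implemented; in the code posted in the question they are empty. IndexRecordWritable would need them too if it is ever written in binary form. Below is a minimal sketch of one common layout (element count first, then each element), using only java.io. Posting and PostingSet are hypothetical stand-ins for the two classes and do not implement Hadoop's Writable interface; the method bodies are the part that would carry over.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for IndexMapRecordWritable (no Hadoop dependency).
class Posting {
    long offset;
    String docid = "";

    Posting() {
    }

    Posting(long offset, String docid) {
        this.offset = offset;
        this.docid = docid;
    }

    // Same shape as Writable.write: emit every field in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeLong(offset);
        out.writeUTF(docid);
    }

    // Same shape as Writable.readFields: read the fields back in that order.
    void readFields(DataInput in) throws IOException {
        offset = in.readLong();
        docid = in.readUTF();
    }
}

// Hypothetical stand-in for IndexRecordWritable.
class PostingSet {
    final Set<Posting> tokens = new HashSet<>();

    void write(DataOutput out) throws IOException {
        out.writeInt(tokens.size()); // element count first
        for (Posting p : tokens) {
            p.write(out);
        }
    }

    void readFields(DataInput in) throws IOException {
        tokens.clear(); // the instance may be reused, so drop old state
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            Posting p = new Posting();
            p.readFields(in);
            tokens.add(p);
        }
    }

    // Serializes to memory and reads back, to check that both methods agree.
    static PostingSet roundTrip(PostingSet original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            PostingSet copy = new PostingSet();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the question's fields are already a LongWritable and a Text, delegating to offset.write(out) and docid.write(out) (and the matching readFields calls) would be an alternative to writeLong and writeUTF.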

Remember, though, that this is not what the description of your reducer requires. It says:

POST-CONDITION: emit the output a single key-value where all the file names are separated by a comma ",".  <"marcello", "a.txt@3345,b.txt@344,c.txt@785">

If you still choose to emit (Text, IndexRecordWritable), remember to process the HashSet in IndexRecordWritable to get it in the desired format.
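
A minimal sketch of that processing step: joining the set's entries as docid@offset with commas, in the shape the post-condition shows. DocOffset and IndexFormat are hypothetical stand-ins with no Hadoop dependency; in IndexRecordWritable the same loop would sit in toString(). The sort is an assumption added here only to make the order deterministic, since a HashSet has no defined iteration order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for IndexMapRecordWritable.
class DocOffset {
    final String docid;
    final long offset;

    DocOffset(String docid, long offset) {
        this.docid = docid;
        this.offset = offset;
    }

    // One entry in the "file@offset" shape from the post-condition.
    @Override
    public String toString() {
        return docid + "@" + offset;
    }
}

class IndexFormat {
    // Joins all entries with commas, with no trailing comma to trim.
    static String format(Set<DocOffset> tokens) {
        List<String> parts = new ArrayList<>();
        for (DocOffset t : tokens) {
            parts.add(t.toString());
        }
        Collections.sort(parts); // assumption: sorted only for a stable order
        return String.join(",", parts);
    }
}
```

For a set holding a.txt at 3345, b.txt at 344 and c.txt at 785, format returns "a.txt@3345,b.txt@344,c.txt@785". Note that the toString() posted in the question appends docid and offset with no "@" between them, so it would need the separator added as well.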
