Not able to insert data to Hbase table using MapReduce

Question

I have written a MapReduce job that reads data from a file and inserts it into an HBase table. The problem is that only 1 record gets inserted into the table. I am not sure whether it is the last record or a random one, since my input file is around 10 GB. Given the logic I have written, I am sure that thousands of records should be inserted. I am sharing only the reducer and driver code, as I am fairly sure the problem lies there. Please find the code below:

    public static class Reduce extends TableReducer<Text,Text,ImmutableBytesWritable> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {

            Set<Text> uniques = new HashSet<Text>();
            String vis=key.toString();
            String[] arr=vis.split(":");

            Put put=null;
            for (Text val : values){
                if (uniques.add(val)) {
                    put = new Put(arr[0].getBytes());
                    put.add(Bytes.toBytes("cf"), Bytes.toBytes("column"),Bytes.toBytes(val.toString()));
                }
                context.write(new ImmutableBytesWritable(arr[0].getBytes()), put);
            }

        }
    }

My Driver class:

        Configuration conf =  HBaseConfiguration.create();
        Job job = new Job(conf, "Blank");
        job.setJarByClass(Class_name.class);

        job.setMapperClass(Map.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setSortComparatorClass(CompositeKeyComprator.class);

        Scan scan = new Scan();
        scan.setCaching(500);       
        scan.setCacheBlocks(false); 


        job.setReducerClass(Reduce.class);
        TableMapReduceUtil.initTableReducerJob(
                "Table_name",
                Reduce.class,
                job);           

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

After the job runs, the console reports Reduce output records=73579, but only 1 record is inserted in the table.

15/06/19 16:32:41 INFO mapred.JobClient: Job complete: job_201506181703_0020
15/06/19 16:32:41 INFO mapred.JobClient: Counters: 28
15/06/19 16:32:41 INFO mapred.JobClient:   Map-Reduce Framework
15/06/19 16:32:41 INFO mapred.JobClient:     Spilled Records=147158
15/06/19 16:32:41 INFO mapred.JobClient:     Map output materialized bytes=6941462
15/06/19 16:32:41 INFO mapred.JobClient:     Reduce input records=73579
15/06/19 16:32:41 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=7614308352
15/06/19 16:32:41 INFO mapred.JobClient:     Map input records=140543
15/06/19 16:32:41 INFO mapred.JobClient:     SPLIT_RAW_BYTES=417
15/06/19 16:32:41 INFO mapred.JobClient:     Map output bytes=6794286
15/06/19 16:32:41 INFO mapred.JobClient:     Reduce shuffle bytes=6941462
15/06/19 16:32:41 INFO mapred.JobClient:     Physical memory (bytes) snapshot=892702720
15/06/19 16:32:41 INFO mapred.JobClient:     Reduce input groups=1
15/06/19 16:32:41 INFO mapred.JobClient:     Combine output records=0
15/06/19 16:32:41 INFO mapred.JobClient:     Reduce output records=73579
15/06/19 16:32:41 INFO mapred.JobClient:     Map output records=73579
15/06/19 16:32:41 INFO mapred.JobClient:     Combine input records=0
15/06/19 16:32:41 INFO mapred.JobClient:     CPU time spent (ms)=10970
15/06/19 16:32:41 INFO mapred.JobClient:     Total committed heap usage (bytes)=829947904
15/06/19 16:32:41 INFO mapred.JobClient:   File Input Format Counters
15/06/19 16:32:41 INFO mapred.JobClient:     Bytes Read=204120920
15/06/19 16:32:41 INFO mapred.JobClient:   FileSystemCounters
15/06/19 16:32:41 INFO mapred.JobClient:     HDFS_BYTES_READ=204121337
15/06/19 16:32:41 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=14198205
15/06/19 16:32:41 INFO mapred.JobClient:     FILE_BYTES_READ=6941450
15/06/19 16:32:41 INFO mapred.JobClient:   Job Counters

When I write the reducer output to a file instead, I get the correct output, but not in the HBase table. Please let me know what I am missing here. Thanks in advance.

Maddy RS · Accepted Answer · 2015-06-18T13:05:00

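You are inserting every value into HBase under the same row key, the same column family, and the same column qualifier. According to your counters, Reduce input groups=1, so reduce() is called only once and arr[0], which is computed once before the loop, is identical for all 73579 Puts. Each Put therefore targets the same cell (family "cf", qualifier "column", in that one row) and supersedes the previous value. That is why you see only one row in the HBase table. To keep all the values, each one needs a distinct row key or a distinct column qualifier.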
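To make the overwrite concrete, here is a minimal sketch in plain Java. It is a toy model, not the HBase API: the class CellOverwriteDemo, its methods, and the row key "row1" are all hypothetical names introduced for illustration, and the table is modelled as a map that holds one visible value per (row, family, qualifier). The second method shows one possible way to keep every value, assuming it is acceptable to append the value to the row key; whether that suits your data model is for you to decide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a table: one visible value per (row, family, qualifier). Not the HBase API. */
public class CellOverwriteDemo {

    // Maps "row/family:qualifier" to the latest value written to that cell.
    private final Map<String, String> cells = new TreeMap<>();

    /** Mimics a Put: a later write to the same coordinates replaces the visible value. */
    public void put(String row, String family, String qualifier, String value) {
        cells.put(row + "/" + family + ":" + qualifier, value);
    }

    /** Returns the visible value of a cell, or null if nothing was written there. */
    public String get(String row, String family, String qualifier) {
        return cells.get(row + "/" + family + ":" + qualifier);
    }

    /** Number of distinct cells that hold a value. */
    public int cellCount() {
        return cells.size();
    }

    /** What the reducer in the question does: every value goes to the same cell. */
    public static CellOverwriteDemo writeToSameCell(List<String> values) {
        CellOverwriteDemo table = new CellOverwriteDemo();
        for (String value : values) {
            table.put("row1", "cf", "column", value);
        }
        return table;
    }

    /** One possible fix (an assumption): make the row key unique per value. */
    public static CellOverwriteDemo writeWithDistinctRowKeys(List<String> values) {
        CellOverwriteDemo table = new CellOverwriteDemo();
        for (String value : values) {
            table.put("row1:" + value, "cf", "column", value);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c");
        System.out.println(writeToSameCell(values).cellCount());          // prints 1
        System.out.println(writeWithDistinctRowKeys(values).cellCount()); // prints 3
    }
}
```

In the first case only the last value written ("c") remains visible, which matches what you observe in the table. In the real reducer the equivalent change would be to build the Put from a row key (or qualifier) that differs per value instead of arr[0] and "column" alone.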
