HBase using Put from Hadoop, but not seeing value in HBase shell

Question

I have a simple MapReduce job that scans one HBase table and modifies another HBase table. The Hadoop job seems to complete successfully, but when I check the HBase table, the entry does not appear.

Here is the Hadoop program:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HBaseInsertTest extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        String table = "duplicates";

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);

        Job job = new Job(getConf(), "HBaseInsertTest");
        job.setJarByClass(HBaseInsertTest.class);

        TableMapReduceUtil.initTableMapperJob(table, scan, Mapper.class, /* mapper output key = */null,
            /* mapper output value= */null, job);
        TableMapReduceUtil.initTableReducerJob(/* output table = */ "tablecopy", /* reducer class = */ null, job);

        job.setNumReduceTasks(0);

        // Note that these are the default.
        job.setOutputFormatClass(NullOutputFormat.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    private static class Mapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
        }

        @Override
        public void map(ImmutableBytesWritable row, Result columns, Context context) throws IOException {
            long id = 1260018L;

            try {
                Put put = new Put(Bytes.toBytes(id));
                put.add(Bytes.toBytes("mapping"), Bytes.toBytes("foo"), Bytes.toBytes("bar"));
                context.write(row, put);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        int res = ToolRunner.run(config, new HBaseInsertTest(), args);
        System.exit(res);
    }
}

From HBase shell:

hbase(main):008:0> get 'tablecopy', '1260018', 'mapping'
COLUMN                          CELL                                                                                    
0 row(s) in 0.0100 seconds

I've simplified the program a lot to try to isolate the problem. I'm also relatively new to both Hadoop and HBase. I did verify that mapping is a column family that exists in the tablecopy table.

Maybe there is no output? Try printing out row and put before context.write. — Hari Menon
There is output. Switching to string keys fixes the problem. — kfox

Vinayak Ponangi · Accepted Answer · 2012-03-23T12:43:40

I think the problem is that you were querying with:

hbase(main):008:0> get 'tablecopy', '1260018', 'mapping'

Instead, you should have queried with:

hbase(main):008:0> get 'tablecopy', 1260018, 'mapping'

Because of the quotation marks, HBase treated the row key as a string, while the job wrote it as a long converted with Bytes.toBytes(id). Also, if you had run a simple client program to retrieve this key from HBase, it would have returned the values correctly, provided the row was already present.
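
To make the mismatch concrete, here is a minimal sketch comparing the two row keys. It assumes that Bytes.toBytes(long) produces the 8-byte big-endian encoding of the value, and uses java.nio.ByteBuffer as a stand-in so that it runs without HBase on the classpath. The class name RowKeyDemo and its helper methods are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RowKeyDemo {

    // 8-byte big-endian encoding of a long; stand-in (assumption) for Bytes.toBytes(long).
    static byte[] longKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Bytes of the decimal text, which is what a quoted shell key such as '1260018' holds.
    static byte[] stringKey(long id) {
        return Long.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    // Printable ASCII stays as-is, every other byte becomes \xNN.
    static String escape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v >= 0x20 && v <= 0x7E && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long id = 1260018L;
        // Key written by the job: 8 bytes, mostly non-printable.
        System.out.println(longKey(id).length + " bytes: " + escape(longKey(id)));
        // Key queried in the shell with quotes: 7 ASCII digits.
        System.out.println(stringKey(id).length + " bytes: " + escape(stringKey(id)));
    }
}
```

The two keys differ in both length and content, so a get on the string form cannot match the row written by the job. If the unquoted form does not find the row in your shell version either, passing the escaped form printed above as a double-quoted string (double quotes let the shell interpret \x escapes) should address the binary key directly.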
