
Hi, I am trying to export an HBase table snapshot to my local HDFS so that I can run MapReduce on it.

I took a snapshot of the HBase table using the command below:

snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'

When I run the list_snapshots command, I can see my snapshot.

I then exported the snapshot to my local HDFS directory using the command below, and the copy completed successfully:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp -mappers 16

Finally, I need to run MapReduce on the snapshot. Below is the driver code that configures the job:

TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, // snapshot name
                scan, // Scan instance to control CF and attribute selection
                DefaultMapper.class, // mapper class
                NullWritable.class, // mapper output key
                Text.class, // mapper output value
                job,
                true,
                new Path("/home/cloudera/archive/data/default/FundamentalAnalytic/bc95715f67e52547e86b5b096a1f1cb5/cf/d29205a44623434eba2d100a94d8ebfb_SeqId_4_"));

This is where I get an error. I don't know which path I have to pass as the last argument of the initTableSnapshotMapperJob method.

When I run this code, I get the error below:

org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo
    at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:294)
    at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:818)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.setInput(TableSnapshotInputFormatImpl.java:355)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:204)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:335)
    at com.thomsonretuers.hbase.HBaseToFileDriver.run(HBaseToFileDriver.java:128)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.thomsonretuers.hbase.HBaseToFileDriver.main(HBaseToFileDriver.java:75)
Caused by: java.io.FileNotFoundException: File file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo does not exist

One quick question about snapshots:

  1. I want to take a snapshot and run a full table scan. In that case, will the scan on the snapshot impact region server performance?

1 Answer


I solved it by using the correct path.
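
For context, the path in the error from the question looks like HBase's default local root directory rather than anything on HDFS. As far as I know, hbase-default.xml defines hbase.tmp.dir as ${java.io.tmpdir}/hbase-${user.name} and hbase.rootdir as ${hbase.tmp.dir}/hbase, so a job that does not pick up the cluster's hbase.rootdir ends up looking under /tmp/hbase-<user>/hbase on the local file system. The sketch below only rebuilds that string to show the match; the class and method names are made up for illustration and are not part of HBase.

```java
// Illustrative only: rebuilds the default snapshot-info location from the
// hbase-default.xml templates. Class and method names are hypothetical.
class DefaultRootDirSketch {

    // hbase.tmp.dir = ${java.io.tmpdir}/hbase-${user.name}
    // hbase.rootdir = ${hbase.tmp.dir}/hbase
    static String defaultRootDir(String javaIoTmpDir, String userName) {
        String hbaseTmpDir = javaIoTmpDir + "/hbase-" + userName;
        return hbaseTmpDir + "/hbase";
    }

    // Snapshots live under <rootdir>/.hbase-snapshot/<name>/.snapshotinfo,
    // as seen in the stack trace from the question.
    static String snapshotInfoPath(String rootDir, String snapshotName) {
        return rootDir + "/.hbase-snapshot/" + snapshotName + "/.snapshotinfo";
    }

    public static void main(String[] args) {
        String rootDir = defaultRootDir("/tmp", "cloudera");
        System.out.println(snapshotInfoPath(rootDir, "FundamentalAnalyticSnapshot"));
    }
}
```

With /tmp as the JVM temp directory and cloudera as the user, this gives the same path that the FileNotFoundException complains about, which suggests the job was not reading the snapshot from the HDFS root directory at all.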

Create the snapshot

snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'

Export the snapshot to local HDFS

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp -mappers 16

Driver job configuration to run MapReduce on the HBase snapshot

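    String snapshotName = "FundamentalAnalyticSnapshot";
    // Temporary directory that the snapshot files are restored into
    Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");
    // HBase root directory on HDFS. It is not referenced in this snippet;
    // presumably it is applied to the job configuration as hbase.rootdir
    // (an assumption, that step is not shown here).
    String hbaseRootDir = "hdfs://quickstart.cloudera:8020/hbase";

    TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, // snapshot name
            scan, // Scan instance to control CF and attribute selection
            DefaultMapper.class, // mapper class
            NullWritable.class, // mapper output key
            Text.class, // mapper output value
            job,
            true, // add dependency jars to the job
            restoreDir); // temporary restore directory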
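
One thing worth checking with the last argument: as I understand the Javadoc for initTableSnapshotMapperJob, the restore directory must be writable by the current user and must not be a subdirectory of the HBase root directory. Here is a minimal sketch of that check using only java.net.URI, so it runs without Hadoop on the classpath. The class and method names are hypothetical, not an HBase API.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper: checks whether a restore directory sits inside the
// HBase root directory. Plain string/URI logic, no Hadoop or HBase calls.
class RestoreDirCheck {

    // Ensure the path ends with "/" so "/hbase2" is not treated as under "/hbase".
    private static String withTrailingSlash(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    static boolean isUnderRoot(String restoreDir, String rootDir) {
        URI restore = URI.create(restoreDir).normalize();
        URI root = URI.create(rootDir).normalize();

        // Different scheme or authority means a different file system.
        boolean sameFileSystem =
                Objects.equals(restore.getScheme(), root.getScheme())
                && Objects.equals(restore.getAuthority(), root.getAuthority());

        String restorePath = withTrailingSlash(restore.getPath());
        String rootPath = withTrailingSlash(root.getPath());

        // A directory equal to the root is also treated as "under" it.
        return sameFileSystem && restorePath.startsWith(rootPath);
    }

    public static void main(String[] args) {
        String root = "hdfs://quickstart.cloudera:8020/hbase";
        String restore = "hdfs://quickstart.cloudera:8020/tmp";
        System.out.println("restoreDir under rootDir? " + isUnderRoot(restore, root));
    }
}
```

For the two paths used above, /tmp is outside /hbase, so the check passes. A path such as hdfs://quickstart.cloudera:8020/hbase/restore would be flagged.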

Also, running MapReduce on an HBase snapshot reads the snapshot's files directly from HDFS instead of scanning the live HBase table, so there is no impact on the region servers.