I wrote a MapReduce (MR) job in Python that runs through the Hadoop streaming jar. I want to know how to use bulk loading to get its data into HBase.
I know there are two ways to get data into HBase by bulk loading:
- Generate the HFiles in the MR job, then use CompleteBulkLoad to load them into HBase.
- Use the ImportTsv utility, then use CompleteBulkLoad to load the data.
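
To make the second option concrete, here is a minimal sketch of the kind of output a Python streaming step would need to produce for ImportTsv: one tab-separated line per row, with the row key first. The input layout (`user_id,name,city`) and the function names are hypothetical, chosen only for illustration. In a real streaming script the loop would read from `sys.stdin` instead of a sample list.

```python
def to_tsv_line(row_key, values, sep="\t"):
    """Join a row key and its column values into one ImportTsv input line."""
    fields = [row_key] + list(values)
    for field in fields:
        # ImportTsv splits each line on the separator, so no field may
        # contain it (or a newline, which would break line-based input).
        if sep in field or "\n" in field:
            raise ValueError("field contains separator or newline: %r" % field)
    return sep.join(fields)


def convert(lines, sep="\t"):
    """Turn hypothetical 'user_id,name,city' records into ImportTsv lines."""
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw:
            continue  # skip blank lines
        user_id, name, city = raw.split(",", 2)
        yield to_tsv_line(user_id, [name, city], sep)


# Stand-in for sys.stdin in a real streaming mapper.
sample = ["u1,Alice,Paris\n", "\n", "u2,Bob,Rome\n"]
for line in convert(sample):
    print(line)
```

Lines in this shape would correspond to an ImportTsv column spec such as `-Dimporttsv.columns=HBASE_ROW_KEY,cf:name,cf:city`, where the column family `cf` is an assumed name.
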
I don't know how to generate HFiles that HBase will accept from Python, so I tried the ImportTsv utility instead, but it failed. I followed the instructions in this [example](http://hbase.apache.org/book.html#importtsv), but I got this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/Filter...
Now I have three questions:
- Can Python be used to generate HFiles through the streaming jar?
- How do I use ImportTsv?
- Can bulk loading be used to update an existing table in HBase? I receive a file larger than 10 GB every day. Can bulk loading be used to push that file into HBase?
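
For the third question, the daily flow I am asking about would be roughly the two-step sequence from the HBase book: ImportTsv writing HFiles via `importtsv.bulk.output`, followed by `LoadIncrementalHFiles` (the class behind `completebulkload`). The sketch below only builds the two command lines. The table name, column family, and HDFS paths are hypothetical, and whether this is a sound way to update a table daily is exactly what I am unsure about.

```python
def import_tsv_cmd(table, columns, input_dir, hfile_dir):
    """Build the ImportTsv command that writes HFiles instead of doing Puts."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        "-Dimporttsv.columns=" + ",".join(columns),
        "-Dimporttsv.bulk.output=" + hfile_dir,
        table, input_dir,
    ]


def complete_bulk_load_cmd(table, hfile_dir):
    """Build the command that moves the generated HFiles into the table."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        hfile_dir, table,
    ]


# Hypothetical names and paths for one day's file.
day = "2017-08-01"
table = "mytable"
columns = ["HBASE_ROW_KEY", "cf:name", "cf:city"]
input_dir = "/data/incoming/" + day
hfile_dir = "/data/hfiles/" + day

for cmd in (import_tsv_cmd(table, columns, input_dir, hfile_dir),
            complete_bulk_load_cmd(table, hfile_dir)):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.check_call` once the commands are confirmed to work by hand.
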
The hadoop version is: Hadoop 2.8.0
The hbase version is: HBase 1.2.6
Both are running in standalone mode.
Thanks for any answer.
--- update ---
ImportTsv works correctly.
But I still want to know how to generate HFiles in an MR job that runs through the streaming jar in Python.