1
votes

Is there any way to use an HBase table as the source for a Hadoop streaming job? Specifically, I want to run a Hadoop streaming job written in Python. This works well when the input is specified as a folder on HDFS, but I have not been able to find any documentation about reading data from an HBase table.

Is this supported? Or will I have to go through the ordeal of writing Java code to copy the data from HBase to HDFS first, and then run the streaming job on that?

I'm using HBase 0.94 from Cloudera.

(There is a similar question already present here, but it points to a third-party solution that is not actively contributed to. I was hoping that this would be supported in HBase itself.)


1 Answer

0
votes

I would use Pig to load the data from HBase (via HBaseStorage) and then feed it into a streaming Python application (via the STREAM operator).

See here:
http://pig.apache.org/docs/r0.12.0/func.html#HBaseStorage
http://pig.apache.org/docs/r0.12.0/basic.html#stream
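
As a rough sketch of what the Python side could look like: by default Pig's STREAM operator writes each tuple to the script's stdin as one tab-delimited line and reads tab-delimited lines back from stdout, so the script stays an ordinary stdin/stdout filter. Everything specific below is an assumption for illustration only: the table name `my_table`, the column family `cf`, the columns `col1` and `col2`, the script name `process.py`, and the toy transformation (summing value lengths per row). The Pig Latin in the docstring is likewise an untested outline of the driver script, not a verified recipe for your cluster.

```python
#!/usr/bin/env python
"""Streaming-side script for rows that Pig loads from HBase.

Assumed Pig Latin driver (all table/column/script names are hypothetical):

    rows = LOAD 'hbase://my_table'
           USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
               'cf:col1 cf:col2', '-loadKey true')
           AS (rowkey:chararray, col1:chararray, col2:chararray);
    DEFINE process `python process.py` SHIP('process.py');
    result = STREAM rows THROUGH process
             AS (rowkey:chararray, total_len:int);
    STORE result INTO 'output_dir';
"""
import sys


def process_line(line):
    """Turn one tab-delimited input row into one tab-delimited output row.

    The first field is assumed to be the HBase row key (because of
    '-loadKey true'); the remaining fields are the requested column values.
    The "work" here is a placeholder: total length of all column values.
    """
    fields = line.rstrip("\n").split("\t")
    rowkey, values = fields[0], fields[1:]
    total_len = sum(len(v) for v in values)
    return "%s\t%d" % (rowkey, total_len)


def run(in_stream, out_stream):
    """Read rows from in_stream, skip blank lines, write results to out_stream."""
    for line in in_stream:
        if line.strip():
            out_stream.write(process_line(line) + "\n")


if __name__ == "__main__":
    run(sys.stdin, sys.stdout)
```

You can check the script locally before involving Pig at all, e.g. `printf 'row1\tabc\tde\n' | python process.py`, since it only depends on stdin and stdout.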