
I have a table in HBase that contains the following data:

ROW COLUMN+CELL
1   column=brid:, timestamp=1470047093100, value=a1234
1   column=custid:, timestamp=1470046713207, value=811411
2   column=brid:, timestamp=1470047231583, value=a6789
2   column=custid:, timestamp=1470047156905, value=848727431

I am trying to read this data into Spark and then print the table's contents to the console. My code is as follows:

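import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Spark Base").setMaster("local[*]")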
val sc = new SparkContext(conf)

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "127.0.0.1")
hbaseConf.set("hbase.zookeeper.property.clientPort", "5181") 
hbaseConf.set(TableInputFormat.INPUT_TABLE, "/path/to/custid1") 

val hbaseData = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])

hbaseData.map(row => Bytes.toString(row._2.getValue("custid".getBytes(), "brid".getBytes()))).collect().foreach(println)
println("Number of Records found : " + hbaseData.count())
sc.stop()

The output looks like this:

null
null
Number of Records found : 2

The count is correct, as there are only two records in the HBase table. But why are the values displayed as null? And how can I get it to actually print the values stored in the table?

Thanks.


1 Answer


row._2.getValue("custid".getBytes(), "brid".getBytes()) takes two parameters: the column family and the qualifier (column name). In your case you have two column families, custid and brid, each with an empty string as the qualifier. Since there is no column custid:brid in your table, getValue returns null, and that null is what gets printed.

To print the brid value, for example, use this in place of the original getValue call: row._2.getValue("brid".getBytes(), "".getBytes())
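
To read both columns in one pass, something along these lines may help. It is only a sketch: the object name HBaseCells and the helpers cellAsString and printRows are made up for illustration, and it assumes the row keys and values were stored as strings, as the scan output in the question suggests. Wrapping the result of getValue in Option avoids printing a bare null when a cell is missing.

```scala
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// Hypothetical helper object (not part of HBase or Spark).
object HBaseCells {

  // Returns the cell at family:qualifier as a string, or None if the cell does not exist.
  // The qualifier defaults to the empty string, matching the columns in the question.
  def cellAsString(result: Result, family: String, qualifier: String = ""): Option[String] =
    Option(result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)))
      .map(bytes => Bytes.toString(bytes))

  // Prints one line per row with the row key and both column values.
  // Assumes hbaseData is the RDD returned by newAPIHadoopRDD in the question.
  def printRows(hbaseData: RDD[(ImmutableBytesWritable, Result)]): Unit =
    hbaseData
      .map { case (_, result) =>
        // Convert to plain strings before collect(), since Result is not serializable.
        val rowKey = Bytes.toString(result.getRow)
        val custid = cellAsString(result, "custid").getOrElse("<missing>")
        val brid   = cellAsString(result, "brid").getOrElse("<missing>")
        s"$rowKey custid=$custid brid=$brid"
      }
      .collect()
      .foreach(println)
}
```

With the code from the question, this would be called as HBaseCells.printRows(hbaseData) before sc.stop().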