1
votes

My customer has a computing scenario that some data were stored in Hive in cluster A and some other data were stored in Hbase in cluster B, then they want to do some join operation with the 2 kinds of tables.

so is there a way to let me do it in Hive like this:

select hive_table.col1, hbase_table.col2 from hive_table inner join hbase_table on hive_table.id = hbase_table.id    

hive table and hbase table exist in the different cluster.

1
Are the able to query the Hbase table at all? If not, I would suggest trying to get that part working - OneCricketeer
yes, they can. they just want to do the join operation from hive to hbase - user2575502
What happens when they try? - OneCricketeer

1 Answers

3
votes

Yes, it is possible to join Hive table(table1-assuming it is in HDFS) with Hbase table(table2-hbase table). But it is not recommended because HBase is not efficient with full table scans, when you try to join. Best way to do it is convert the Hbase table to parquet or AVRO. Now table1 form Hive and table2 from Hbase both resides in HDFS which makes it efficient.

In nutshell, we can join any tables that Hive metastore have stored. It doesn't matter whether the Hive tables are built over HDFS, Hbase. As long as we have schema in hive metastore we can join them.

Assuming Hive metastore contains the schemas for both tables.

select hive_table.col1, hbase_table.col2 from hive_table inner join hbase_table on (hive_table.id = hbase_table.id);