I am new to Azure and HBase. Say I have two HDInsight (HBase) clusters, one in Asia and one in Europe, to give users in different countries better read/write performance. How do I run a query over all the data in these clusters? Do I need to run the query separately on each cluster and then combine the results? Or is there some built-in functionality, like distributed queries in SQL Server?
1 Answer
There is no distributed query across clusters in HBase. In your scenario, the best solution would probably be to set up replication between the two HBase clusters and then query one of them. Each cluster will then hold the complete data set, with the data from the other cluster a few minutes stale, because replication is asynchronous. You can also set up more complex replication topologies, for example a separate central cluster that holds a superset of the data while the two others hold only their local subsets.
The HDInsight team is working on documentation for replication setup in Azure. For now, you would need to work out the configuration yourself: provision the clusters in VNets, connect the VNets, ensure that name resolution between them is set up correctly, and then follow the HBase replication steps to set up replication itself: http://hbase.apache.org/book.html#_cluster_replication
Without replication, you would need to query both clusters separately.
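
If you do end up querying each cluster separately, combining the results on the client can be fairly simple, because HBase scans return rows in row-key order. Below is a minimal sketch in Python, assuming each cluster's scan is available as an iterable of (row_key, columns) pairs sorted by row key. The names merge_scans, asia_scan and europe_scan are hypothetical, and the two lists are stand-ins for real scan results, not calls to any actual HBase client.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_scans(*scans):
    """Merge several row-key-sorted scans into one row-key-sorted stream.

    Each scan is an iterable of (row_key, columns) pairs, where columns is
    a dict. If the same row key appears in more than one scan, the column
    dicts are combined; on a column conflict the later scan wins.
    """
    # heapq.merge lazily merges already-sorted inputs without loading them all.
    merged = heapq.merge(*scans, key=itemgetter(0))
    for row_key, group in groupby(merged, key=itemgetter(0)):
        columns = {}
        for _, cols in group:
            columns.update(cols)
        yield row_key, columns


# Hypothetical stand-ins for the scan results of the two clusters.
asia_scan = [
    (b"user1", {b"cf:name": b"Aiko"}),
    (b"user3", {b"cf:name": b"Chen"}),
]
europe_scan = [
    (b"user2", {b"cf:name": b"Bram"}),
    (b"user3", {b"cf:city": b"Oslo"}),
]

combined = list(merge_scans(asia_scan, europe_scan))
```

This only covers stitching results together. Any filtering or aggregation would still have to run against each cluster first, which is why replication is usually the more convenient option.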