HMaster vs Zookeeper - HBase

Question

I have been doing a lot of reading about HBase lately, and I am a little confused about the roles of HMaster and ZooKeeper in the HBase architecture.

When a client requests data, which component receives that request? Assume this is the first request. I understand that subsequent requests can go directly to the region servers, but for that to happen, the location of the meta table must first be retrieved, and then a get or scan must run against that meta table on the region server hosting it.

I ask because in Java I would use the HConnectionManager class to create a connection, and HConnectionManager appears to keep a cache of region locations. That cache is presumably built up by earlier requests, so what happens when the cache is empty because this is the first request?

Who receives the first HBase request: is it the ZooKeeper quorum? We supply the hbase-site.xml file to the HBaseConfiguration class.

I am also a little confused about how a "client" is defined.

I also read that the meta information gets cached on the "client". Is this true even in the case of HBase REST? Is the client here the HMaster, or the actual user making the REST call? If it is the user, doesn't exposing metadata to the client pose a security threat?

Arnon Rotem-Gal-Oz · Accepted Answer · 2015-07-04T05:16:40

Clients connect to ZooKeeper to get the latest state, in particular the location of the meta table, which records which region server hosts each region. The HMaster's role is to make sure this assignment is correct (i.e. assign regions to region servers on startup, after failures, etc.). Clients contact the HMaster only for administrative operations, e.g. creating a table or changing its structure (via the HBaseAdmin class). You can read more about it here.
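To make the first-request versus cached-request distinction concrete, here is a toy, in-memory Java model of the lookup path described above. It is not the HBase client API: `Region`, `FakeZooKeeper`, `FakeMeta`, `FakeClient` and `locate` are hypothetical names, ZooKeeper and the meta table are plain objects, and nothing goes over the network. It assumes a client that reads the meta table's location from ZooKeeper once, asks meta which region contains a row, and caches that region so later rows in the same region skip both ZooKeeper and meta. Note that the master does not appear anywhere on this read path.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of how a client finds the region server for a row. NOT the real HBase API; all names are hypothetical. */
public class RegionLookupSketch {

    /** A region covers rows in [startKey, endKey); an empty endKey means "to the end of the table". */
    static final class Region {
        final String startKey, endKey, server;
        Region(String startKey, String endKey, String server) {
            this.startKey = startKey; this.endKey = endKey; this.server = server;
        }
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    /** Stands in for the ZooKeeper quorum: here it only knows which server hosts the meta table. */
    static final class FakeZooKeeper {
        final String metaServer;
        int reads = 0;
        FakeZooKeeper(String metaServer) { this.metaServer = metaServer; }
        String readMetaLocation() { reads++; return metaServer; }
    }

    /** Stands in for the meta table; in real HBase the master keeps these assignments correct. */
    static final class FakeMeta {
        final TreeMap<String, Region> byStartKey = new TreeMap<>();
        int lookups = 0;
        void assign(Region r) { byStartKey.put(r.startKey, r); }
        Region find(String row) {
            lookups++;
            Map.Entry<String, Region> e = byStartKey.floorEntry(row); // region with greatest start key <= row
            if (e == null || !e.getValue().contains(row)) {
                throw new IllegalStateException("no region for row " + row);
            }
            return e.getValue();
        }
    }

    /** Stands in for a client connection that caches region locations. */
    static final class FakeClient {
        final FakeZooKeeper zk;
        final FakeMeta meta;
        String metaServer;                                      // filled by the first ZooKeeper read
        final TreeMap<String, Region> cache = new TreeMap<>(); // region start key -> region

        FakeClient(FakeZooKeeper zk, FakeMeta meta) { this.zk = zk; this.meta = meta; }

        /** Returns the server hosting the region that contains {@code row}. */
        String locate(String row) {
            // 1. Cache hit: no ZooKeeper or meta round trip at all.
            Map.Entry<String, Region> hit = cache.floorEntry(row);
            if (hit != null && hit.getValue().contains(row)) {
                return hit.getValue().server;
            }
            // 2. First request only: ask ZooKeeper where the meta table lives.
            if (metaServer == null) {
                metaServer = zk.readMetaLocation();
            }
            // 3. Look the row up in meta (a real client would send this to metaServer), then cache the region.
            Region r = meta.find(row);
            cache.put(r.startKey, r);
            return r.server;
        }
    }

    public static void main(String[] args) {
        FakeZooKeeper zk = new FakeZooKeeper("rs1");
        FakeMeta meta = new FakeMeta();
        meta.assign(new Region("", "m", "rs2"));
        meta.assign(new Region("m", "", "rs3"));

        FakeClient client = new FakeClient(zk, meta);
        System.out.println(client.locate("apple"));  // first request: ZooKeeper + meta
        System.out.println(client.locate("banana")); // same region: answered from the cache
        System.out.println(client.locate("zebra"));  // new region: meta only, ZooKeeper not re-read
        System.out.println("zk reads=" + zk.reads + ", meta lookups=" + meta.lookups);
    }
}
```

Running `main` prints `rs2`, `rs2`, `rs3` and then `zk reads=1, meta lookups=2`. In the real client this logic lives inside the connection object (the one created through HConnectionManager in the question). The model leaves out cache invalidation: when a region moves, the real client discards the stale entry and consults meta again.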

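In the case of HBase REST, the client sends a REST request to the REST server, which holds an HBase client internally. The region-location cache therefore lives in the REST server's process, not with the user making the REST call.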
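A toy Java model of the REST case makes the point about where the cache lives. This is not the HBase REST server's code: `EmbeddedClient`, `Gateway`, `handleGet` and the pretend cluster layout are hypothetical, and caching is per row rather than per region purely to keep the sketch short. The REST caller only receives the cell value, while the location lookup and its cache stay inside the gateway's embedded client.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of an HBase REST gateway. NOT the real REST server; all names are hypothetical. */
public class RestGatewaySketch {

    /** Stands in for the HBase client living inside the REST server process. */
    static final class EmbeddedClient {
        private final Map<String, String> layout;                // row -> region server (pretend meta)
        private final Map<String, Map<String, String>> servers; // region server -> (row -> value)
        private final Map<String, String> locationCache = new HashMap<>();
        int metaLookups = 0;

        EmbeddedClient(Map<String, String> layout, Map<String, Map<String, String>> servers) {
            this.layout = layout;
            this.servers = servers;
        }

        String get(String row) {
            // Locations are cached here, in the gateway's JVM, not on the REST caller's machine.
            String server = locationCache.get(row);
            if (server == null) {
                metaLookups++;              // a real client would consult ZooKeeper/meta here
                server = layout.get(row);
                if (server == null) {
                    return null;            // unknown row: nothing to cache
                }
                locationCache.put(row, server);
            }
            return servers.get(server).get(row);
        }
    }

    /** Stands in for the REST endpoint: the response carries only the value, never a region location. */
    static final class Gateway {
        private final EmbeddedClient client;
        Gateway(EmbeddedClient client) { this.client = client; }
        String handleGet(String row) { return client.get(row); }
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> servers = new HashMap<>();
        servers.put("rs1", Map.of("alice", "42"));
        servers.put("rs2", Map.of("bob", "7"));
        EmbeddedClient client = new EmbeddedClient(Map.of("alice", "rs1", "bob", "rs2"), servers);
        Gateway gateway = new Gateway(client);

        System.out.println(gateway.handleGet("alice")); // 42
        System.out.println(gateway.handleGet("alice")); // 42 again, location served from the gateway's cache
        System.out.println("meta lookups inside gateway: " + client.metaLookups); // 1
    }
}
```

Because the caller only ever talks HTTP to the gateway, whatever region metadata is cached stays on the server side of that boundary.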
