We have an HBase table pre-split into 16 regions. We have loaded some data into it, and we can see that the data is distributed across the regions according to the split points we defined.

We have a few questions about how regions work.

  1. If a node that hosts one of the table's regions goes down, what happens? Is it still possible to get/scan the data that belongs to that region?

  2. Is the full region replicated to another node, or how does it work?

Can anyone help me with this?

1 Answer

Generally speaking, HBase stores its data on Hadoop (HDFS), which replicates the data across the cluster (the default is 3 copies, but you can change that). When a RegionServer crashes, the Master reassigns the regions that server was handling to other RegionServer(s).
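To make the reassignment step concrete, here is a minimal toy model in plain Python. It is not HBase code: the function `reassign_regions`, the server names `rs1`..`rs4`, and the "give it to the least-loaded survivor" rule are all assumptions made for illustration, standing in for whatever the real Master and balancer do. The point it shows is that only the region-to-server mapping changes; the data itself stays where it is in the replicated file system.

```python
def reassign_regions(assignments, dead_server):
    """Return a new region -> server map with the dead server's regions moved.

    `assignments` maps region name -> server name. Each orphaned region goes
    to whichever surviving server currently holds the fewest regions. This is
    a simplified stand-in for the real balancer, not its actual algorithm.
    """
    survivors = sorted({s for s in assignments.values() if s != dead_server})
    if not survivors:
        raise RuntimeError("no surviving RegionServer to take over")

    load = {s: 0 for s in survivors}
    result = {}
    orphaned = []
    for region, server in assignments.items():
        if server == dead_server:
            orphaned.append(region)
        else:
            # Regions on healthy servers are left where they are.
            result[region] = server
            load[server] += 1

    for region in sorted(orphaned):
        # Pick the least-loaded survivor; break ties by server name.
        target = min(survivors, key=lambda s: (load[s], s))
        result[region] = target
        load[target] += 1
    return result


# 16 pre-split regions spread round-robin over 4 hypothetical servers.
servers = ["rs1", "rs2", "rs3", "rs4"]
before = {f"region-{i:02d}": servers[i % 4] for i in range(16)}
after = reassign_regions(before, dead_server="rs2")
```

In this sketch all 16 regions are still assigned after `rs2` fails, which mirrors why gets and scans against the affected regions work again once the reassignment has finished.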

The process is more involved, however, since HBase doesn't write data directly to files; it buffers it in memory first. It does write any new data to the WAL (write-ahead log), so when a crash happens it also replays the WAL before recovery is complete.
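The buffer-then-replay idea can be sketched with another toy model in plain Python. Again, this is an illustration under assumptions, not HBase's implementation: `ToyRegion`, `recover`, and the dictionary-based "store files" are hypothetical names, and details such as how the real WAL is shared, split, and archived are deliberately left out.

```python
class ToyRegion:
    """A toy region: a durable log, an in-memory buffer, and flushed files."""

    def __init__(self):
        self.wal = []          # write-ahead log: survives a crash
        self.memstore = {}     # in-memory buffer: lost on a crash
        self.store_files = {}  # flushed data: survives a crash

    def put(self, key, value):
        # Log the edit first, then apply it to the in-memory buffer.
        self.wal.append((key, value))
        self.memstore[key] = value

    def flush(self):
        # Persist buffered edits; they no longer need to be replayed.
        self.store_files.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()

    def get(self, key):
        # Newer, unflushed edits take precedence over flushed data.
        if key in self.memstore:
            return self.memstore[key]
        return self.store_files.get(key)


def recover(wal, store_files):
    """Rebuild a region on another server from what survived the crash."""
    region = ToyRegion()
    region.store_files = dict(store_files)
    for key, value in wal:
        # Replaying the log restores the edits that were only in memory.
        region.memstore[key] = value
    region.wal = list(wal)
    return region


region = ToyRegion()
region.put("row1", "a")
region.flush()             # row1 is now in the durable store files
region.put("row2", "b")    # row2 exists only in the WAL and the memstore

# Simulated crash: the memstore is gone, the WAL and store files remain.
recovered = recover(region.wal, region.store_files)
```

After recovery, both `row1` (read from the flushed files) and `row2` (restored by replaying the log) are readable, which is the guarantee the WAL is there to provide.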

Note that there are more details to this, e.g. around data locality and how HBase ensures the data is replicated. You can read about some of them here