What's the impact of too many regions in HBase?

Question

I am managing a small HBase cluster of about ten region servers, each of which holds more than 1000 regions. That does not look healthy, and the log file keeps warning: ‘Total number of regions is approaching the upper limit 1000. Please consider taking a look at http://hbase.apache.org/book.html#ops.regionmgt’. But the cluster has been working well for a long time, without any exceptions.

I referred to the official documentation and found the description below:

If you fill all the regions at somewhat the same rate, the global memory usage makes it that it forces tiny flushes when you have too many regions which in turn generates compactions. Rewriting the same data tens of times is the last thing you want. An example is filling 1000 regions (with one family) equally and let’s consider a lower bound for global MemStore usage of 5GB (the region server would have a big heap). Once it reaches 5GB it will force flush the biggest region, at that point they should almost all have about 5MB of data so it would flush that amount. 5MB inserted later, it would flush another region that will now have a bit over 5MB of data, and so on. This is currently the main limiting factor for the number of regions;

But I can't understand why this would be the main limiting factor. What is the impact of flushing those small MemStores one by one?

Vivek Jain · Accepted Answer · 2019-04-09T19:25:49

From the book Architecting HBase Applications by Kevin O'Dell, Chapter 14: "These compactions will cause excessive churn on the cluster, affecting performance. When specific operations are triggered (automatic flush, forced flush, and user call for compactions), if required, HBase will start compactions. When many compactions run in tandem, it is known as a compaction storm."
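To connect this with the documentation quoted in the question: every flush writes a new store file, and a store that accumulates many small files becomes eligible for compaction, so smaller flushes mean the same data gets rewritten more often. The sketch below only reproduces the arithmetic from that quote; it is not HBase code. The function names are made up for illustration, the model assumes all regions fill at the same rate, and the 128 MB cap assumes the default `hbase.hregion.memstore.flush.size`. Real flush and compaction policies are more involved.

```python
import math


def flush_size_mb(global_memstore_mb, regions, per_region_limit_mb=128.0):
    """Approximate size of each flush when all regions fill at the same rate.

    Hypothetical helper. When the global MemStore bound is reached, each
    region holds roughly global / regions. A region also flushes on its own
    once it reaches the per-region limit (assumed 128 MB here), so the
    effective flush size is the smaller of the two.
    """
    return min(global_memstore_mb / regions, per_region_limit_mb)


def flushes_per_region(data_mb, flush_mb):
    """Number of flushes (new store files) needed to persist data_mb."""
    return math.ceil(data_mb / flush_mb)


if __name__ == "__main__":
    global_bound_mb = 5000   # the 5GB lower bound from the quoted example
    data_per_region_mb = 1000  # illustrative amount written to each region

    for regions in (1000, 100, 40):
        size = flush_size_mb(global_bound_mb, regions)
        count = flushes_per_region(data_per_region_mb, size)
        print(f"{regions:>5} regions: ~{size:g} MB per flush, "
              f"{count} flushes per {data_per_region_mb} MB written")
```

Under these assumptions, a region server with 1000 regions flushes about 5 MB at a time, so writing 1000 MB to one region produces 200 small files, while with 100 regions the same data arrives in 20 files of about 50 MB. Those many small files are what keep the compaction queue busy.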

I hope it's clear now.
