3
votes

When configuring an HBase cluster alongside Hadoop HDFS, is it a good choice to deploy one region server per HDFS data node, or should the ratio between region servers and data nodes be different from 1:1?

1
One region server colocated with a single data node per server. - Tucker
Does a replication factor of 3 matter in this case? - Nicola Ferraro
Replication is handled by HDFS, not HBase. Because HBase stores its files on HDFS, the data will be replicated. This is normal Hadoop behavior. - Tucker
I mean that under heavy load, with a replication factor of 3, every data node will receive concurrent writes from 3 region servers. Does that influence the choice? - Nicola Ferraro
Yes, the data will be written to 3 data nodes if the replication factor is 3. I'm not sure which 'choice' you are asking about, but hot spotting is among the most common write throughput issues with HBase, so take a look at that. Also take a look at bulk loading: hbase.apache.org/book/casestudies.perftroub.html and hbase.apache.org/book/arch.bulk.load.html - Tucker

1 Answer

1
votes

You can use any ratio you want, but the rule of thumb is 1:1. The fewer regions a region server (RS) hosts, the better: more region servers means fewer regions per server and fewer regions to reassign if a node fails, which improves recovery time (by a lot, although there has been progress since 0.95: http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/).
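
To make the arithmetic behind that rule of thumb concrete, here is a minimal sketch. It assumes regions are spread evenly across region servers, and the cluster size and region count are hypothetical numbers chosen only for illustration, not figures from any real deployment. The helper `regions_per_server` is likewise a made-up name, not an HBase API.

```python
import math


def regions_per_server(total_regions: int, region_servers: int) -> int:
    """Largest number of regions one region server hosts under an even spread.

    This is also roughly how many regions must be reassigned if that server fails.
    """
    if region_servers <= 0:
        raise ValueError("region_servers must be positive")
    return math.ceil(total_regions / region_servers)


# Hypothetical cluster: 12 data nodes serving 1200 regions in total.
TOTAL_REGIONS = 1200
DATA_NODES = 12

# Compare a 1:1 ratio against running fewer region servers than data nodes.
for label, servers in (("1:1", DATA_NODES), ("1:2", DATA_NODES // 2), ("1:4", DATA_NODES // 4)):
    count = regions_per_server(TOTAL_REGIONS, servers)
    print(f"ratio {label}: {servers} region servers, {count} regions each")
```

Under these assumptions, halving the number of region servers doubles the number of regions that have to be reassigned when a single server goes down, which is why the 1:1 layout tends to recover faster.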