When configuring an HBase cluster alongside Hadoop HDFS, is it a good choice to deploy one region server per HDFS data node, or should the ratio between region servers and data nodes be different from 1:1?
Run one region server colocated with a single datanode on each server.
– Tucker
Does a replication factor of 3 matter in this case?
– Nicola Ferraro
Replication is handled by HDFS, not HBase. Because HBase stores its files on HDFS, the data will be replicated. This is normal Hadoop behavior.
– Tucker
I mean... under heavy load, if the replication factor is 3, every data node will receive writes from 3 region servers at the same time. Does that influence the choice?
– Nicola Ferraro
Yes, the data will be written to 3 data nodes if the replication factor is 3. I'm not sure which 'choice' you are asking about, but take a look at 'hot spotting' in HBase and at the most common write throughput issues. Also take a look at bulk loading: hbase.apache.org/book/casestudies.perftroub.html hbase.apache.org/book/arch.bulk.load.html
– Tucker
1 Answer
You can use any ratio you want, but the rule of thumb is 1:1. The fewer regions a region server (RS) hosts, the better: more RSs mean fewer regions per server and fewer regions to reassign if a node fails, which improves recovery time (by a lot, although there has been progress since 0.95: http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/).
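
To make the arithmetic behind that rule of thumb concrete, here is a minimal Python sketch. It is not an HBase API: the function name `regions_per_server` and the figure of 1000 regions are made up for illustration, and it assumes the balancer spreads regions evenly across region servers.

```python
def regions_per_server(total_regions: int, region_servers: int) -> float:
    """Average number of regions hosted by each region server.

    Assumes an even balance across servers (an idealization). This is
    also roughly how many regions must be reassigned when one region
    server fails.
    """
    if region_servers <= 0:
        raise ValueError("region_servers must be positive")
    return total_regions / region_servers


if __name__ == "__main__":
    total_regions = 1000  # hypothetical table layout, for illustration only
    for servers in (5, 10, 20):
        per_server = regions_per_server(total_regions, servers)
        print(f"{servers} region servers: about {per_server:.0f} regions each, "
              f"so about {per_server:.0f} regions to reassign if one fails")
```

Under these assumptions, doubling the number of region servers halves the number of regions that have to be reassigned after a single failure, which is why running a region server on every data node tends to help recovery time.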