Duplicated region servers shown in HBase master status

Question

There are 2 machines:

id-test-n03 : hadoop-hbase-master, hadoop-hbase-regionserver, hadoop-hbase-thrift, hadoop-zookeeper-server
id-test-i03 : hadoop-hbase-regionserver

Both are Ubuntu Maverick machines with all Hadoop (CDH3u3) and HBase packages installed from the Cloudera CDH3 repository.

When using only id-test-n03 there was no problem. The HBase master web console (http://id-test-n03:60010/master-status) showed 1 region server, as expected.

After adding a region server on id-test-i03, I found duplicated region servers (both with the same address, id-test-i03:60030) on the HBase master web console (screenshot: after_adding_rs).

Status in hbase shell was:

hbase(main):001:0> status 'detailed'
version 0.90.4-cdh3u3
0 regionsInTransition
3 live servers
    id-test-i03:60020 1332390489086
        requests=0, regions=0, usedHeap=24, maxHeap=983
    id-test-n03.daum.net:60020 1332389557235
        requests=0, regions=2, usedHeap=26, maxHeap=983
        .META.,,1
            stores=1, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
        -ROOT-,,0
            stores=1, storefiles=2, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
    id-test-i03:60020 1332390489086
        requests=0, regions=0, usedHeap=0, maxHeap=0
0 dead servers

So I tried stopping the region server on id-test-i03, but only one of the duplicated entries was marked dead while the other stayed alive (screenshot: after_stopping_added_rs):

hbase(main):002:0> status 'detailed'
version 0.90.4-cdh3u3
0 regionsInTransition
2 live servers
    id-test-n03.daum.net:60020 1332389557235
        requests=0, regions=2, usedHeap=29, maxHeap=983
        .META.,,1
            stores=1, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
        -ROOT-,,0
            stores=1, storefiles=2, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
    id-test-i03:60020 1332390489086
        requests=0, regions=0, usedHeap=0, maxHeap=0
1 dead servers
    id-test-i03.daum.net,60020,1332390489086

According to the master web console, the duplicated region servers differ in Start Code and Load. One entry has the short hostname in its start code and the other has the FQDN. The entry with the short hostname has an empty load.
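
One way to spot such duplicates without the web console is to count the server lines in the status 'detailed' output by short hostname. The following is a minimal Python sketch; duplicated_hosts and the regular expression are illustrative assumptions based only on the output format shown above, not an HBase API.

```python
import re
from collections import Counter

# Matches "host:port startcode" (live servers) and "host,port,startcode"
# (dead servers). Metric lines and region names do not fit this pattern.
SERVER_LINE = re.compile(r"^[ \t]+(\S+?)[:,](\d+)[ ,](\d+)[ \t]*$", re.MULTILINE)


def duplicated_hosts(status_text):
    """Return {short_hostname: count} for hosts that are listed more than once."""
    counts = Counter(
        match.group(1).split(".")[0]  # drop the domain part, if any
        for match in SERVER_LINE.finditer(status_text)
    )
    return {host: n for host, n in counts.items() if n > 1}


# Trimmed copy of the first status output from the question.
SAMPLE = """\
3 live servers
    id-test-i03:60020 1332390489086
        requests=0, regions=0, usedHeap=24, maxHeap=983
    id-test-n03.daum.net:60020 1332389557235
        requests=0, regions=2, usedHeap=26, maxHeap=983
    id-test-i03:60020 1332390489086
        requests=0, regions=0, usedHeap=0, maxHeap=0
0 dead servers
"""

print(duplicated_hosts(SAMPLE))
```

Because the domain part is stripped before counting, an entry registered under the FQDN and one registered under the short hostname are treated as the same machine.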

/etc/hosts on both id-test-n03 and id-test-i03 is:

127.0.0.1       localhost

192.168.1.1   id-test-n03 id-test-n03.daum.net
192.168.1.2   id-test-i03 id-test-i03.daum.net
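
For reference, each /etc/hosts line has the form "IP canonical_name [aliases...]", so in the file above the short hostname is the canonical name and the FQDN is only an alias. A minimal Python sketch that makes the column order explicit; parse_hosts is a hypothetical helper written for this illustration, not part of Hadoop or HBase.

```python
def parse_hosts(text):
    """Map each IP to (canonical_name, aliases), following /etc/hosts column order."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, split on whitespace
        if len(fields) < 2:
            continue  # blank line or a line without a hostname
        ip, canonical, *aliases = fields
        table[ip] = (canonical, aliases)
    return table


# The hosts file from the question.
HOSTS = """\
127.0.0.1       localhost

192.168.1.1   id-test-n03 id-test-n03.daum.net
192.168.1.2   id-test-i03 id-test-i03.daum.net
"""

for ip, (canonical, aliases) in parse_hosts(HOSTS).items():
    print(ip, "canonical:", canonical, "aliases:", aliases)
```

If another name source (for example DNS combined with a search domain) returns the FQDN for the same IP, the same machine can end up being known under two names. That this is what happened here is an assumption, not something the output above proves.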

hbase-site.xml for both machines is:

<configuration>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://id-test-n03:8020/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>id-test-n03</value>
</property>
</configuration>

Thanks for any advice.

The problem was solved with the following steps.

First, I deleted the FQDNs from /etc/hosts on both machines:

127.0.0.1       localhost
192.168.1.1   id-test-n03
192.168.1.2   id-test-i03

Then, I commented out the search $DOMAIN_NAME line in /etc/resolv.conf on both machines:

#search daum.net
nameserver 10.20.30.40

Finally, I restarted all Hadoop and HBase services on both machines.

After this, the Hadoop and HBase servers no longer use FQDNs and communicate using short hostnames only.

Adding and removing region servers is now reflected in the HBase master status web console and the hbase shell as expected.
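
To check that a machine reports one consistent name after such a change, the names returned by the system resolver can be compared. This is a rough sketch using Python's standard socket module; name_report is a hypothetical helper, and whether HBase performs exactly these lookups is an assumption that is not confirmed here.

```python
import socket


def name_report(address=None):
    """Collect the names the local resolver gives for this machine."""
    hostname = socket.gethostname()
    report = {"hostname": hostname, "fqdn": socket.getfqdn()}
    try:
        # Resolve our own name to an IP (unless one was given), then look it up in reverse.
        ip = address or socket.gethostbyname(hostname)
        report["reverse"] = socket.gethostbyaddr(ip)[0]
    except OSError as exc:  # covers socket.gaierror and socket.herror
        report["reverse"] = "lookup failed: {}".format(exc)
    return report


for key, value in name_report().items():
    print("{:<9}{}".format(key, value))
```

Running it on each machine and finding the same short hostname in all three fields would be consistent with the setup described above. A mix of short names and FQDNs would suggest the name sources still disagree.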

You should add your solution as an answer and mark it as the correct answer. — Chris Shain
I tried that, but failed because a user whose reputation is under 100 is not allowed to answer her own question. — philipjkim

محمدباقر محمدباقر · Accepted Answer · 2012-06-21T08:39:08

Showing duplicate region servers for a short period after starting HBase is normal; the master removes them automatically once that period has passed.
