There are 2 machines:
id-test-n03: hadoop-hbase-master, hadoop-hbase-regionserver, hadoop-hbase-thrift, hadoop-zookeeper-server
id-test-i03: hadoop-hbase-regionserver
Both are Ubuntu Maverick machines, with all Hadoop (CDH3u3) and HBase packages installed from the Cloudera CDH3 repository.
When using only id-test-n03, there was no problem. The HBase master web console (http://id-test-n03:60010/master-status) showed 1 region server, as expected.
After adding a region server on id-test-i03, I found duplicated region servers (both with the same address, id-test-i03:60030) on the HBase master web console.
The status in hbase shell was:
hbase(main):001:0> status 'detailed'
version 0.90.4-cdh3u3
0 regionsInTransition
3 live servers
id-test-i03:60020 1332390489086
requests=0, regions=0, usedHeap=24, maxHeap=983
id-test-n03.daum.net:60020 1332389557235
requests=0, regions=2, usedHeap=26, maxHeap=983
.META.,,1
stores=1, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
-ROOT-,,0
stores=1, storefiles=2, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
id-test-i03:60020 1332390489086
requests=0, regions=0, usedHeap=0, maxHeap=0
0 dead servers
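The duplicate is easy to miss in longer output. As a rough sketch (not part of HBase; the helper name `duplicate_servers` and the assumption that server lines look like `host:port startcode` are mine, based only on the output above), the live-server lines can be grouped by short hostname, port and start code:

```python
import re
from collections import Counter

# Assumed shape of a live-server header line: "<host>:<port> <startcode>"
SERVER_LINE = re.compile(r"^(\S+):(\d+)\s+(\d+)$")


def duplicate_servers(status_text):
    """Return (short_host, port, startcode) keys that appear more than once."""
    seen = Counter()
    for line in status_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            host, port, startcode = match.groups()
            # Compare on the short name so "host" and "host.domain" collide.
            seen[(host.split(".")[0], port, startcode)] += 1
    return [key for key, count in seen.items() if count > 1]


SAMPLE = """\
3 live servers
id-test-i03:60020 1332390489086
requests=0, regions=0, usedHeap=24, maxHeap=983
id-test-n03.daum.net:60020 1332389557235
requests=0, regions=2, usedHeap=26, maxHeap=983
id-test-i03:60020 1332390489086
requests=0, regions=0, usedHeap=0, maxHeap=0
0 dead servers
"""

if __name__ == "__main__":
    print(duplicate_servers(SAMPLE))
```

Matching on the short name is a deliberate choice here, since the symptom is the same server showing up under two spellings of its name.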
So I tried stopping the region server on id-test-i03, but only one of the two entries was reported dead while the other stayed alive:
hbase(main):002:0> status 'detailed'
version 0.90.4-cdh3u3
0 regionsInTransition
2 live servers
id-test-n03.daum.net:60020 1332389557235
requests=0, regions=2, usedHeap=29, maxHeap=983
.META.,,1
stores=1, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
-ROOT-,,0
stores=1, storefiles=2, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0
id-test-i03:60020 1332390489086
requests=0, regions=0, usedHeap=0, maxHeap=0
1 dead servers
id-test-i03.daum.net,60020,1332390489086
According to the master web console, the duplicated region servers differ in Start Code and Load. One has the short hostname in its start code and the other has the FQDN. The one with the short hostname has an empty load.
/etc/hosts for both id-test-n03 and id-test-i03 is:
127.0.0.1 localhost
192.168.1.1 id-test-n03 id-test-n03.daum.net
192.168.1.2 id-test-i03 id-test-i03.daum.net
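Each cluster address here carries two names, a short hostname and an FQDN. A minimal sketch for spotting such entries follows; the helper names `names_by_ip` and `mixed_name_ips` are hypothetical, and treating any name containing a dot as an FQDN is a simplifying assumption:

```python
def names_by_ip(hosts_text):
    """Map each IP address in /etc/hosts-style text to its list of names."""
    mapping = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        mapping.setdefault(ip, []).extend(names)
    return mapping


def mixed_name_ips(hosts_text):
    """Return IPs that carry both a short name and a dotted (FQDN-style) name."""
    mixed = {}
    for ip, names in names_by_ip(hosts_text).items():
        has_short = any("." not in name for name in names)
        has_dotted = any("." in name for name in names)
        if has_short and has_dotted:
            mixed[ip] = names
    return mixed


HOSTS = """\
127.0.0.1 localhost
192.168.1.1 id-test-n03 id-test-n03.daum.net
192.168.1.2 id-test-i03 id-test-i03.daum.net
"""

if __name__ == "__main__":
    for ip, names in mixed_name_ips(HOSTS).items():
        print(ip, names)
```

On a real machine the text could be read from /etc/hosts instead of the inline sample.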
hbase-site.xml for both machines is:
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://id-test-n03:8020/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>id-test-n03</value>
</property>
</configuration>
Thanks for any advice.
The problem was solved with the following steps.
First, I deleted the FQDNs from /etc/hosts on both machines:
127.0.0.1 localhost
192.168.1.1 id-test-n03
192.168.1.2 id-test-i03
Then, I commented out the search $DOMAIN_NAME line in /etc/resolv.conf on both machines:
#search daum.net
nameserver 10.20.30.40
Finally, I restarted all Hadoop and HBase services on both machines.
After this, the Hadoop and HBase servers no longer use FQDNs and communicate using short hostnames only.
Adding and removing region servers is now reflected in the HBase master status web console and hbase shell as expected.
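To double-check the name setup on each machine after the steps above, something like the following sketch may help. It only compares what Python's standard `socket` module reports for the short hostname and the fully qualified name. Whether this matches exactly what the HBase JVM resolves is an assumption, and `names_agree` is a hypothetical helper:

```python
import socket


def names_agree(short_name, full_name):
    """True when the resolver's full name is the same as the short hostname."""
    return short_name == full_name


if __name__ == "__main__":
    short_name = socket.gethostname()  # name the machine reports for itself
    full_name = socket.getfqdn()       # name the resolver returns for it
    print("hostname:", short_name)
    print("fqdn:    ", full_name)
    print("consistent:", names_agree(short_name, full_name))
```

If the two names still differ on either machine, the FQDN is probably still coming from /etc/hosts or from DNS.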