When I added a new region server to the HBase cluster, no regions were assigned to it.

The new region server now appears in the web UI, but its Num. Regions and Requests Per Second are both zero there.

This is the region server log and this is the master log.

It seems the region server was added successfully, but the rebalancing mechanism did not work.

How can I make HBase rebalance the regions across all region servers?

This is the first time I have asked a question here. I hope someone can help. Thanks a lot.

2 Answers

0
votes

Go to the HBase shell and run the balancer command. This runs the balancer once. It returns true (success) or false (there is a problem). If it returns false, check for regions stuck in transition.

To have the balancer run periodically, enable it with balance_switch true in the HBase shell.

0
votes

I found the cause of this behavior. Something went wrong while some regions were being split: they stayed in transition and never completed their splits, which prevented the balancer from running normally.

Look at this snippet of the balancer code in HMaster.java:

public boolean balance() throws IOException {
  //...
  if (this.assignmentManager.getRegionStates().isRegionsInTransition()) {
    Map<String, RegionState> regionsInTransition =
      this.assignmentManager.getRegionStates().getRegionsInTransition();
    LOG.debug("Not running balancer because " + regionsInTransition.size() +
      " region(s) in transition: " + org.apache.commons.lang.StringUtils.
        abbreviate(regionsInTransition.toString(), 256));
    return false;
  }
  //...
}

The "if" condition was always true, so this method always returned false and never reached the code below it, which actually balances the regions across the region servers.

I don't know what caused the splits of those regions to fail, but when I tried to move a region from one region server to another, I found this error message in the region server log:

2018-05-17 13:11:12,695 ERROR         [B.defaultRpcServer.handler=99,queue=9,port=26020] regionserver.RSRpcServices: Failed warming up region tsdb,\x00\x12\x19Z\xD2P,1525840795373.c3ebb018b9c3fc101a7b9def9100fb5f.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase-holmes/data/default/tsdb/32ef153360b7a9499e555a7937418ee7/t/a6cdb25689234e539ed82230ed7b790f
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

    at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:943)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeWarmup(HRegion.java:967)
    at org.apache.hadoop.hbase.regionserver.HRegion.warmupHRegion(HRegion.java:6554)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.warmupRegion(RSRpcServices.java:1709)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22241)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2188)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:745)
...

The region I wanted to move was c3ebb018b9c3fc101a7b9def9100fb5f, but the error said that the missing file belonged to region 32ef153360b7a9499e555a7937418ee7. Later I found that region c3ebb018b9c3fc101a7b9def9100fb5f is a daughter of region 32ef153360b7a9499e555a7937418ee7.
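To spot this kind of mismatch quickly, the two encoded region names can be pulled out of the log line and compared. The following is only a rough sketch: the class and method names are made up for illustration, and the two regular expressions assume the usual layouts (a full region name ending in ".<32 hex chars>." and a store file path of the form .../data/<namespace>/<table>/<encoded region>/<family>/<file>), which is what the log above shows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: compares the region named in the
// error with the region directory that the missing file path points into.
class RegionNameCheck {
    // Assumed form of a full region name: <table>,<startKey>,<timestamp>.<encodedName>.
    private static final Pattern REGION_NAME =
        Pattern.compile("\\.([0-9a-f]{32})\\.$");

    // Assumed form of a store file path: .../data/<ns>/<table>/<encodedName>/<family>/<file>
    private static final Pattern STORE_PATH =
        Pattern.compile("/data/[^/]+/[^/]+/([0-9a-f]{32})/[^/]+/[^/]+$");

    static String encodedNameFromRegionName(String regionName) {
        Matcher m = REGION_NAME.matcher(regionName);
        if (!m.find()) {
            throw new IllegalArgumentException("Unexpected region name: " + regionName);
        }
        return m.group(1);
    }

    static String encodedNameFromStorePath(String path) {
        Matcher m = STORE_PATH.matcher(path);
        if (!m.find()) {
            throw new IllegalArgumentException("Unexpected store file path: " + path);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        // Both values are copied from the error message above.
        String region =
            "tsdb,\\x00\\x12\\x19Z\\xD2P,1525840795373.c3ebb018b9c3fc101a7b9def9100fb5f.";
        String missing =
            "/hbase-holmes/data/default/tsdb/32ef153360b7a9499e555a7937418ee7/t/a6cdb25689234e539ed82230ed7b790f";

        String opened = encodedNameFromRegionName(region);
        String owner = encodedNameFromStorePath(missing);
        System.out.println("region being opened:  " + opened);
        System.out.println("region owning file:   " + owner);
        System.out.println("same region: " + opened.equals(owner));
    }
}
```

When the two names differ, as they do here, the region being opened depends on files that live in another region's directory, which is a hint that it is a daughter still referencing its parent.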

Then I checked HDFS. The parent region's directory was missing, but the reference files in its daughter regions, which point to the parent's store files, were still present. That is to say, the reference files in the daughter regions pointed to nonexistent files.

So the region server found the reference files in the daughter regions but could not find the parent region's store files, and threw this exception.

Finally, I removed the reference files of those splitting regions, and the balancer began to work normally. However, I don't know whether any data was lost.
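For anyone checking the same thing, here is a small sketch of how a reference file name maps back to the parent store file it needs. It assumes the usual HBase convention that a reference file in a daughter region is named <parent hfile>.<parent encoded region name>. The class name, the method name and the example reference file name are my own illustration (the file name is reconstructed from the path in the error, I did not copy it from HDFS).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: resolves a reference file name to
// the parent store file path that must exist for the daughter region to open.
class ReferenceFileName {
    // Assumed naming convention: <parentHFile>.<parentEncodedRegionName>
    private static final Pattern REF_NAME =
        Pattern.compile("^([0-9a-f]+)\\.([0-9a-f]{32})$");

    static String parentStoreFilePath(String tableDir, String family, String refFileName) {
        Matcher m = REF_NAME.matcher(refFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a reference file name: " + refFileName);
        }
        String parentHFile = m.group(1);
        String parentRegion = m.group(2);
        // The parent file lives under the parent's region directory, same column family.
        return tableDir + "/" + parentRegion + "/" + family + "/" + parentHFile;
    }

    public static void main(String[] args) {
        // Illustrative reference file name, reconstructed from the error above.
        String ref = "a6cdb25689234e539ed82230ed7b790f.32ef153360b7a9499e555a7937418ee7";
        System.out.println(
            parentStoreFilePath("/hbase-holmes/data/default/tsdb", "t", ref));
    }
}
```

If the path this resolves to does not exist in HDFS (for example, when checked with hdfs dfs -ls), the reference is dangling and the daughter region cannot open that store file.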