we are testing hadoop HA cluster on Azure v2 cloud running on linux. We are trying to switch to Azure BLOB storage. We are not sure how we should configure name nodes using Blob storage. We are getting following error:
2015-12-22 13:05:50,193 INFO ha.StandbyCheckpointer (StandbyCheckpointer.java:start(129)) - Starting standby checkpoint thread...
Checkpointing active NN at http://bd-azure-qa-nn2:50070
Serving checkpoints at http://bd-azure-qa-nn1:50070
2015-12-22 13:07:50,240 INFO ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(269)) - Triggering log roll on remote NameNode bd-azure-qa-nn2/10.0.0.7:8020
2015-12-22 13:07:51,387 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:52,391 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:53,400 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:54,416 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:55,425 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:56,450 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:57,456 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:58,462 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:59,473 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:00,478 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:01,482 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:02,490 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 11 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:03,501 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 12 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:04,515 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 13 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:05,520 INFO ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 14 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:05,966 WARN ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(274)) - Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1719)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1352)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6339)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:933)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy14.rollEditLog(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:145)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
We are not pretty sure how to configure name nodes with Azure Blob since Blob effectively handover the HDFS functionality. We have standard HA name node configuration in hdfs-site.xml and modified core-site.xml with the following properties to enable BLOB storage.
<property>
<name>fs.AbstractFileSystem.wasb.impl</name>
<value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
<property>
<name>fs.azure.account.key.OUR_STORAGE_ACCOUNT.blob.core.windows.net</name>
<value>"OUR_KEY"</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>wasb://blob-hdfs@OUR_STORAGE_ACCOUNT.blob.core.windows.net</value>
<final>true</final>
</property>
<property>
<name>fs.azure.page.blob.dir</name>
<value>/datadir</value>
</property>
<property>
<name>fs.azure.selfthrottling.read.factor</name>
<value>1.000000</value>
</property>
<property>
<name>fs.azure.selfthrottling.write.factor</name>
<value>1.000000</value>
<property>
Our HA name in the original cluster were:
<!-- <property>
<name>fs.defaultFS</name>
<value>hdfs://namenodeha</value>
</property> -->
hdfs-site.xml we didn't touch at all.
We are not sure about name node settings. Two name node from original settings are probably overkill since underlying BLOB should handle all replication etc.
Could someone please clarify?