
I've set up an HA Hadoop cluster that worked. But after adding Kerberos authentication, the DataNode cannot connect to the NameNode.

I verified that the NameNode servers start successfully and log no errors. I start all services as user 'hduser':

$ sudo netstat -tuplen
...
tcp        0      0 10.28.94.150:8019       0.0.0.0:*               LISTEN      1001       20218      1518/java         
tcp        0      0 10.28.94.150:50070      0.0.0.0:*               LISTEN      1001       20207      1447/java         
tcp        0      0 10.28.94.150:9000       0.0.0.0:*               LISTEN      1001       20235      1447/java         

Datanode

I start the DataNode as root, using jsvc to bind the service to privileged ports (ref. Secure DataNode):

$ sudo -E sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/hadoop-2.7.3/logs//hadoop-hduser-datanode-STWHDDN01.out

The DataNode then fails to connect to the NameNodes:

...
2018-01-08 09:25:40,051 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hduser
2018-01-08 09:25:40,052 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
2018-01-08 09:25:40,114 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2018-01-08 09:25:40,125 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2018-01-08 09:25:40,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2018-01-08 09:25:40,219 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: ha-cluster
2018-01-08 09:25:41,189 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: ha-cluster
2018-01-08 09:25:41,226 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-01-08 09:25:41,227 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2018-01-08 09:25:42,297 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: STWHDRM02/10.28.94.151:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-01-08 09:25:42,300 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: STWHDRM01/10.28.94.150:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)    


datanode hdfs-site.xml (excerpt):

<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/opt/hadoop/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hduser/_HOST@FDATA.COM</value>
</property>
<property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:1004</value>
</property>
<property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:1006</value>
</property>
<property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>700</value>
</property>
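
For completeness, secure HDFS also needs Kerberos enabled cluster-wide in core-site.xml. A minimal sketch using the standard Hadoop 2.x security properties (not shown in my excerpts above, but set in my cluster):

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

Without these, daemons silently fall back to simple authentication.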


I have set HADOOP_SECURE_DN_USER=hduser and JSVC_HOME in hadoop-env.sh.
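
For reference, the relevant hadoop-env.sh lines look like this (the JSVC_HOME path below is an example; point it at wherever your jsvc binary lives):

```shell
# hadoop-env.sh (excerpt) -- required for the secure DataNode to start
# under jsvc as root and then drop privileges to the non-root user.
export HADOOP_SECURE_DN_USER=hduser
export JSVC_HOME=/usr/bin   # example path; substitute your jsvc install directory
```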


hdfs.keytab on datanode:

$ klist -ke etc/hadoop/hdfs.keytab
Keytab name: FILE:etc/hadoop/hdfs.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   1 hduser/stwhddn01@FDATA.COM (aes256-cts-hmac-sha1-96)
   1 hduser/stwhddn01@FDATA.COM (aes128-cts-hmac-sha1-96)
   1 hduser/stwhddn01@FDATA.COM (des3-cbc-sha1)
   1 hduser/stwhddn01@FDATA.COM (arcfour-hmac)
   1 hduser/stwhddn01@FDATA.COM (des-hmac-sha1)
   1 hduser/stwhddn01@FDATA.COM (des-cbc-md5)
   1 HTTP/stwhddn01@FDATA.COM (aes256-cts-hmac-sha1-96)
   1 HTTP/stwhddn01@FDATA.COM (aes128-cts-hmac-sha1-96)
   1 HTTP/stwhddn01@FDATA.COM (des3-cbc-sha1)
   1 HTTP/stwhddn01@FDATA.COM (arcfour-hmac)
   1 HTTP/stwhddn01@FDATA.COM (des-hmac-sha1)
   1 HTTP/stwhddn01@FDATA.COM (des-cbc-md5)  
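
A quick way to confirm the keytab itself is usable (principal taken from the klist output above; requires the KDC to be reachable from the DataNode):

```shell
# Obtain a ticket directly from the keytab. A failure here points at the
# KDC or the keytab rather than at the Hadoop configuration.
kinit -kt etc/hadoop/hdfs.keytab hduser/stwhddn01@FDATA.COM
klist   # should show a krbtgt/FDATA.COM@FDATA.COM ticket
```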

OS: CentOS 7
Hadoop: 2.7.3
Kerberos: MIT 1.5.1

I guess that when the DataNode runs as root it does not authenticate with Kerberos.

Any ideas?


1 Answer


I found the problem. I needed to change /etc/hosts so that 127.0.0.1 maps to localhost only.

Before

127.0.0.1 STWHDDN01
127.0.0.1 localhost
...

After

127.0.0.1 localhost
...
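
You can verify the fix took effect with (hostname is this cluster's; adapt as needed):

```shell
# The DataNode's hostname must now resolve to its routable address,
# not 127.0.0.1, or Kerberos host-based principals won't match.
getent hosts STWHDDN01   # expect the routable 10.28.94.x address, not 127.0.0.1
hostname -f              # expect the FQDN substituted for _HOST in the principal
```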

I still wonder why the old mapping worked when Kerberos authentication was not enabled.