1
vote

Please help me understand why I have a problem with the datanode connection below:

WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(588)) - Authentication exception: org.apache.hadoop.security.authentication.client.AuthenticationException: GSS Exception: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
WARN datanode.DataNode (BPServiceActor.java:retrieveNamespaceInfo(227)) - Problem connecting to server: s--t-..ru/10.243..*:8020

I have a kerberized cluster and everything works fine, but I need to add a new datanode, and I have a connection problem only with that new datanode. On the namenode I see the following messages:

INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from 10.243.218.16:33435 for protocol org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol is unauthorized for user dn/s--t-..ru@.RU (auth:PROXY) via $J4LB00-3PQ0LQ7EGVSG@.RU (auth:KERBEROS)
2020-02-05 09:37:20,172 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client 10.243.218.16 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: $J4LB00-3PQ0LQ7EGVSG@.RU is not allowed to impersonate dn/s--t-..ru@.RU]

The most interesting thing is that User: $J4LB00-3PQ0LQ7EGVSG@.RU is the same dn/s--t-..ru user, but under its pre-Windows 2000 logon name.

Is that expected? Another interesting point: I don't have this problem with the other, older datanodes, only with this one.

Did you copy the configs (hdfs-site.xml, core-site.xml) to the new node? Check that hadoop.proxyuser.hdfs.groups and hadoop.proxyuser.hdfs.hosts are both *. - mazaneicha
This is what I checked first: <property> <name>hadoop.proxyuser.hdfs.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hive.groups</name> <value>*</value> </property> - Ururu
Maybe something with sssd or AD? - Ururu
Check hadoop.security.auth_to_local, and add /L to lowercase the name? - mazaneicha
Active Directory does not allow you to create an account directly for a Kerberos SPN (service principal name, e.g. HTTP/host@REALM -- here dn stands for DataNode); you must create a dummy account, then "attach" an SPN to it. And a single SPN, otherwise Java fails to manage the connection later. I guess your AD admin messed up the account creation or the keytab retrieval; that stuff should be automated via Cloudera Manager or Ambari. - Samson Scharfrichter
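Following the comments above, an auth_to_local rule with the /L lowercase flag might look like this in core-site.xml (the realm MY.REALM and the rule itself are illustrative, not taken from the question; adjust them to your principals):

```xml
<!-- Sketch only: map dn/<host>@MY.REALM to the local user "dn".
     The trailing /L lowercases the result of the rule. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](dn@MY.REALM)s/.*/dn/L
    DEFAULT
  </value>
</property>
```

If I remember correctly, you can test how a principal is mapped without restarting anything by running `hadoop org.apache.hadoop.security.HadoopKerberosName dn/host@MY.REALM` on the node.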

3 Answers

3
votes

We ran into exactly the same issue this week on our cluster, affecting all nodes!

In the HDP 3.1.4 release notes, on page 72, we found a known issue regarding OpenJDK 8u242:

Description of the problem or behavior: OpenJDK 8u242 is not supported as it causes Kerberos failure. Workaround: Use a different version of OpenJDK.

We figured out that we had upgraded to 8u242 at the beginning of the week, and we downgraded to openjdk-8-jdk=8u162-b12-1.

That fixed the problem for us.
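If you want to check whether a node is on the affected build, a small sketch (the parsing of the `java -version` string is an assumption; adjust it for your JDK's exact output format):

```shell
# Flag the known-bad OpenJDK 8u242 build (per the HDP 3.1.4 release notes).
check_jdk() {
  case "$1" in
    1.8.0_242*) echo "affected: $1 - known Kerberos failure, downgrade" ;;
    *)          echo "ok: $1" ;;
  esac
}

# Typical usage, assuming `java` is on the PATH:
#   check_jdk "$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')"
check_jdk "1.8.0_242"
```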

1
vote

Thank you for reporting this. I investigated the issue on an HDP 3.1.4 cluster for a few hours but could not grasp what was going on. Downgrading the JDK fixed it.

This OpenJDK 8u242 release is causing more issues. We had to roll back to OpenJDK 8u232 for HDF (NiFi) as well, for an issue with NiFi site-to-site connections timing out, not Kerberos related. The symptom was that TCP sessions to the raw port were stuck in state SYN_RECV.

0
votes

Downgrading is not a real fix. We were hit by the same problem, so I reported it here: https://bugs.launchpad.net/ubuntu/+source/openjdk-8/+bug/1861883

I also reported this on the OpenJDK bug tracker (twice), but no reaction so far.