0
votes

I am new to hadoop and trying to understand why my oozie shell action is not taking the new ticket even after doing kinit. here is my scenario.

  1. I login using my ID "A", and have a kerberos ticket for my id. I submit oozie worklow with shell action using my ID. Inside oozie shell action I do another kinit to obtain the ticket for ID "B". Only this id "B" has access to some HDFS file. kinit is working fine since klist showed the ticket for id "B". Now when I read the HDFS file that only B has access to, I get permission denied error saying "A" does not have permission to access this file. But when I do the same thing from linux cli, outside oozie, after I do kinit and take ticket for "B", I am able to read the HDFS file as "B". But the same step is not working inside oozie shell action and hadoop fs commands always seem to work as the user that submitted the oozie workflow rather than the user for which kerberos ticket is present. Can someone please explain why this is happening? I am not able to understand this.

  2. In the same shell action, though hadoop fs command failed to change to user "B", hbase shell works as user B. Just for testing, I created a hbase table that only "A" has access to. I added the hbase shell to perform get command on this table. If I do kinit -kt for user "B" and get its ticket, this failed too, saying "B" does not have access to this table. So I think hbase is taking the new ticket instead of the delegation token of the user submitting the oozie workflow. When I dont do kinit -kt inside the shell action, hbase command succeeds. If I do kinit, I could not even run hive queries saying "A" does not have execute access to some directories like /tmp/B/ that only "B" has access to, so I could not understand how hive is working, if it is taking the delegation token that is created when oozie workflow is submitted or if it is taking the new ticket created for new user.
    Can someone please help me understand the above scenario? Which hadoop services takes new ticket for authentication and which commands take the delegation token (like hadoop fs commands)? Is this how it would work or am I doing something wrong? I just dont understand why the same hadoop fs command worked from outside oozie as different user but not working inside oozie shell action even after kinit.

  3. When is this delegation token actually get created? Does it get created only when oozie worklow is submitted or even I issue hadoop fs commands? Thank you!
2

2 Answers

2
votes

In theory -- Oozie automatically transfers the credentials of the submitter (i.e. A) to the YARN containers running the job. You don't have to care about kinit, because, in fact, it's too late for that.
You are not supposed to impersonate another user inside of an Oozie job, that would defeat the purpose of strict Kerberos authentication

In practice it's more tricky -- (1) the core Hadoop services (HDFS, YARN) check the Kerberos token just once, then they create a "delegation token" that is shared among all nodes and all services.

(2) the oozie service user has special privileges, it can do a kind of Hadoop "sudo" so that it connects to YARN as oozie but YARN creates the "delegation token" for the job submitter (i.e. A) and that's it, you can't alter change that token.

(3) well, actually you can use an alternate token, but only with some custom Java code that explicitly creates a UserGroupInformation object for an alternate user. The Hadoop command-line interfaces don't do that.

(4) what about non-core Hadoop, i.e. HBase or Hive Metastore, or non-Hadoop stuff, i.e. Zookeeper? They don't use "delegation tokens" at all. Either you manage explicitly the UserGroupInformation in Java code, or the default Kerberos token is used at connect time.
That's why your HBase shell worked, and if you had used Beeline (the JDBC thin client) instead of Hive (the legacy fat client) it would probably have worked too.

(5) Oozie tries to fill that gap with specific <credentials> options for Hive, Beeline ("Hive2" action), HBase, etc; I'm not sure how that works but it has to imply a non-default Kerberos ticket cache, local to your job containers.

0
votes

We've found it possible to become another kerb principal once an oozie workflow is launched. We have to run a shell action that then runs java with a custom -Djava.security.auth.login.config=custom_jaas.conf that will then provide a jvm kinit'ed as someone else. This is along the lines of Samson's (3), although this kinit can be even a completely different realm.