3 votes

I'm a Java user. When I submit a job to Hadoop MapReduce, Kerberos is used to authenticate against Hadoop, and on success a delegation token is created and passed along with the job submission instead of the Kerberos ticket (for security reasons, as stated by Hadoop). The job then runs as me, but the job itself needs to use Kerberos to send requests to services outside Hadoop. Since I don't have a Kerberos TGT on the Hadoop nodes, I can't get service tickets.

Is there any way I can pass the Kerberos ticket along with the job? (I know it might be dangerous, since we don't want to pass secrets around.) JobConf can pass string-to-string pairs to Hadoop, but would I have to convert the TGT to a JSON string and then restore it while the job is running?

Or is it possible to use the delegation token to reform the TGT?

I tried to Google it but found very little information. Can anyone help? Thank you.

**Edited:**

It looks like there is no easy way to do this without passing the TGT to Hadoop, so I am going to try the following: pass the TGT as a string via the job configuration map (strings only) and convert the string back into a TGT object when the job runs on Hadoop. The concern is that I will be passing credentials over the network, which is not best practice and is one of the very reasons Hadoop does not pass Kerberos tickets around. If I can reuse the reformed TGT on Hadoop to get service tickets, I will try to encrypt the TGT string as much as possible to mitigate the security issues.

So, before starting the job on the local machine, the code would look something like this (note that `sun.security.krb5.Credentials` is a JDK-internal class):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import sun.security.krb5.Credentials; // JDK-internal API

    // Make sure kinit is done before this
    Credentials tgt = Credentials.acquireTGTFromCache(null, null);

    String tgtStr = convertTGTToString(tgt); // need to implement this

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Test");
    job.getConfiguration().set("tgtStr", tgtStr);
    // set up the mapper/reducer/jar for the actual work here
    job.submit();
    job.waitForCompletion(true); // wait for the submitted job to finish

Then the code in the task running on Hadoop (for example in the Mapper's `setup()` method) would look like:

    // e.g. inside Mapper.setup(Context context)
    Configuration conf = context.getConfiguration();
    String tgtStr = conf.get("tgtStr");
    Credentials tgt = reformTGTFromString(tgtStr); // need to implement this
    // Get a service ticket for any service principal using the restored TGT
    Credentials serviceTicket = Credentials.acquireServiceCreds(servicePrincipal, tgt);

So I need to implement two functions: one to serialize the TGT object (`Credentials`) to a string, and one to reform it back into an object; a rough sketch of both is below.
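
For reference, here is a rough sketch of what those two helpers could look like. It leans on the JDK-internal `sun.security.krb5` classes (the same fields that `sun.security.jgss.krb5.Krb5Util` copies when converting between tickets and credentials); the constructor and getters used below match JDK 7/8 but are not a public API and may change between JDK versions, and the helper names and field separator are my own placeholders. Treat it as an outline, not a drop-in implementation, and encrypt the resulting string before putting it in the job configuration:

    import java.net.InetAddress;
    import java.util.Base64;
    import java.util.Date;

    import sun.security.krb5.Credentials;

    // Sketch only: relies on JDK-internal sun.security.krb5 APIs (signatures as in JDK 7/8).
    public class TgtCodec {

        private static final String SEP = "|"; // arbitrary field separator; could be JSON instead

        // Disassemble the Credentials fields; Base64-encode the binary ones.
        public static String convertTGTToString(Credentials tgt) {
            Base64.Encoder b64 = Base64.getEncoder();
            return String.join(SEP,
                    b64.encodeToString(tgt.getEncoded()),               // ASN.1-encoded ticket
                    tgt.getClient().getName(),                          // client principal
                    tgt.getServer().getName(),                          // krbtgt/... principal
                    b64.encodeToString(tgt.getSessionKey().getBytes()), // session key
                    Integer.toString(tgt.getSessionKey().getEType()),   // key encryption type
                    encodeFlags(tgt.getFlags()),
                    encodeTime(tgt.getAuthTime()),
                    encodeTime(tgt.getStartTime()),
                    encodeTime(tgt.getEndTime()),
                    encodeTime(tgt.getRenewTill()));
        }

        // Rebuild the Credentials object inside the task.
        public static Credentials reformTGTFromString(String s) throws Exception {
            String[] f = s.split("\\|");
            Base64.Decoder b64 = Base64.getDecoder();
            return new Credentials(
                    b64.decode(f[0]),        // ticket encoding
                    f[1],                    // client principal
                    f[2],                    // server (krbtgt) principal
                    b64.decode(f[3]),        // session key bytes
                    Integer.parseInt(f[4]),  // key encryption type
                    decodeFlags(f[5]),
                    decodeTime(f[6]),        // auth time
                    decodeTime(f[7]),        // start time
                    decodeTime(f[8]),        // end time
                    decodeTime(f[9]),        // renew-till
                    (InetAddress[]) null);   // client addresses are dropped here
        }

        private static String encodeFlags(boolean[] flags) {
            StringBuilder sb = new StringBuilder();
            for (boolean b : flags) sb.append(b ? '1' : '0');
            return sb.toString();
        }

        private static boolean[] decodeFlags(String s) {
            boolean[] flags = new boolean[s.length()];
            for (int i = 0; i < s.length(); i++) flags[i] = s.charAt(i) == '1';
            return flags;
        }

        private static String encodeTime(Date d) {
            return d == null ? "-" : Long.toString(d.getTime());
        }

        private static Date decodeTime(String s) {
            return "-".equals(s) ? null : new Date(Long.parseLong(s));
        }
    }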

Does anyone know a better solution for this? Thanks.

I know this is quite old, but with Java 8 now supporting constrained delegation (S4U), it may be possible to pass just a service ticket to Hadoop and then, with an appropriately configured account, request tickets to specific services. – TheNorthWes
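
(For anyone exploring that route, a minimal sketch of the Java 8 constrained-delegation (S4U) calls, with placeholder principal names; the intermediate service account has to be allowed to delegate to the target services, and error handling is omitted:)

    import org.ietf.jgss.*;
    import com.sun.security.jgss.ExtendedGSSCredential;

    GSSManager manager = GSSManager.getInstance();
    Oid krb5 = new Oid("1.2.840.113554.1.2.2");

    // Credential of the delegation-enabled service account (from its own keytab/login)
    GSSCredential serviceCred = manager.createCredential(GSSCredential.INITIATE_ONLY);

    // S4U2self: obtain a credential on behalf of the user without having the user's TGT
    GSSName user = manager.createName("someuser@EXAMPLE.COM", GSSName.NT_USER_NAME);
    GSSCredential userCred = ((ExtendedGSSCredential) serviceCred).impersonate(user);

    // S4U2proxy: initiate a context to a backend the account is allowed to delegate to
    GSSName backend = manager.createName("HTTP@backend.example.com", GSSName.NT_HOSTBASED_SERVICE);
    GSSContext ctx = manager.createContext(backend, krb5, userCred, GSSContext.DEFAULT_LIFETIME);
    byte[] token = ctx.initSecContext(new byte[0], 0, 0);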

3 Answers

2 votes

Please see the design at http://carfield.com.hk:8080/document/distributed/hadoop-security-design.pdf, if you have not done so already.

> Or is it possible to use the delegation token to reform the TGT?

No. Delegation tokens are issued by the Hadoop NameNode, and while they are based on the Kerberos authentication, they are independent of it; you cannot derive the Kerberos TGT from a delegation token.

In the original design we considered using solely Kerberos (without any additional tokens), which would have made your plan easy, but we decided against it for these reasons:

Performance:

  • Thousands of M/R tasks may need to get Kerberos tickets at the same time.
  • Kerberos credentials need to be renewed before they expire; for scheduled jobs, this is an issue.
  • Delegation tokens don't depend on Kerberos and can be coupled with non-Kerberos authentication mechanisms (such as SSL) used at the edge.

In your case, you can use a private distributed cache and send the forwardable TGT that way (see the sketch below). I think this will be OK, but I need to think about it some more. Obviously you need to make sure your implementation is secure: give the tickets the minimally necessary lifetime, use IP channel bindings if possible, and restrict the use of the tickets to authorized processes only.
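
(A minimal sketch of the private distributed-cache route, assuming the Hadoop 2.x MapReduce API; the HDFS path is a placeholder, and whatever is staged there should already be encrypted:)

    import java.net.URI;

    // Driver side: stage the (encrypted) serialized TGT with the job
    job.addCacheFile(new URI("/user/someuser/tgt.blob")); // placeholder path

    // Task side, e.g. in Mapper.setup(Context context):
    URI[] cacheFiles = context.getCacheFiles();
    // the staged file is localized for the task; read it, decrypt it,
    // and reform the Credentials object from its contents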

2 votes

By disassembling the Credentials fields, converting them to strings with a Base64 encoder, forming a JSON string, and passing it to Hadoop via the config map or the distributed cache suggested by RVM, and then reforming the Credentials object inside the job running on Hadoop, I can get back the Kerberos TGT and successfully obtain any service ticket with it. So this method works; the one thing to be very careful about is the encryption of the keys that are passed over the network.

1 vote

First of all, your account has to have delegation enabled, and the client has to request a forwardable ticket when obtaining the service ticket. If both of those are true, Hadoop can retrieve the delegated credential from the GSSContext and construct a new one on your behalf. With that new TGT it will be able to perform the further steps. Use Wireshark to check the ticket sent to Hadoop.
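
(For illustration, the standard GSS-API calls involved on each side; the service principal name is a placeholder, the client needs a forwardable TGT, e.g. from `kinit -f`, and `serverCtx` stands for the already-established acceptor-side context:)

    import org.ietf.jgss.*;

    GSSManager manager = GSSManager.getInstance();
    Oid krb5 = new Oid("1.2.840.113554.1.2.2");

    // Initiator (client) side: request credential delegation before establishing the context
    GSSName service = manager.createName("host@namenode.example.com", GSSName.NT_HOSTBASED_SERVICE);
    GSSContext clientCtx = manager.createContext(service, krb5, null, GSSContext.DEFAULT_LIFETIME);
    clientCtx.requestCredDeleg(true); // needs a forwardable TGT and delegation enabled for the account
    byte[] token = clientCtx.initSecContext(new byte[0], 0, 0);

    // Acceptor (Hadoop) side, once the context is established:
    if (serverCtx.getCredDelegState()) {
        GSSCredential delegated = serverCtx.getDelegCred(); // credential usable on behalf of the user
        // initiate contexts to further services with "delegated" as the initiator credential
    }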