On our cluster we had Oozie set up, and it worked properly. Now we have added Kerberos and Ranger, and have run into the following problem:

Oozie starts its workflow on schedule (as shown in YARN), but the actual Hive action never appears in YARN.

Details:

  • I do NOT see an error message; the workflow has been running for over 24 hours already.
  • After a long time (hours) the job log of the Oozie workflow job shows:
  • The Hive action that I am trying to run is a simple one-line insert.
  • I have been able to run Hive and Pig actions outside Oozie without problems.
  • I have done a kinit, and I have updated the Oozie workflow to include hcat credentials (a sketch of such a credential definition follows after these lists).
  • I checked whether the job was waiting; this was not the case. In YARN I also did not see the job under NEW, NEW_SAVING or ACCEPTED.

What have I tried:

  • Starting the Oozie job as the hive user (to which I have given rights to all tables and HDFS files in Ranger) did not make a difference.
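For reference, here is a minimal sketch of how such an hcat credential can be declared in the workflow and attached to the Hive action; the metastore host, Kerberos realm, script name and action names are placeholders for illustration, not the actual values from this cluster:

<workflow-app name="hive-wf" xmlns="uri:oozie:workflow:0.5">
    <credentials>
        <!-- placeholder metastore URI and principal; use the values from your own hive-site.xml -->
        <credential name="hcat_cred" type="hcat">
            <property>
                <name>hcat.metastore.uri</name>
                <value>thrift://metastore-host.example.com:9083</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>hive/_HOST@EXAMPLE.COM</value>
            </property>
        </credential>
    </credentials>
    <start to="hive-insert"/>
    <!-- the credential is attached to the action via the cred attribute -->
    <action name="hive-insert" cred="hcat_cred">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>insert.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>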

Update

Finally found a clue in the krb5 log; still looking for a way to proceed:

2016-07-19 18:53:46,157 INFO  [pool-5-thread-53]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 200: get_databases: NonExistentDatabaseUsedForHealthCheck
2016-07-19 18:53:46,157 INFO  [pool-5-thread-53]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=oozie/myactualservername@MYACTUALDOMAINNAME ip=/someipaddress   cmd=get_databases: NonExistentDatabaseUsedForHealthCheck    
2016-07-19 18:53:46,158 INFO  [pool-5-thread-53]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 200: get_delegation_token
2016-07-19 18:53:46,158 INFO  [pool-5-thread-53]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=oozie/myactualservername@MYACTUALDOMAINNAME ip=/someipaddress   cmd=get_delegation_token    
2016-07-19 18:53:46,159 INFO  [pool-5-thread-53]: delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:createPassword(385)) - Creating password for identifier: owner=u_batch, renewer=oozie, realUser=oozie/myactualservername@MYACTUALDOMAINNAME, issueDate=1468947226159, maxDate=1469552026159, sequenceNumber=15, masterKeyId=14, currentKey: 14
2016-07-19 18:53:46,160 INFO  [pool-5-thread-53]: thrift.ZooKeeperTokenStore (ZooKeeperTokenStore.java:addToken(385)) - Added token: /hive/cluster/delegation/METASTORE/tokens/lotsofcharacterswerehere
2016-07-19 18:53:59,222 ERROR [pool-5-thread-198]: server.TThreadPoolServer (TThreadPoolServer.java:run(296)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Invalid status -128
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Invalid status -128
    at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
    at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    ... 10 more
Did you check the Oozie log in oozie.log or the hourly log files? In a secure cluster, Oozie also requires some configuration changes. - YoungHobbit
The Oozie config has been changed to facilitate Kerberos. The Oozie logs don't show any errors or warnings; the last thing in the YARN log of the Oozie launcher is that it is trying to connect to the metastore. - Dennis Jaheruddin

1 Answer

Summary

Why does it work without Oozie, but not via Oozie?

Because the configuration provided via Oozie is wrong or insufficient.

Why is there no error, and why does the process hang indefinitely?

Because you are trying to connect to a secure metastore in an insecure way. (In my opinion this should generate an error, but clearly it does not always do so. Note that you do see the error in the krb5.log if you happen to look there.)


Full Answer

The clue mentioned in the update led me to understand that the connection to the metastore was never made successfully.

Trying to connect to a secure cluster using an insecure method can lead to your attempt hanging indefinitely.

As Hive queries work without Oozie (and after checking that proper security settings were configured in general), I realized that the problem had to come from the configuration passed in by Oozie.

After comparing against a reference hive-site.xml (the one that should be referenced from the Oozie workflows), I found that the following property changes got things working:

<property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
</property>

Changed the above from false to true.

<property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@putyourdomainnamehere</value>
</property>

Added the above (note that you should leave _HOST as it is; it will be substituted automatically).

<property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/_HOST@putyourdomainnamehere</value>
</property>

Added the above.
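Finally, the corrected hive-site.xml has to be the one the Hive action actually picks up. Below is a minimal sketch of how it can be referenced from the workflow via a job-xml element, assuming the file has been uploaded next to workflow.xml in the workflow application directory on HDFS (action and script names are placeholders):

<action name="hive-insert" cred="hcat_cred">
    <hive xmlns="uri:oozie:hive-action:0.5">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- assumed location: the corrected hive-site.xml copied into the workflow application directory -->
        <job-xml>hive-site.xml</job-xml>
        <script>insert.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>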