I have a Kerberized CDH cluster with some daily Oozie workflows running on it. All of them use shell, impala-shell, Hive and Sqoop to ingest data into Hive tables (let's call these tables SensitiveTables).

Now, I want to create 2 new BI users to use the cluster and experiment with some other ingested data.

The requirement is that these new BI users:

  • should not have access to the SensitiveTables
  • should be able to spark-submit jobs to the cluster
  • (optionally) use Hue

Apart from setting up Apache Sentry (which is the recommended way to go), is there any chance of meeting those requirements using file permissions or ACLs and Service Level Authorization?

So far, I have managed (via hadoop fs -chmod o-rwx /user/hive/warehouse/sensitive) to restrict access to SensitiveTables through Hive (which uses user impersonation), but failed to do so through Impala (which submits all jobs to the cluster as user impala). Is there anything else I should try?
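To make the difference between the two engines concrete, here is a minimal sketch of the permission check as I understand it. It is a simplified model, not real HDFS code: the owner and group hive:hive on the directory, the user name bi_user1, the group names, and the impala user being a member of the hive group are all assumptions for illustration, and only the read bit is checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HdfsDir:
    owner: str
    group: str
    mode: int  # POSIX-style permission bits, e.g. 0o770


def can_read(path, user, user_groups):
    """Simplified check: pick owner, group or other bits, then test the read bit."""
    if user == path.owner:
        bits = (path.mode >> 6) & 0o7
    elif path.group in user_groups:
        bits = (path.mode >> 3) & 0o7
    else:
        bits = path.mode & 0o7
    return bool(bits & 0o4)


# Assumed state of the directory after `hadoop fs -chmod o-rwx` (hive:hive, rwxrwx---).
sensitive = HdfsDir(owner="hive", group="hive", mode=0o770)

# Hypothetical group memberships; impala being in the hive group is an assumption.
groups = {
    "bi_user1": {"bi"},
    "impala": {"impala", "hive"},
}


def effective_user(engine, end_user):
    """Hive with impersonation touches HDFS as the end user; Impala as 'impala'."""
    return end_user if engine == "hive" else "impala"


def query_allowed(engine, end_user):
    user = effective_user(engine, end_user)
    return can_read(sensitive, user, groups[user])


print("hive  :", query_allowed("hive", "bi_user1"))
print("impala:", query_allowed("impala", "bi_user1"))
```

In this model the file permissions only ever see the user that actually touches HDFS, so the same BI user is blocked through Hive but not through Impala.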

Thank you,

Gee

Sure, you are right. Thank you. - geexee
Sentry is not that difficult to set up. It's just an additional SPOF (or rather 2, since it does not support Metastore HA). On the bonus side, it will take care of the HDFS bug about ACLs that do not propagate correctly (not fixed until Hadoop 3). - Samson Scharfrichter
Looks like Cloudera did a deliberate job of not developing proper authorization in Impala, to lock people into Sentry. But they don't seem to invest in Sentry any more; they switched to shiny new toys (Kudu, Data Science Workbench) that their Sales people can boast about. And ultimately they will switch to yet another new toy and leave all their half-baked projects behind (remember Sqoop2...?) - Samson Scharfrichter

1 Answer

After a lot of research, and based on the assumptions I described, the answer is no: file permissions, ACLs and Service Level Authorization are not enough to meet these requirements. Furthermore, the metastore cannot be protected this way.