0 votes

I'm working with a simple HiveQL query that looks like this:

SELECT event_type FROM {{table}} WHERE dt=20140103 LIMIT 10;

The {{table}} part is interpolated via Jinja2 in the runner code I'm using. I run the query with the -e flag on the hive command line, invoked through subprocess.Popen from Python.
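Roughly, the runner looks like this (the table name here is just an example, and it assumes the hive CLI is on the PATH):

from jinja2 import Template
import subprocess

# Render the table name into the query template (table name is a placeholder).
query = Template("SELECT event_type FROM {{table}} WHERE dt=20140103 LIMIT 10;").render(table="events")

# Run the query through the hive CLI with -e, capturing stdout/stderr.
proc = subprocess.Popen(["hive", "-e", query], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print(out.decode())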

For some reason, this setup attempts to write into the top-level /user directory in HDFS. Running the command with sudo has no effect. The error produced is as follows:

Job Submission failed with exception:
org.apache.hadoop.security.AccessControlException(Permission denied:user=username, access=WRITE, inode="/user":hdfs:hadoop:drwxrwxr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)

Why would Hive attempt to write to /user? And why would a SELECT statement like this need an output location at all?


1 Answer

2 votes

Hive is a SQL frontend to MapReduce, so it compiles your query into a MapReduce job and stages that job's files in HDFS before running it. It isn't trying to write your query output to /user; it's staging the program it will execute. Depending on your version of Hadoop, the staging location is controlled by:

mapreduce.jobtracker.staging.root.dir

And on YARN / Hadoop 2:

yarn.app.mapreduce.am.staging-dir

These are set in mapred-site.xml.
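As a rough illustration, the relevant mapred-site.xml entries would look something like this (the /user value is only an example; your cluster's actual setting may differ):

<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>/user</value>
</property>

<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/user</value>
</property>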

Your runner needs to be authenticated to the cluster and to have a writable staging directory it can use; the error above suggests it is trying to create one under /user and being denied.
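In most cases the underlying problem is simply that /user/username does not exist in HDFS. Something along these lines, run as the HDFS superuser (the user and group names here are assumptions; adjust them to your environment), usually resolves it:

sudo -u hdfs hdfs dfs -mkdir -p /user/username
sudo -u hdfs hdfs dfs -chown username:username /user/username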