Does anybody know how to update file resources when running Hive queries through the Hive interface in Hue (Beeswax)? I'm running CDH 5.3.
Specifically: I am running a query with a "transform" statement and a Python script. The Python file is on HDFS and was added as a "file resource". The query executes fine and gives the correct result. If I update the Python file and overwrite it on HDFS, then run the query again, the new file is not picked up by the Hadoop distributed cache (i.e. the query still uses the "old" file and gives the old result). The only thing that "works" is giving the new Python file a different name, which is obviously a terrible workaround.
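For reference, the setup looks roughly like this (the HDFS path is a placeholder; the script, table, and column names are the same ones used in the Beeline example further down):

```sql
-- Register the script that lives on HDFS as a file resource
ADD FILE hdfs:///path/to/myfile.py;

-- Stream rows through the script; Hive ships the registered copy of
-- myfile.py to the task nodes via the distributed cache
SELECT TRANSFORM (t.a, t.b, t.c)
  USING 'python myfile.py'
  AS (d, e, f)
FROM (SELECT a, b, c FROM my_table) AS t;
```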
Further, I've read the Hive documentation in detail and see that from the Hive shell you can ADD, LIST and DELETE files in the distributed cache. From the Hive shell I am able to add files, but the LIST command (specifically "LIST FILES") gives a ParseException, as if it doesn't recognize the command. The DELETE statement seems to always return successfully even if I give it a garbage file name, so without a working LIST command I can't be sure that it's actually removing anything.
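These are the resource-management commands as documented for the Hive CLI (the path is again a placeholder):

```sql
-- Register a file resource (local path or full HDFS URI)
ADD FILE hdfs:///path/to/myfile.py;

-- Show the currently registered file resources
LIST FILES;

-- Unregister the file resource
DELETE FILE hdfs:///path/to/myfile.py;
```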
UPDATE: from the Beeline shell directly, it behaves as I would expect (full sequence sketched below), i.e.

1. ADD FILE /path/to/myfile.py
2. SELECT TRANSFORM (t.a, t.b, t.c) USING 'python myfile.py' AS (d, e, f) FROM (SELECT a, b, c FROM my_table) AS t
3. make updates to myfile.py
4. ADD FILE /path/to/myfile.py (again)
5. re-run (2) and successfully get results using the new file
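A minimal sketch of that Beeline session, using the same placeholder path and names:

```sql
-- (1) Register the script
ADD FILE /path/to/myfile.py;

-- (2) Run the transform against the current version of the script
SELECT TRANSFORM (t.a, t.b, t.c)
  USING 'python myfile.py'
  AS (d, e, f)
FROM (SELECT a, b, c FROM my_table) AS t;

-- (3) ...edit and overwrite /path/to/myfile.py...

-- (4) Re-add the file so the updated version is registered
ADD FILE /path/to/myfile.py;

-- (5) Re-running (2) now returns results produced by the new script
```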
The same sequence does not work in Hue / Beeswax.