I am having trouble adding a folder's contents to Hive's distributed cache. I can successfully add multiple files to the distributed cache in Hive using:
ADD FILE /folder/file1.ext;
ADD FILE /folder/file2.ext;
ADD FILE /folder/file3.ext;
etc.
I also see that there is an ADD FILES (plural) option, which in my mind means you could specify a directory, like ADD FILES /folder/; and have everything in the folder included (this works with the Hadoop Streaming -files option). But this does not work with Hive. Right now I have to explicitly add each file.
Am I doing this wrong? Is there a way to add a whole folder's contents to the distributed cache?
P.S. I tried wildcards (ADD FILE /folder/* and ADD FILES /folder/*), but that fails too.
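One possible workaround when the directory form is not accepted is to generate the per-file statements instead of typing them by hand. The following is only a minimal sketch: gen_add_files is a hypothetical helper name, add-files.hql and combined.hql are hypothetical file names, and it assumes file names without spaces or semicolons and ignores subdirectories.

```shell
#!/bin/sh
# Hypothetical helper: print one "ADD FILE <path>;" statement
# for every regular file directly inside the given directory.
gen_add_files() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories, and the unexpanded glob of an empty directory.
        if [ -f "$f" ]; then
            printf 'ADD FILE %s;\n' "$f"
        fi
    done
}
```

The output could then be prepended to the query script before running it, for example: gen_add_files /folder > add-files.hql, then cat add-files.hql my-query.hql > combined.hql, then hive -f combined.hql.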
Edit:
As of Hive 0.11 this is now supported, so:
ADD FILE /folder;
now works.
What I do now is pass the folder location to the Hive script as a parameter:
$ hive -f my-query.hql -hiveconf folder=/folder
and in the my-query.hql file:
ADD FILE ${hiveconf:folder};
Nice and tidy now!