8 votes

I am trying to load a dataset stored on HDFS (a text file) into Hive for analysis. I am using CREATE EXTERNAL TABLE as follows:

CREATE EXTERNAL TABLE myTable(field1 STRING...) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE 
LOCATION '/user/myusername/datasetlocation';

This works fine, but it requires write access to the HDFS location. Why is that?

In general, what is the right way to load text data to which I do not have write access? Is there a 'read-only' external table type?

Edit: I noticed this issue filed against Hive regarding the question. It does not seem to have been resolved.

Related question: stackoverflow.com/questions/37538487/… (but no answer yet) – Amir
Looks like this is a known issue from back in 2009: issues.apache.org/jira/browse/HIVE-335. Doesn't look like there is any way around it. – Alex Joseph

3 Answers

3 votes

Partially answering my own question:

Indeed, it does not seem to be resolved in Hive at the moment. But here is an interesting fact: Hive does not require write access to the files themselves, only to the folder that contains them. For example, the folder could have permissions 777 while the files inside it, which Hive reads, stay read-only, e.g. 644.
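
As a minimal sketch of that permission layout, using the hypothetical path from the question (hdfs dfs is the current equivalent of the older hadoop dfs entry point, and the file glob is illustrative):

# directory writable by everyone (777) so Hive can use it as a table location
hdfs dfs -chmod 777 /user/myusername/datasetlocation
# the data files inside stay read-only (644)
hdfs dfs -chmod 644 /user/myusername/datasetlocation/*
# verify the resulting permissions
hdfs dfs -ls /user/myusername/datasetlocation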

1 vote

I don't have a solution to this, but as a workaround I've discovered that

CREATE TEMPORARY EXTERNAL TABLE

works without write permissions, the difference being that the table (but not the underlying data) disappears at the end of your session.
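
As a sketch of the full statement, reusing the abbreviated schema and hypothetical location from the question (temporary tables require Hive 0.14 or later):

CREATE TEMPORARY EXTERNAL TABLE myTable(field1 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/myusername/datasetlocation';

The table definition is dropped when the session ends, while the files under the LOCATION are left untouched.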

1 vote

If you require write access to the HDFS location, you can grant it with:

hadoop dfs -chmod 777 /folder_name

This gives all users full access permissions (read, write, execute) on that particular folder.
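
As a sketch, assuming the hypothetical dataset path from the question, the recursive form with the newer hdfs dfs entry point would be:

# -R applies the mode to the folder and everything beneath it
hdfs dfs -chmod -R 777 /user/myusername/datasetlocation

Note that 777 opens the path to every user; a narrower mode such as 755 is usually safer if read access is all that others need.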