I've followed various published documentation on integrating Apache Hive 2.1.1 with AWS S3 using the s3a:// scheme, configuring fs.s3a.access.key and
fs.s3a.secret.key in hadoop/etc/hadoop/core-site.xml and hive/conf/hive-site.xml.
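For reference, the S3A properties in both files look roughly like this (the key values here are placeholders):

<property>
  <name>fs.s3a.access.key</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>MY_SECRET_KEY</value>
</property>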
I am at the point where hdfs dfs -ls s3a://[bucket-name]/ works properly (it returns the S3 listing of that bucket), so I know my creds, bucket access, and overall Hadoop setup are valid.
hdfs dfs -ls s3a://[bucket-name]/
drwxrwxrwx - hdfs hdfs 0 2017-06-27 22:43 s3a://[bucket-name]/files
...etc.
hdfs dfs -ls s3a://[bucket-name]/files
drwxrwxrwx - hdfs hdfs 0 2017-06-27 22:43 s3a://[bucket-name]/files/my-csv.csv
However, when I attempt to access the same S3 resources from Hive, e.g. run any CREATE SCHEMA or CREATE EXTERNAL TABLE statement with LOCATION 's3a://[bucket-name]/files/', it fails.
For example:
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table (
  my_table_id string,
  my_tstamp timestamp,
  my_sig bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://[bucket-name]/files/';
I keep getting this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus on s3a://[bucket-name]/files: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: C9CF3F9C50EF08D1), S3 Extended Request ID: T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=)
This makes no sense: I clearly have access to the bucket, as the hdfs test above shows, and I've added the same creds to hive-site.xml.
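For what it's worth, the values can also be double-checked from the Hive CLI (a sanity check; SET followed by a property name should echo the value Hive actually resolves):

-- run inside the Hive CLI / beeline to print the effective S3A credentials
SET fs.s3a.access.key;
SET fs.s3a.secret.key;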
NOTE: Using the same creds, I have this working for 's3n://'; it just fails for 's3a://'.
Anyone have any idea what's missing from this equation?