
I have configured the S3 keys (access key and secret key) in a JCEKS file using the hadoop credential API. The commands used are below:

hadoop credential create fs.s3a.access.key -provider jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks

hadoop credential create fs.s3a.secret.key -provider jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks
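
As a quick sanity check, you can list the aliases stored in the provider (same host/path as above):

hadoop credential list -provider jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks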

Then I open a connection to the Spark Thrift Server using beeline, passing the JCEKS file path in the connection string as below:

beeline -u "jdbc:hive2://hostname:10001/;principal=hive/_HOST@?hadoop.security.credential.provider.path=jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks"
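
If the session-variable form is not picked up, an alternative sketch is to pass the property with beeline's --hiveconf flag (whether the Thrift Server forwards it to the S3A filesystem depends on your deployment):

beeline -u "jdbc:hive2://hostname:10001/;principal=hive/_HOST@" \
  --hiveconf hadoop.security.credential.provider.path=jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks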

Now, when I try to create an external table with its location in S3, it fails with the exception below:

CREATE EXTERNAL TABLE IF NOT EXISTS test_table_on_s3 (col1 String, col2 String) row format delimited fields terminated by ',' LOCATION 's3a://bucket_name/kalmesh/';

Exception: Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://bucket_name/kalmesh: getFileStatus on s3a://bucket_name/kalmesh: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: request_id), S3 Extended Request ID: extended_request_id=) (state=,code=0)
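
Before involving Hive at all, it can help to test whether the S3A filesystem can read the location with those credentials; hadoop fs accepts generic -D options, so the provider path can be set per command (bucket and path mirror the error message):

hadoop fs \
  -D hadoop.security.credential.provider.path=jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks \
  -ls s3a://bucket_name/kalmesh/

If this also returns 403 Forbidden, the problem lies with the credentials or bucket policy rather than with Hive/Spark.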


2 Answers


I don't think JCEKS support for the fs.s3a.* secrets went in until Hadoop 2.8; it's hard to tell from the source. If that is the case and you are using Hadoop 2.7, then the secret isn't going to be picked up. I'm afraid you will have to put it in the config.
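
For reference, a minimal sketch of what "putting it in the config" could look like when launching the Spark Thrift Server; Spark forwards spark.hadoop.*-prefixed properties into the Hadoop configuration, and the key values here are placeholders:

./sbin/start-thriftserver.sh \
  --conf spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY

Note that secrets passed on the command line are visible in process listings; putting them in core-site.xml (or spark-defaults.conf) avoids that.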


I had a similar situation, just with Drill instead of Hive. But as in your case, I was:

  • using Hadoop 2.9 jars (1st version to support AWS KMS)
  • writing to s3a://
  • encrypting with SSE-KMS

... and got AmazonS3Exception: Access Denied.
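
For context, SSE-KMS on s3a:// in Hadoop 2.9 is controlled by two properties, shown here as generic options on a test write (the key ARN is a placeholder):

hadoop fs \
  -D fs.s3a.server-side-encryption-algorithm=SSE-KMS \
  -D fs.s3a.server-side-encryption.key=arn:aws:kms:us-east-1:111122223333:key/your-key-id \
  -put localfile.csv s3a://bucket_name/kalmesh/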

In my case (perhaps in yours as well) the exception description was a bit ambiguous. The reported AmazonS3Exception: Access Denied did not originate from S3 but from KMS! Access was denied to the key I used for encryption: the user making the API calls was not on the key's users list. Once I added that user to the key's list, writing started to work and I could create encrypted tables on s3a://...
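
To check this from the CLI, a sketch (key id and principal ARN are placeholders; "key users" in the AWS console correspond to statements in the key policy, and a grant is one way to authorize a principal without editing that policy):

# Inspect the current key policy
aws kms get-key-policy \
  --key-id your-key-id \
  --policy-name default \
  --output text

# Authorize a principal to use the key for SSE-KMS reads and writes
aws kms create-grant \
  --key-id your-key-id \
  --grantee-principal arn:aws:iam::111122223333:user/your-user \
  --operations Encrypt Decrypt GenerateDataKey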