Using Beeline connected to Spark SQL 1.3 (the Thrift JDBC server), I am trying to create an external table backed by S3 data (using the s3a protocol):
CREATE EXTERNAL TABLE mytable (...) STORED AS PARQUET LOCATION 's3a://mybucket/mydata';
I get the following error:
Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: AmazonClientException Unable to load AWS credentials from any provider in the chain (state=,code=0)
I have the following environment variables set in spark-env.sh:
AWS_ACCESS_KEY_ID=<my_access_key>
AWS_SECRET_ACCESS_KEY=<my_secret_key>
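For reference, the classpath is set in the same file with something along these lines (SPARK_CLASSPATH and the jar path here are just illustrative of my layout, not the exact line):
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/opt/hadoop/share/hadoop/tools/lib/*"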
I know spark-env.sh is being picked up, because that classpath entry works: the Hadoop tools lib (which contains the S3A connector) does get loaded. However, when I show the variables in Beeline, they come back as undefined:
0: jdbc:hive2://localhost:10000> set env:AWS_ACCESS_KEY_ID;
+------------------------------------+
|                                    |
+------------------------------------+
| env:AWS_ACCESS_KEY_ID=<undefined>  |
+------------------------------------+
1 row selected (0.112 seconds)
0: jdbc:hive2://localhost:10000> set env:AWS_SECRET_ACCESS_KEY;
+----------------------------------------+
|                                        |
+----------------------------------------+
| env:AWS_SECRET_ACCESS_KEY=<undefined>  |
+----------------------------------------+
1 row selected (0.009 seconds)
Setting fs.s3a.access.key and fs.s3a.secret.key from Beeline also has no effect:
0: jdbc:hive2://localhost:10000> set fs.s3a.access.key=<my_access_key>;
0: jdbc:hive2://localhost:10000> set fs.s3a.secret.key=<my_secret_key>;
Is there somewhere else I need to set these credentials?
FWIW, I can successfully use hadoop fs -ls s3a://mybucket/mydata to list the files.
UPDATE:
I added the following to hive-site.xml:
<property>
  <name>fs.s3a.access.key</name>
  <value>my_access_key</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>my_secret_key</value>
</property>
I can now create the table without error, but any attempt to query it results in this error:
Error: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 0.0 (TID 0, localhost): com.amazonaws.AmazonClientException:
Unable to load AWS credentials from any provider in the chain
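For what it's worth, even a trivial query hits it, for example:
SELECT COUNT(*) FROM mytable;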