7 votes

ISSUE:

I am able to download the file successfully using both the AWS CLI and boto3. However, when using Hadoop/Spark's S3A connector, I receive the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o24.parquet.
: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: BCFFD14CB2939D68, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: MfT8J6ZPlJccgHBXX+tX1fpX47V7dWCP3Dq+W9+IBUfUhsD4Nx+DcyqsbgbKsPn8NZzjc2U

Configuration (running on my local machine):

  1. Spark Version 2.4.4

  2. Hadoop Version 2.7

Jars added:

  1. hadoop-aws-2.7.3.jar

  2. aws-java-sdk-1.7.4.jar
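
(For reference, a minimal sketch of attaching these jars from Python rather than via the spark-submit command line, assuming both jar files sit in the working directory:)

from pyspark import SparkConf, SparkContext

# spark.jars takes a comma-separated list of local jar paths; it must be
# set before the SparkContext (and its JVM) is created.
conf = SparkConf().set("spark.jars", "hadoop-aws-2.7.3.jar,aws-java-sdk-1.7.4.jar")
sc = SparkContext.getOrCreate(conf=conf)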

Hadoop Config:

hadoop_conf.set("fs.s3a.access.key", access_key)
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.secret.key", secret_key)
hadoop_conf.set("fs.s3a.aws.credentials.provider","org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.session.token", session_key)
hadoop_conf.set("fs.s3a.endpoint", "s3-us-west-2.amazonaws.com") # yes, I am using central eu server.
hadoop_conf.set("com.amazonaws.services.s3.enableV4", "true")

Code to read the file:

from pyspark import SparkContext, SQLContext

sc = SparkContext.getOrCreate()
hadoop_conf = sc._jsc.hadoopConfiguration()  # the object configured in the block above
sqlContext = SQLContext(sc)
df = sqlContext.read.parquet(path)  # path is an s3a:// URI, e.g. "s3a://<bucket>/<key>"
print(df.head())
Comment (1 vote): I've been facing this too. I accidentally stumbled on hadoop.apache.org/docs/r2.8.5 while trying to fix it, so I installed a Hadoop-free Spark bundle with a newer Hadoop (Spark 2.4.5 with Hadoop 3.2.1), but I still get exactly the same issue as you, albeit with a more elaborate error message (a non-null AWS Error Code): Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 41...; S3 Extended Request ID: JCAO0...), S3 Extended Request ID: JCAO0... – Florian Suess
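
(For what it's worth, a common fix along these lines is to let Spark resolve a hadoop-aws that matches the Hadoop version on the classpath, which pulls in the aws-java-sdk-bundle it was built against, instead of mixing jars by hand. A sketch assuming a Hadoop-free Spark bundle running against Hadoop 3.2.1, as in the comment above:)

from pyspark import SparkConf, SparkContext

# hadoop-aws declares its matching aws-java-sdk-bundle as a dependency,
# so resolving it via spark.jars.packages yields a consistent pair.
conf = SparkConf().set("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.1")
sc = SparkContext.getOrCreate(conf=conf)

Per the Hadoop 2.8.5 docs linked in the comment, S3A's TemporaryAWSCredentialsProvider and fs.s3a.session.token appear to have arrived in Hadoop 2.8, so the original Hadoop 2.7 jars would not honor the session-token settings shown in the question.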

1 Answer

1 vote

Set the AWS credentials provider to the profile credentials provider, so S3A reads credentials from your AWS profile (~/.aws/credentials) instead of the keys passed in by hand:

hadoop_conf.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider")
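
A minimal end-to-end sketch of this approach, assuming a default profile exists in ~/.aws/credentials and the hadoop-aws/aws-sdk jars are on the classpath (the bucket URI is a placeholder):

from pyspark import SparkContext, SQLContext

sc = SparkContext.getOrCreate()
hadoop_conf = sc._jsc.hadoopConfiguration()

# Let the AWS SDK read keys (and a session token, if present) from
# ~/.aws/credentials instead of passing them in as fs.s3a.* settings.
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider")

sqlContext = SQLContext(sc)
df = sqlContext.read.parquet("s3a://<bucket>/<key>")  # placeholder URI
print(df.head())

Depending on the SDK version, a non-default profile can usually be selected with the AWS_PROFILE (in older SDKs, AWS_DEFAULT_PROFILE) environment variable.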