
The error is a Py4JJavaError ("An error occurred while calling o411.csv"), caused by:

com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: fsdfewffsd, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID

I am on the Spark 3.0 preview version. I started the PySpark session with the command pyspark --packages=org.apache.hadoop:hadoop-aws:2.7.3.

I have tried the following code:

hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("com.amazonaws.services.s3.enableV4", "true")
hadoop_conf.set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.access.key", ACCESS_KEY)
hadoop_conf.set("fs.s3a.secret.key", SECRET_KEY)

This is followed by a read from the bucket; the following line throws the error:

sdf = spark.read.csv("s3a://aai-team/neighbourhoods.csv")

1 Answer


I had the exact same issue today and just solved it with setSystemProperty:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("app").getOrCreate()
sc = spark.sparkContext

# Enable AWS Signature Version 4 on the driver JVM.
sc.setSystemProperty("com.amazonaws.services.s3.enableV4", "true")

and then set the hadoop_conf just like yours.
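
For reference, here is a minimal end-to-end sketch of the combined fix, assuming the same bucket and region as in the question and that ACCESS_KEY and SECRET_KEY are defined elsewhere:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("app").getOrCreate()
sc = spark.sparkContext

# The V4-signing switch is a JVM system property, not a Hadoop
# configuration key, so it must be set via setSystemProperty
# before the first S3 access.
sc.setSystemProperty("com.amazonaws.services.s3.enableV4", "true")

hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.access.key", ACCESS_KEY)  # assumed defined elsewhere
hadoop_conf.set("fs.s3a.secret.key", SECRET_KEY)  # assumed defined elsewhere

sdf = spark.read.csv("s3a://aai-team/neighbourhoods.csv")

Note that setSystemProperty only affects the driver JVM; if the executors also read from S3, the same property can be passed to their JVMs with --conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true.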