The error is a Py4JJavaError: An error occurred while calling o411.csv.
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: fsdfewffsd, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID
I am on the Spark 3.0 preview version.
I started the PySpark session with the command pyspark --packages=org.apache.hadoop:hadoop-aws:2.7.3.
I have tried the following code:
hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("com.amazonaws.services.s3.enableV4", "true")
hadoop_conf.set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.access.key",ACCESS_KEY)
hadoop_conf.set("fs.s3a.secret.key",SECRET_KEY)
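For reference, the same settings can also be supplied at session-build time as "spark.hadoop."-prefixed options, which Spark copies into the Hadoop configuration. This is only a sketch with placeholder credentials, not a fix for the 400 error; note that "com.amazonaws.services.s3.enableV4" is left out because it is a JVM system property for the AWS SDK rather than a Hadoop configuration key.

```python
# Sketch: the same Hadoop settings expressed as builder config keys.
# Spark forwards any option prefixed with "spark.hadoop." into the
# Hadoop Configuration, so these mirror the hadoop_conf.set(...) calls.
# ACCESS_KEY / SECRET_KEY are placeholders.
hadoop_settings = {
    "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3a.endpoint": "s3.us-east-2.amazonaws.com",
    "fs.s3a.aws.credentials.provider":
        "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider",
    "fs.s3a.access.key": "ACCESS_KEY",
    "fs.s3a.secret.key": "SECRET_KEY",
}

# Prefix each key so it can be passed to SparkSession.builder.config(k, v)
# in a loop before calling getOrCreate().
spark_conf = {"spark.hadoop." + key: value
              for key, value in hadoop_settings.items()}

for key, value in sorted(spark_conf.items()):
    print(key, "=", value)
```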
This is followed by a read from the bucket; the line below is what throws the error.
sdf = spark.read.csv("s3a://aai-team/neighbourhoods.csv")