I have a problem accessing data in S3 from Spark.
I have spylon-kernel installed for JupyterHub (a Scala kernel with Spark integration). It uses pyspark under the hood.
Unfortunately, the newest pyspark still ships the hadoop-2.7.3 libraries. When I try to access an S3 bucket in the Frankfurt region, I get the following Java exception:
"com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: xxxxxxxxxx, AWS Error Code: null, AWS Error Message: Bad Request"
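For context, the access pattern that triggers the exception looks roughly like this (the bucket name, path, and endpoint are illustrative; in spylon-kernel a `spark` session is already in scope):

```scala
// Point s3a at the Frankfurt (eu-central-1) endpoint and supply credentials
// from the environment. Keys and bucket name here are placeholders.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// This read fails with the 400 Bad Request above while the
// hadoop-aws 2.7.3 jars bundled with pyspark are on the classpath.
val df = spark.read.parquet("s3a://my-frankfurt-bucket/data/")
```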
From my research it looks like a Hadoop 2.7.3 problem: with a newer version (3.1.1) it works fine locally, but pyspark uses the bundled Hadoop 2.7.3 jars and it seems they can't be swapped out. Can I do something about it? Is there a way to tell pyspark to use the Hadoop 3.1.1 jars? Or is there another Scala kernel with Spark for JupyterHub that uses spark-shell instead of pyspark?
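For reference, my local test with Hadoop 3.1.1 was along these lines, using the "without-hadoop" Spark build pointed at a separate Hadoop installation (all paths are illustrative; I don't know whether the same approach can be wired into the pyspark that spylon-kernel launches):

```shell
# Use Spark's "Hadoop free" build and put Hadoop 3.1.1, plus the matching
# hadoop-aws connector, on Spark's classpath instead of the bundled 2.7.3 jars.
export HADOOP_HOME=/opt/hadoop-3.1.1
export SPARK_DIST_CLASSPATH="$("$HADOOP_HOME/bin/hadoop" classpath)"

/opt/spark-without-hadoop/bin/spark-shell \
  --packages org.apache.hadoop:hadoop-aws:3.1.1
```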