I run Apache Spark (2.11, 1.5.2) on a local machine using input files stored in AWS S3. If the files are stored in a bucket in the Ireland region (eu-west-1) it works fine.
But if I try to read files stored in an S3 bucket located in Frankfurt (eu-central-1) it fails with the error message:
The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256
How can I use AWS4-HMAC-SHA256?
The detailed error message is:
Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/%2myfolder' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>ECB53FECECD1C910</RequestId><HostId>BmEyVcO/eHZR3IO2Z+8IkEWOn189IBGb2YAgbDxhTu+abuyORCEjHyC14l6nIRVNNnQL2Nyya9I=</HostId></Error>
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:174)
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:214)
...
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/%2myfolder' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>ECB53FECECD1C910</RequestId><HostId>BmEyVcO/eHZR3IO2Z+8IkEWOn189IBGb2YAgbDxhTu+abuyORCEjHyC14l6nIRVNNnQL2Nyya9I=</HostId></Error>
at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:416)
at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(RestS3Service.java:752)
The code is:
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
public class S3Problem {
public static void main(String[] args) {
String s3Folder = "s3n://mybucket/myfolder";
SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> myData = sc.textFile(s3Folder).cache();
long count = myData.count();
System.out.println("Line count: " + count);
}
}
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are provided as environment variables.