0 votes

I am trying to use the S3A backend with a custom endpoint. However, it is not supported in hadoop-aws 2.7.x; I need to use at least version 2.8.0. The underlying reason is that the requests are being sent as follows:

DEBUG [main] (AmazonHttpClient.java:337) - Sending Request: HEAD http://mustafa.localhost:9000 / Headers: 

because fs.s3a.path.style.access is not recognized in the old version. I want the domain to remain the same and the bucket name to be appended to the path instead (http://localhost:9000/mustafa/...).
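For context, a minimal core-site.xml sketch of the two S3A settings involved (the endpoint and values here are illustrative, not my exact config):

    <!-- Minimal sketch of the S3A settings involved; values are illustrative. -->
    <configuration>
      <property>
        <name>fs.s3a.endpoint</name>
        <value>http://localhost:9000</value>
      </property>
      <property>
        <!-- Only recognized from Hadoop 2.8.0 onwards (HADOOP-12963);
             older hadoop-aws silently ignores it and falls back to
             virtual-hosted-style addressing (bucket.localhost:9000). -->
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
    </configuration>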

I cannot blindly increase the aws-java-sdk version to the latest; that causes:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.ClientConfiguration
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:182)

According to https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#provide-s3-filesystem-dependency, I need aws-java-sdk 1.7.4. So if I increase hadoop-aws to 2.8.0 while keeping the 1.7.4 client, it causes the following error:

Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)

Should I be excluding hadoop-common from Flink somehow? Building Flink from source with mvn clean install -DskipTests -Dhadoop.version=2.8.0 works, but I want to manage this via Maven as much as possible.

Just for the Flink-with-Hadoop-2.8.0 Maven artifacts part: they will be available from 1.4.0 onwards. See issues.apache.org/jira/browse/FLINK-6466 – Dawid Wysakowicz
So, if I use a flink-1.4 snapshot in my dependencies, would that work? – Mustafa

1 Answer

2 votes
  1. Don't try to mix Hadoop JARs; it won't work, and all support JIRAs will be rejected.
  2. In Maven you could try excluding the Hadoop 2.7 dependencies from your Flink import and then explicitly pulling in hadoop-client, hadoop-aws, etc. (see the sketch after this list). I don't have a Flink setup, but here is one for Spark, designed to let me mix Hadoop 3.0 beta builds with Spark 2.2 by excluding the Hadoop stuff from Spark, and the Jackson and Jetty bits from Hadoop. Yes, it hurts, but that's the only way I've been able to completely control what I end up with.
  3. No idea about flink-snapshot; it'll depend on what it was built with. Ask on the mailing list.
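For illustration, a rough Maven sketch of point 2 adapted to Flink; the artifact names, versions, and wildcard exclusion are assumptions rather than a tested setup (in particular, Flink may pull Hadoop in through its own shaded artifacts, which would need excluding as well):

    <!-- Illustrative only: strip Hadoop from the Flink dependency,
         then pull in a consistent Hadoop 2.8.0 set explicitly. -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>1.3.2</version>
      <exclusions>
        <exclusion>
          <!-- Maven 3 supports wildcard exclusions -->
          <groupId>org.apache.hadoop</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>2.8.0</version>
    </dependency>

The point is to end up with hadoop-common, hadoop-client, and hadoop-aws all from the same release; the IllegalAccessError above is exactly what mixing a 2.7 hadoop-common with a 2.8 hadoop-aws looks like.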