When I use Flink's streaming API to write to S3:
// Set StreamExecutionEnvironment
final StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
// Use event time for the pipeline
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// Add source (input stream)
DataStream<String> dataStream = StreamUtil.getDataStream(env, params);
// Sink to S3 Bucket
dataStream.writeAsText("s3a://test-flink/test.txt").setParallelism(1);
I get the following error:
Unable to load AWS credentials from any provider in the chain
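To isolate whether the AWS SDK itself can resolve credentials outside of Flink, a minimal diagnostic sketch against aws-java-sdk 1.7.x could look like this (this is a hypothetical check, not part of my setup; note that the fs.s3a.* keys in core-site.xml are read by S3AFileSystem separately, so this only exercises the SDK's own chain):

```java
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;

public class CredentialCheck {
    public static void main(String[] args) {
        // Throws AmazonClientException with the same "Unable to load AWS
        // credentials from any provider in the chain" message if nothing is
        // found in env vars, system properties, the profile file, or
        // instance metadata.
        AWSCredentials creds =
                new DefaultAWSCredentialsProviderChain().getCredentials();
        System.out.println("Resolved access key id: " + creds.getAWSAccessKeyId());
    }
}
```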
My configuration is:
# flink --version
Version: 1.3.1, Commit ID: 1ca6e5b
The Hadoop config directory was added to flink-conf.yaml
# cat flink/config/flink-conf.yaml | head -n1
fs.hdfs.hadoopconf: /root/hadoop-config
The rest of the content of flink-conf.yaml is identical to the release version.
The following was added to /root/hadoop-config/core-site.xml
# cat /root/hadoop-config/core-site.xml
<configuration>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.buffer.dir</name>
<value>/tmp</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>MY_ACCESS_KEY</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>MY_SECRET_KEY</value>
</property>
</configuration>
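One way to confirm that this core-site.xml is actually being read (a hypothetical diagnostic; it assumes the hadoop CLI is installed and the S3A jars are on its classpath):

```shell
# Point Hadoop at the same config directory Flink is given and list the
# bucket. If this fails with the same credentials error, the problem is in
# core-site.xml or the jars, not in Flink itself.
export HADOOP_CONF_DIR=/root/hadoop-config
hadoop fs -ls s3a://test-flink/
```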
The JARs aws-java-sdk-1.7.4.jar, hadoop-aws-2.7.4.jar, httpclient-4.2.5.jar, and httpcore-4.2.5.jar were added to flink/lib/ from http://apache.mirror.anlx.net/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
# ls flink/lib/
aws-java-sdk-1.7.4.jar
flink-dist_2.11-1.3.1.jar
flink-python_2.11-1.3.1.jar
flink-shaded-hadoop2-uber-1.3.1.jar
hadoop-aws-2.7.4.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
log4j-1.2.17.jar
slf4j-log4j12-1.7.7.jar
Note that aws-java-sdk is at version 1.7.4, not 1.7.2 as stated in the docs.
pom.xml has the following build dependencies.
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-filesystem_2.10</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>2.7.2</version>
</dependency>
My reference was the Flink S3 setup guide: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#set-s3-filesystem
I am able to write to the S3 bucket using the credentials in core-site.xml with awscli.
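For reference, the awscli check that succeeds looks roughly like this (exact commands and key are illustrative; the bucket name is taken from the sink path above):

```shell
# Succeeds with the same access/secret key pair that core-site.xml uses
aws s3 cp test.txt s3://test-flink/test.txt
aws s3 ls s3://test-flink/
```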