I have a problem trying to sink a file into Azure Data Lake Gen 2 with Flink's StreamingFileSink. I'm using a core-site.xml together with the Hadoop bulk format, and I'm writing to my Data Lake with an abfss:// path (I also tried abfs://):
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS
[job-playground-job-cluster-0 flink-job-cluster] at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:61) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
[job-playground-job-cluster-0 flink-job-cluster] at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
[job-playground-job-cluster-0 flink-job-cluster] at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
[job-playground-job-cluster-0 flink-job-cluster] at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink$BulkFormatBuilder.createBuckets(StreamingFileSink.java:371) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
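For context, the sink is built roughly like this (a minimal sketch, not my real job; the container, account, schema, and stream are placeholders, and any bulk format hits the same code path because BulkFormatBuilder#createBuckets asks the filesystem for a recoverable writer):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

static void addAdlsSink(DataStream<GenericRecord> stream, Schema schema) {
    StreamingFileSink<GenericRecord> sink = StreamingFileSink
            .forBulkFormat(
                    // Placeholder path; this is where abfss:// (or abfs://) goes.
                    new Path("abfss://CONTAINER@ADLS_ACCOUNT_NAME.dfs.core.windows.net/output"),
                    ParquetAvroWriters.forGenericRecord(schema))
            .build();
    stream.addSink(sink);
}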
I read the official documentation and dug into the library, and the problem is here: https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableWriter.java#L60. The writer rejects every scheme except hdfs, so the abfs/abfss filesystems from the Azure connector fail this check:
public HadoopRecoverableWriter(org.apache.hadoop.fs.FileSystem fs) {
    this.fs = checkNotNull(fs);

    // This writer is only supported on a subset of file systems
    if (!"hdfs".equalsIgnoreCase(fs.getScheme())) {
        throw new UnsupportedOperationException(
                "Recoverable writers on Hadoop are only supported for HDFS");
    }
This is my core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.azure.account.auth.type.ADLS_ACCOUNT_NAME.dfs.core.windows.net</name>
    <value>SharedKey</value>
    <description>It is inferred by the URL</description>
  </property>
  <property>
    <name>fs.azure.account.key.ADLS_ACCOUNT_NAME.dfs.core.windows.net</name>
    <value>ADLS_KEY</value>
    <description></description>
  </property>
  <property>
    <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.azure.always.use.https</name>
    <value>true</value>
  </property>
</configuration>
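A quick way to confirm that these settings and the ABFS connector load correctly, independently of Flink (a sketch; the container and account are placeholders, and core-site.xml is assumed to be on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AbfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the classpath, including the SharedKey settings above.
        Configuration conf = new Configuration();
        Path root = new Path("abfss://CONTAINER@ADLS_ACCOUNT_NAME.dfs.core.windows.net/");
        FileSystem fs = root.getFileSystem(conf);
        // Listing the container root fails fast if auth or the abfss scheme is misconfigured.
        for (FileStatus status : fs.listStatus(root)) {
            System.out.println(status.getPath());
        }
    }
}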
Has anyone gotten past this problem, or is it an issue with the abfss/abfs scheme?
The missing piece is truncate(Path f, long newLength), which needs to be implemented and which only HDFS has done. Once it's in abfs, the Hadoop team can talk to the Flink folks about probing for this more elegantly, now that there's an API to ask whether an FS instance supports a specific feature. – stevel
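If I understand that comment correctly, the probing API is Hadoop's PathCapabilities (FileSystem#hasPathCapability, available in Hadoop 3.3+). A sketch of what such a check could look like, assuming the CommonPathCapabilities.FS_TRUNCATE capability and a placeholder path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonPathCapabilities;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder URI; the point is asking the FS instance, not matching schemes.
        Path path = new Path("abfss://CONTAINER@ADLS_ACCOUNT_NAME.dfs.core.windows.net/");
        FileSystem fs = path.getFileSystem(new Configuration());
        // True on HDFS today; would become true on ABFS once truncate lands there.
        boolean canTruncate = fs.hasPathCapability(path, CommonPathCapabilities.FS_TRUNCATE);
        System.out.println("truncate supported: " + canTruncate);
    }
}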