
I am writing a Flink job in which I read a file from the local file system and write it to a database using "writeUsingOutputFormat".

Now my requirement is to write to HDFS instead of the database.

Could you please help me with how to do this in Flink?

Note: HDFS is up and running on my local machine.


2 Answers


Flink provides an HDFS connector that can be used to write data to any file system supported by Hadoop's FileSystem abstraction.

The provided sink is a BucketingSink, which partitions the data stream into folders containing rolling files. The bucketing behavior, as well as the writing, can be configured with parameters such as batch size and batch rollover time interval.

The Flink documentation gives the following example:

import java.time.ZoneId;
import org.apache.flink.streaming.connectors.fs.SequenceFileWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

DataStream<Tuple2<IntWritable, Text>> input = ...;

// The sink's type parameter must match the element type of the stream.
BucketingSink<Tuple2<IntWritable, Text>> sink = new BucketingSink<>("/base/path");
sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HHmm", ZoneId.of("America/Los_Angeles")));
sink.setWriter(new SequenceFileWriter<IntWritable, Text>());
sink.setBatchSize(1024 * 1024 * 400); // roll to a new part file after 400 MB
sink.setBatchRolloverInterval(20 * 60 * 1000); // or after 20 minutes

input.addSink(sink);

The newer StreamingFileSink is probably a better choice than the BucketingSink at this point. This description is from the Flink 1.6 release notes (note that support for S3 was added in Flink 1.7):

The new StreamingFileSink is an exactly-once sink for writing to filesystems which capitalizes on the knowledge acquired from the previous BucketingSink. Exactly-once is supported through integration of the sink with Flink’s checkpointing mechanism. The new sink is built upon Flink’s own FileSystem abstraction and it supports local file system and HDFS, with plans for S3 support in the near future [now included in Flink 1.7]. It exposes pluggable file rolling and bucketing policies. Apart from row-wise encoding formats, the new StreamingFileSink comes with support for Parquet. Other bulk-encoding formats like ORC can be easily added using the exposed APIs.
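
For reference, here is a minimal sketch of a row-encoded StreamingFileSink writing plain text to HDFS. The hdfs://localhost:9000/flink/output path and the String element type are assumptions; adjust them to your stream and your NameNode address:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

DataStream<String> input = ...; // e.g. the lines read from your local file

// Writes each record as a UTF-8 line into rolling part files under the
// given base path (hdfs://localhost:9000/flink/output is an assumed address).
StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("hdfs://localhost:9000/flink/output"),
                      new SimpleStringEncoder<String>("UTF-8"))
        .build();

input.addSink(sink);

Note that the StreamingFileSink relies on Flink's checkpointing to finalize its part files, so checkpointing must be enabled on the job for the output to show up as finished files.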