
I have a use case where I need to move records from Hive to Kafka. I couldn't find a way to attach a Kafka sink directly to a Flink DataSet, so as a workaround I call a map transformation on the DataSet and, inside the map function, invoke kafkaProducer.send() for each record.

The problem I am facing is that I have no way to execute kafkaProducer.flush() on every worker node, so the number of records written to Kafka is always slightly lower than the number of records in the DataSet.

Is there an elegant way to handle this? Can I add a Kafka sink to a DataSet in Flink, or call kafkaProducer.flush() as a finalizer?

This is totally not helpful, as the OP asked about the DataSet API and Kafka sinks are only available for streaming. – Dominik Wosiński

1 Answer


You could simply create a custom sink that uses a KafkaProducer under the hood to write the data to Kafka. In the DataSet API, a sink is expressed as an OutputFormat passed to DataSet.output(); implementing close() gives you exactly the per-worker finalizer hook you are missing, so you can flush and close the producer there.
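A minimal sketch of such a sink, assuming String records; the class name, broker address, and topic name are placeholders:

```java
import java.util.Properties;

import org.apache.flink.api.common.io.RichOutputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical Kafka sink for the DataSet API. Flink instantiates one
// copy of this OutputFormat per parallel task on the worker nodes.
public class KafkaOutputFormat extends RichOutputFormat<String> {

    private final String brokers;
    private final String topic;
    private transient KafkaProducer<String, String> producer;

    public KafkaOutputFormat(String brokers, String topic) {
        this.brokers = brokers;
        this.topic = topic;
    }

    @Override
    public void configure(Configuration parameters) {
        // No extra configuration needed for this sketch.
    }

    @Override
    public void open(int taskNumber, int numTasks) {
        // Create the producer on the worker node, not on the client,
        // since KafkaProducer is not serializable.
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    @Override
    public void writeRecord(String record) {
        producer.send(new ProducerRecord<>(topic, record));
    }

    @Override
    public void close() {
        // Called on every worker when its task finishes: flush any
        // buffered records, then release the producer. This is the
        // "finalizer" the question asks about.
        producer.flush();
        producer.close();
    }
}
```

You would then attach it like any other DataSet sink, e.g. `dataSet.output(new KafkaOutputFormat("broker:9092", "my-topic"));`. Putting flush() in close() ensures no buffered records are dropped when a task ends.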