0
votes

I am using Flink streaming to read data from Kafka and process it. When the application starts, before it consumes anything from Kafka, I need to read a file with the DataSet API, sort it by some criteria, and build a list from the result. Only then should it start consuming from Kafka in a streaming fashion. I have written the logic to read and sort the file with the DataSet API, but when I run the program that part never executes and Flink immediately starts consuming from Kafka. Is there any way to process the DataSet first and then start streaming in Flink?

2 Answers

1
votes

No, it is not possible to mix the DataSet and DataStream APIs. You can, however, start both programs from the same main() method, but you would have to write the sorted result of the DataSet program to a file, which is then consumed by the DataStream program.
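
To make the ordering concrete, here is a minimal sketch of the file handoff in plain Java. It is not Flink code: `runBatchStep` is a hypothetical stand-in for the DataSet program (it sorts by natural string order, which is an assumption, since the question does not give the real criteria), and `loadSorted` is a hypothetical helper showing what the streaming side would do at startup. The class name and the argument paths are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedFileHandoff {

    // Stand-in for the DataSet program: read the file, sort it, write the result.
    // In a real setup this step would be the Flink batch job, and it must have
    // finished before the next step runs.
    static void runBatchStep(Path input, Path sortedOutput) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(input));
        Collections.sort(lines); // assumption: natural order; swap in the real criteria
        Files.write(sortedOutput, lines);
    }

    // What the streaming program does before it touches Kafka:
    // load the already sorted file into an in-memory list.
    static List<String> loadSorted(Path sortedOutput) throws IOException {
        return Collections.unmodifiableList(Files.readAllLines(sortedOutput));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: SortedFileHandoff <input> <sortedOutput>");
            return;
        }
        Path input = Paths.get(args[0]);
        Path sorted = Paths.get(args[1]);

        runBatchStep(input, sorted);                 // 1. batch step runs to completion
        List<String> reference = loadSorted(sorted); // 2. sorted list is built from the file
        System.out.println("Loaded " + reference.size() + " sorted lines");

        // 3. Only now build and execute the streaming job (Kafka source etc.),
        //    passing `reference` to whichever operators need it.
    }
}
```

The point of the sketch is only the order of the three steps in main(): the batch result is on disk and loaded before the streaming job is even defined, so the stream cannot start early.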

0
votes

Create a separate Flink job for your DataSet manipulation and sink the results to the Kafka topic your streaming job consumes from.