2 votes

I have an application that receives much of its input from a stream, but some of its data comes from both an RDBMS and a series of static files.

The stream will continuously emit events, so the Flink job will never end. How do you periodically refresh the RDBMS data and the static files to capture any updates to those sources?

I am currently using the JDBCInputFormat to read data from the database.
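For reference, the JDBC read currently looks roughly like this (a minimal sketch only; the driver, URL, table and column names are placeholders, not my real schema):

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.types.Row;

public class JdbcReadSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Shape of the rows returned by the query below (placeholder schema).
        RowTypeInfo rowTypeInfo = new RowTypeInfo(
                BasicTypeInfo.INT_TYPE_INFO,
                BasicTypeInfo.STRING_TYPE_INFO);

        JDBCInputFormat jdbcInput = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("org.postgresql.Driver")          // placeholder driver
                .setDBUrl("jdbc:postgresql://localhost:5432/db") // placeholder URL
                .setQuery("SELECT id, value FROM reference_data")
                .setRowTypeInfo(rowTypeInfo)
                .finish();

        // This is a bounded read: it scans the table once and finishes,
        // which is why the data goes stale while the job keeps running.
        DataStream<Row> referenceData = env.createInput(jdbcInput);
        referenceData.print();

        env.execute("jdbc-read-sketch");
    }
}
```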

Below is a rough schematic of what I am trying to do:

Flink Schematic

2

I don't really have a CDC process to pick the data up, and it isn't a massive amount of information. I am trying to write it in a map function with a local cache to see how that works. – mransley
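Roughly what I have in mind for the map function is sketched below (the JDBC details, column names, and the refresh interval are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

public class EnrichWithDbCache extends RichMapFunction<Integer, Tuple2<Integer, String>> {

    private static final long REFRESH_INTERVAL_MS = 6 * 60 * 60 * 1000L; // placeholder interval

    private transient Map<Integer, String> cache;
    private transient long lastRefresh;

    @Override
    public void open(Configuration parameters) throws Exception {
        cache = loadCache();
        lastRefresh = System.currentTimeMillis();
    }

    @Override
    public Tuple2<Integer, String> map(Integer key) throws Exception {
        // Lazily reload the cache once the refresh interval has elapsed.
        if (System.currentTimeMillis() - lastRefresh > REFRESH_INTERVAL_MS) {
            cache = loadCache();
            lastRefresh = System.currentTimeMillis();
        }
        return Tuple2.of(key, cache.getOrDefault(key, "unknown"));
    }

    // Re-reads the whole reference table; acceptable here because the data set is small.
    private Map<Integer, String> loadCache() throws Exception {
        Map<Integer, String> result = new HashMap<>();
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/db"); // placeholder
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, value FROM reference_data")) {           // placeholder
            while (rs.next()) {
                result.put(rs.getInt("id"), rs.getString("value"));
            }
        }
        return result;
    }
}
```

Note that each parallel instance of the operator keeps and reloads its own copy of the cache, so every refresh hits the database once per task.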

2 Answers

0 votes

For each of your two sources that might change (the RDBMS and the files), create a Flink source whose output is broadcast to the operators that process the data from Kafka. A broadcast stream sends every record to every parallel task/instance of the receiving operator.
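A rough sketch of that wiring, using Flink's broadcast state pattern (the reference source below just emits placeholder tuples on a timer; a real one would re-query the RDBMS and re-read the files):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.util.Collector;

public class BroadcastEnrichmentSketch {

    // Descriptor for the broadcast state holding the latest reference data.
    static final MapStateDescriptor<Integer, String> REF_DESCRIPTOR =
            new MapStateDescriptor<>("reference-data",
                    BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Main stream -- stands in for the Kafka source.
        DataStream<Integer> events = env.fromElements(1, 2, 3);

        // Reference-data source: re-emits the reference set periodically.
        DataStream<Tuple2<Integer, String>> refUpdates = env.addSource(
                new SourceFunction<Tuple2<Integer, String>>() {
                    private volatile boolean running = true;
                    @Override
                    public void run(SourceContext<Tuple2<Integer, String>> ctx) throws Exception {
                        while (running) {
                            // Placeholder: a real implementation would query the RDBMS
                            // and read the static files here.
                            ctx.collect(Tuple2.of(1, "reference-value-1"));
                            Thread.sleep(60_000); // refresh interval (placeholder)
                        }
                    }
                    @Override
                    public void cancel() { running = false; }
                });

        BroadcastStream<Tuple2<Integer, String>> refBroadcast = refUpdates.broadcast(REF_DESCRIPTOR);

        events.connect(refBroadcast)
              .process(new BroadcastProcessFunction<Integer, Tuple2<Integer, String>, String>() {
                  @Override
                  public void processElement(Integer key, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                      String ref = ctx.getBroadcastState(REF_DESCRIPTOR).get(key);
                      out.collect(key + " -> " + (ref == null ? "unknown" : ref));
                  }
                  @Override
                  public void processBroadcastElement(Tuple2<Integer, String> update, Context ctx, Collector<String> out) throws Exception {
                      // Every parallel instance receives every update and stores it.
                      ctx.getBroadcastState(REF_DESCRIPTOR).put(update.f0, update.f1);
                  }
              })
              .print();

        env.execute("broadcast-enrichment-sketch");
    }
}
```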

0 votes

For each of your sources, the files and the RDBMS, you can periodically (for example, every 6 hours) take a snapshot in HDFS or some other storage and compute the difference between two consecutive snapshots. The result is then pushed to Kafka. This solution works when you cannot modify the database or file structure to add extra information (for example, a column named last_update in the RDBMS).

Another solution is to add a column named last_update, use it to filter the rows that have changed between two queries, and push that data to Kafka.
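A rough sketch of the polling side of that second approach (the table, column, topic, and connection details are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LastUpdatePoller {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Timestamp lastSeen = new Timestamp(0L); // first run picks up everything

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/db"); // placeholder
                     PreparedStatement stmt = conn.prepareStatement(
                             "SELECT id, value, last_update FROM reference_data WHERE last_update > ?")) {
                    stmt.setTimestamp(1, lastSeen);
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) {
                            // Push each changed row to Kafka; the Flink job consumes this topic.
                            producer.send(new ProducerRecord<>("reference-updates",
                                    rs.getString("id"), rs.getString("value")));
                            Timestamp updated = rs.getTimestamp("last_update");
                            if (updated.after(lastSeen)) {
                                lastSeen = updated;
                            }
                        }
                    }
                }
                Thread.sleep(6 * 60 * 60 * 1000L); // poll interval, matching the 6-hour example above
            }
        }
    }
}
```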