
I have a Flink streaming job that reads from Kafka and writes into the appropriate partitions in a file system. For instance, the job is configured to use a bucketing sink that writes to /data/date=${date}/hour=${hour}.
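As an illustration of the layout, here is a small Python sketch of which hourly directory a record lands in. It assumes each record is bucketed by its own event timestamp in UTC, with a zero-padded hour. That is an assumption: Flink's stock DateTimeBucketer uses the processing-time clock rather than the record's timestamp, so event-time bucketing would need a custom bucketer. `partition_path` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def partition_path(base: str, event_ts_ms: int) -> str:
    """Return the hourly partition directory for a record with this event time.

    Assumes UTC and a zero-padded hour, e.g. /data/date=2019-01-10/hour=07.
    """
    ts = datetime.fromtimestamp(event_ts_ms / 1000, tz=timezone.utc)
    return f"{base}/date={ts:%Y-%m-%d}/hour={ts:%H}"


print(partition_path("/data", 1547104500000))  # → /data/date=2019-01-10/hour=07
```

Records from late in one hour keep arriving after the sink has started writing the next hour's directory. That is why the existence of /data/date=.../hour=... alone says nothing about whether the hour is complete.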

How can I detect that a partition is complete and ready to be used, so that a corresponding Airflow pipeline can do some batch processing on that hour?

This looks like a variation of stackoverflow.com/questions/54094729/…, yes? – kkrugler
No, the other question assumes a particular way of doing it, while this one asks what the right way to do it would be. – Achyuth Samudrala

1 Answer


You could look at the implementation of ContinuousFileMonitoringFunction to see how it monitors the file system, and then do something similar to what David Anderson suggested in your other question about creating a custom ProcessFunction.
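Below is a minimal sketch of one way these ideas could fit together. It assumes the job treats an hour as complete once the event-time watermark has passed that hour's last millisecond, and then drops a marker file into the partition for a downstream poller, such as an Airflow sensor task, to look for. The class is plain Python that models a Flink ProcessFunction registering event-time timers. It is not Flink API code. The `_SUCCESS` marker name, `PartitionCompletionTracker`, `partition_path`, and `partition_ready` are all hypothetical. The sketch also ignores the sink's commit lifecycle: BucketingSink only moves pending files to their final names when a checkpoint completes, so a real job would need to make sure an hour's data is committed before writing its marker.

```python
import os
import tempfile
from datetime import datetime, timezone

HOUR_MS = 60 * 60 * 1000
MARKER = "_SUCCESS"  # hypothetical marker name; any name both sides agree on works


def partition_path(base: str, hour_start_ms: int) -> str:
    """Hourly partition directory, assuming UTC event-time bucketing."""
    ts = datetime.fromtimestamp(hour_start_ms / 1000, tz=timezone.utc)
    return os.path.join(base, f"date={ts:%Y-%m-%d}", f"hour={ts:%H}")


class PartitionCompletionTracker:
    """Plain-Python model of a ProcessFunction using event-time timers.

    on_event mirrors processElement registering a timer at the last
    millisecond of the record's hour; on_watermark mirrors onTimer firing
    once the watermark reaches that timestamp.
    """

    def __init__(self, base: str):
        self.base = base
        self.pending = set()  # hour starts (epoch ms) that have received data

    def on_event(self, event_ts_ms: int) -> None:
        self.pending.add(event_ts_ms - event_ts_ms % HOUR_MS)

    def on_watermark(self, watermark_ms: int) -> list:
        """Mark every pending hour whose last millisecond is <= the watermark."""
        completed = sorted(h for h in self.pending if h + HOUR_MS - 1 <= watermark_ms)
        written = []
        for hour_start in completed:
            path = partition_path(self.base, hour_start)
            os.makedirs(path, exist_ok=True)
            with open(os.path.join(path, MARKER), "w"):
                pass  # empty file; its presence is the signal
            self.pending.discard(hour_start)
            written.append(path)
        return written


def partition_ready(base: str, hour_start_ms: int) -> bool:
    """The check a downstream poller would repeat until it returns True."""
    return os.path.exists(os.path.join(partition_path(base, hour_start_ms), MARKER))


# Example: one record at 2019-01-10T07:15Z, then two watermarks.
demo_base = tempfile.mkdtemp()
tracker = PartitionCompletionTracker(demo_base)
tracker.on_event(1547104500000)
print(tracker.on_watermark(1547107199998))  # [] : 07:59:59.998 has not closed the hour
print(tracker.on_watermark(1547107199999))  # [<demo_base>/date=2019-01-10/hour=07]
print(partition_ready(demo_base, 1547103600000))  # True
```

Keying readiness on the watermark rather than on wall-clock time means late or replayed data delays the marker instead of producing a partition that is marked ready too early. The trade-off is that an idle source, whose watermark stops advancing, also stops the markers.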