I am trying to use Flink's CEP library on log files (as a batch job) rather than on streams (in real time). Is that possible? If so, do you know of any example Scala code for this?
1 Answer
Flink's DataStream API and its associated libraries, including the CEP library, work on bounded, historic (batch) datasets as well as on unbounded, live streams; it makes no difference. Just set up a file (or directory) as the data source and use CEP normally. For correct, reproducible results, you should work in event time (assuming time plays a role in your processing). This is important because CEP will sort your input stream(s) by event time: notions of before and after should be relative to when the events occurred, not when they were processed.
A bit of googling will lead you to some CEP examples. There's a simple example (in Java and Scala) in the Flink training materials on GitHub.
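To make the advice above concrete, here is a minimal sketch of running CEP over a log file in event time. It assumes the Flink 1.x Scala DataStream API with the `flink-cep-scala` dependency on the classpath. The `LogEvent` case class, the comma-separated log format, the file path, and the "two consecutive ERRORs per host" pattern are all made up for illustration, so adapt them to your own logs.

```scala
import java.time.Duration

import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical log line format: "epochMillis,host,level,message"
case class LogEvent(timestamp: Long, host: String, level: String, message: String)

object LogFileCep {

  // Parse one line of the assumed format; no error handling in this sketch.
  def parse(line: String): LogEvent = {
    val f = line.split(",", 4)
    LogEvent(f(0).toLong, f(1), f(2), f(3))
  }

  // Turn one match into an output record; the map keys are the pattern names.
  def describe(m: scala.collection.Map[String, Iterable[LogEvent]]): String = {
    val first = m("first").head
    val second = m("second").head
    s"${first.host}: ERROR at ${first.timestamp} followed by ERROR at ${second.timestamp}"
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Bounded source: a file or directory (placeholder path).
    val events: DataStream[LogEvent] = env
      .readTextFile("/path/to/logs")
      .map(parse _)
      // Event time: timestamps come from the log records themselves,
      // tolerating up to 5 seconds of out-of-order lines (an assumption).
      .assignTimestampsAndWatermarks(
        WatermarkStrategy
          .forBoundedOutOfOrderness[LogEvent](Duration.ofSeconds(5))
          .withTimestampAssigner(new SerializableTimestampAssigner[LogEvent] {
            override def extractTimestamp(e: LogEvent, recordTs: Long): Long = e.timestamp
          }))

    // Illustrative pattern: two consecutive ERROR events within 10 seconds.
    val pattern = Pattern
      .begin[LogEvent]("first").where(_.level == "ERROR")
      .next("second").where(_.level == "ERROR")
      .within(Time.seconds(10))

    // Key by host so the pattern is evaluated per host.
    val alerts: DataStream[String] = CEP
      .pattern(events.keyBy(_.host), pattern)
      .select(describe _)

    alerts.print()
    env.execute("CEP over log files")
  }
}
```

Because the timestamps come from the records, rerunning the job over the same files should give the same matches regardless of how fast the files are read.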