I'm working on a PoC project in Java using Kafka -> Flink -> Elasticsearch.
Events will be produced to a specific Kafka topic at an unpredictable rate, from 0 up to thousands of events/sec, such as:
{"gid":"abcd-8910-2ca4227527f9", "state":"stateA", "timestamp:1465566255, "other unusefull info":"..."}
Flink will consume these events and, every second, should sink into Elasticsearch the number of events in each state, e.g.:
{"stateA":54, "stateB":100, ... "stateJ":34}
I have 10 states: [Created, ..., Deleted], with an average life cycle of 15 minutes. The state can change twice per second. Theoretically, new states could be added.
In order to sink results every second, I'm thinking of using Flink's time windows: https://flink.apache.org/news/2015/12/04/Introducing-windows.html
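Something like this is what I have in mind for the windowing part. It's an untested sketch (the topic name, the Kafka 0.9 connector class, and the naive JSON parsing are just placeholders), and it only counts the events that arrive per state within each 1-second window, not the running number of gids per state:

```
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class PerSecondStateCounts {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092");
        kafkaProps.setProperty("group.id", "flink-poc");

        // Raw JSON events from Kafka ("state-events" is a placeholder topic name).
        DataStream<String> rawEvents = env.addSource(
                new FlinkKafkaConsumer09<>("state-events", new SimpleStringSchema(), kafkaProps));

        // Count events per state inside 1-second tumbling (processing-time) windows.
        DataStream<Tuple2<String, Integer>> perSecondCounts = rawEvents
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String json) {
                        return Tuple2.of(extractState(json), 1);
                    }
                })
                .keyBy(0)
                .timeWindow(Time.seconds(1))
                .sum(1);

        perSecondCounts.print(); // to be replaced with an Elasticsearch sink

        env.execute("per-second state counts");
    }

    // Naive extraction of the "state" field; a real job would use a JSON
    // DeserializationSchema instead.
    private static String extractState(String json) {
        int start = json.indexOf("\"state\":\"") + 9;
        return json.substring(start, json.indexOf('"', start));
    }
}
```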
The problem is that I need stateful objects holding gid -> previous-state and stateX -> count mappings, so that I can increment/decrement the counts when a new event occurs.
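For the gid -> previous-state part, this is the direction I'm considering instead of static maps. It's a rough, untested sketch: StateEvent is an assumed POJO with getGid()/getState(), and the exact state-API signatures depend on the Flink version. The idea is to key the stream by gid, remember each gid's last state in Flink's managed ValueState, and emit (state, +1/-1) deltas that can be aggregated into per-state counts downstream:

```
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// To be used on a stream keyed by gid: remembers each gid's previous state and
// emits (state, +1/-1) deltas so per-state counts can be kept downstream.
public class StateTransitionDeltas
        extends RichFlatMapFunction<StateEvent, Tuple2<String, Integer>> {

    private transient ValueState<String> previousState;

    @Override
    public void open(Configuration parameters) {
        previousState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("previous-state", String.class));
    }

    @Override
    public void flatMap(StateEvent event, Collector<Tuple2<String, Integer>> out)
            throws Exception {
        String previous = previousState.value();
        if (event.getState().equals(previous)) {
            return; // no transition, nothing to count
        }
        if (previous != null) {
            out.collect(Tuple2.of(previous, -1));    // gid left its old state
        }
        out.collect(Tuple2.of(event.getState(), 1)); // gid entered its new state
        previousState.update(event.getState());
    }
}
```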
I found a draft document about stateful stream processing: https://cwiki.apache.org/confluence/display/FLINK/Stateful+Stream+Processing
I'm new to Flink and stream processing, and I haven't dug into Flink's stateful stream processing yet. For the first phase I was thinking of using static objects for this, but that approach won't work once several Flink instances are launched.
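Wired together it would look roughly like this (again untested, with the assumed StateEvent POJO); the keyed state replaces the static maps, so each parallel Flink instance only holds the gids and states it owns:

```
// events is a DataStream<StateEvent> parsed from the Kafka topic above.
DataStream<Tuple2<String, Integer>> runningCounts = events
        .keyBy("gid")                           // gid -> previous-state lives here
        .flatMap(new StateTransitionDeltas())   // emits (state, +1/-1)
        .keyBy(0)                               // stateX -> count lives here
        .sum(1);                                // rolling count per state
```

This still emits an updated count on every event rather than once per second, so on top of it I'd still need the 1-second window (or a custom trigger) to build the {"stateA":54, ...} document for Elasticsearch.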
I want to ask you:
- What do you think about this approach?
- Is Flink suited for this kind of stream processing?
- What would your approach be to solving this problem?
Also, I'd appreciate some code snippets for the windowed stateful stream solution (or other solutions).
Thanks,