0
votes

I am using flink 1.5.2 to solve a CEP problem.

my data is from a List, some other process will add new Event object to that list while system is running. It is not socket or network message. I've been reading the official site example. Here are the steps I imagine I should be doing.

  1. create an DataStream using env.fromCollection(list);
  2. define a Pattern pattern
  3. get a PatternStream using CEP.pattern(data_stream, pattern)
  4. use pattern_stream.select( ...implement select interface ...) to get the complex event result as a DataStream

But my input stream should be unbounded. I didn't find any add() method in DataStream<> object. How do I accomplish this? and also, do I need to tell DataStream<> when to clean up obsolete events?

1

1 Answers

1
votes

Collections are only suitable as an input source for Flink when working with a bounded input set that's fixed up front, as when writing a test or just experimenting. If you want an unbounded stream you will need to choose a different source, such a socket or a message queuing system like Kafka.

Sockets are easy to work with for experimentation. On Linux and MacOS systems you can use

nc -lk 9999

to create a socket that Flink can bind to on port 9999, and whatever you provide as input to nc (netcat) will be streamed into your Flink job one line at a time. Netcat is also available for Windows, but isn't pre-installed.

However, you shouldn't plan to use sockets in production, as they can't be rewound (which is crucial for achieving accurate results with Flink during failure recovery).