0
votes

I have a scenario with a flink application that receives data streams in the following format:

{ "event_id": "c1s2s34", "event_create_timestamp": "2019-03-07 11:11:23", "amount": "104.67" }

I am using the following tumbling window to find the sum, count, and average amounts for input streams in the last 60 seconds.

keyValue.timeWindow(Time.seconds(60))

However how can I label the aggregated outcome such that I can say that the output data stream between 16:20 and 16:21 the aggregated results are sum x, count y, and average z.

Any help is appropriated.

1
How do you want to consume the results -- are you going to print them, or write them to a file, or send them to Kafka, ... ?David Anderson
Hi David, I want to send the result to Kinesis Firehose.observer0107

1 Answers

1
votes

If you look at the windowing example in the Flink training site -- https://training.ververica.com/exercises/hourlyTips.html -- you'll see an example of how to use a ProcessWindowFunction to create output events from windows that include the timing information, etc. The basic idea is that the process() method on a ProcessWindowFunction is passed a Context which in turn contains the Window object, from which you can determine the start and ending times for the window, e.g, context.window().getEnd().

You can then arrange for your ProcessWindowFunction to return Tuples or POJOs that contain all of the information you want to include in your reports.