2 votes

Using Apache Flink, I want to create a streaming window sorted by the timestamp that is stored in the Kafka event. According to the following article, this is not implemented:

https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams

However, the article is dated July 2015 and it is now almost a year later. Has this functionality been implemented in the meantime, and can somebody point me to relevant documentation and/or an example?


2 Answers

2 votes

Apache Flink supports stream windows based on event timestamps. In Flink, this concept is called event-time.

In order to support event-time, you have to extract a timestamp (a long value) from each event. In addition, you need to provide so-called watermarks, which are required to handle events whose timestamps arrive out of order.
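For illustration, here is a minimal sketch of assigning timestamps and watermarks. It assumes events arrive as (key, value, timestampMillis) tuples with the event time in the third field and that events are at most 10 seconds out of order; both the tuple shape and the bound are assumptions for this sketch, not requirements.

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) // use event time instead of processing time

// hypothetical input: (key, value, event timestamp in millis), e.g. read from Kafka
val raw: DataStream[(String, Int, Long)] = ...

// extract the timestamp from the third field and emit watermarks that
// tolerate events arriving up to 10 seconds out of order (assumed bound)
val withTimestamps: DataStream[(String, Int, Long)] = raw.assignTimestampsAndWatermarks(
  new BoundedOutOfOrdernessTimestampExtractor[(String, Int, Long)](Time.seconds(10)) {
    override def extractTimestamp(event: (String, Int, Long)): Long = event._3
  })

The resulting stream can then be keyed and windowed, as in the sum example below (adjusting the field positions to your actual event type).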

Given a stream with extracted timestamps you can define a windowed sum as follows:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

val stream: DataStream[(String, Int)] = ...
val windowCnt = stream
  .keyBy(0)                    // partition the stream on the first field (String)
  .timeWindow(Time.minutes(1)) // 1-minute tumbling windows over the extracted event-time timestamps
  .sum(1)                      // sum the second field (Int) per key and window

Event-time and windows are explained in detail in the Flink documentation and in several blog posts.

0 votes

Sorting by timestamps is still not supported out of the box, but you can do windowing based on the timestamps in the elements. We call this event-time windowing. Please have a look here: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/windows.html.
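As a rough sketch of what that can look like in practice (not something the linked page spells out), you can approximate per-window ordering by collecting the elements of each event-time window and sorting them inside a window function. The (String, Int, Long) tuple shape with the event timestamp in the third field is assumed here, and timestamps/watermarks are assumed to be assigned already.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

// hypothetical input: (key, value, event timestamp in millis),
// with timestamps and watermarks already assigned
val stream: DataStream[(String, Int, Long)] = ...

val sortedPerWindow = stream
  .keyBy(_._1)                 // key by the String field
  .timeWindow(Time.minutes(1)) // 1-minute event-time windows
  .apply { (key: String, window: TimeWindow, elements: Iterable[(String, Int, Long)], out: Collector[(String, Int, Long)]) =>
    // sort each window's elements by their event timestamp and emit them in order
    elements.toSeq.sortBy(_._3).foreach(e => out.collect(e))
  }

Note that the output is only ordered within each window; a global ordering across the whole stream is still not provided.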