0
votes

so I'm simulating a streaming task using Flink DataStream and I want to execute an SQL query on each window.

Let's say this is the query

SELECT name, age, sum(days), avg(salary)
FROM employees
WHERE age > 25
GROUP BY name, age
ORDER BY name, age

I'm having a hard time to translate it to Flink. From my understanding, to calculate average I need to do it manually using .apply() and WindowFunction. But how do I calculate the sum then? Also manually in the same WindowFunction?
I'm also wondering if it is possible to do order by on the whole window?

Below is the pseudocode of what I thought of so far. Any help would be appreciated! Thanks!

employeesStream
.filter(new FilterFunction() ....)    \\ where clause
.keyby(nameIndex, ageIndex)           \\ group by??
.timeWindow(Time.seconds(10), Time.seconds(1))
.apply(new WindowFunction() ....)     \\ calculate average (and sum?)
// order by??

I checked the Table API but it seems for streaming not a lot of operations are supported, e.g orderBy.

1

1 Answers

0
votes

Ordering in streaming is not trivial. How do you want to sort something that is never ending? In your example you want to calculate an average or a sum, which is just one value per window. You cannot sort one value.

Another possibility is to buffer all values and wait for an indicator of completeness to start sorting. Thanks to event-time and watermarks, it is possible to sort a stream if you know that you have seen all values until a certain time (aka watermarks).

Event-time sort has been introduced recently and will be part of Flink 1.4 Table API. See here for an example.