1
votes

Is there a way to count the number of unique words in time window stream with Flink Streaming? I see this question but I don't know how to implement time window.

1

1 Answers

0
votes

Sure, that's pretty straightforward. If you want an aggregation across all of the input records during each time window, then you'll need to use one of the flavors of windowAll(), which means you won't be using a keyedstream, and you can not operate in parallel.

You'll need to decide if you want tumbling windows or sliding windows, and whether you are operating in event time or processing time.

But roughly speaking, you'll do something like this:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.addSource( ... )
    .timeWindowAll(Time.minutes(15))
    .apply(new UniqueWordCounter())
    .print()
env.execute()

Your UniqueWordCounter will be a WindowFunction that receives an iterable of all the words in a window, and returns the number of unique words.

On the other hand, if you are using a keyedstream and want to count unique words for each key, modify your application accordingly:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.addSource( ... )
    .keyBy( ... )
    .timeWindow(Time.minutes(15))
    .apply(new UniqueWordCounter())
    .print()
env.execute()