
I'm trying to use the default window with the default trigger to observe its behavior, but the pipeline does not emit any result.

According to the Apache Beam documentation:

The default trigger for a PCollection is based on event time, and emits the results of the window when the Beam’s watermark passes the end of the window, and then fires each time late data arrives.


If you are using both the default windowing configuration and the default trigger, the default trigger emits exactly once, and late data is discarded. This is because the default windowing configuration has an allowed lateness value of 0.

My code:

Nb_items = lines | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults() \
                 | 'print' >> beam.ParDo(PrintFn())

It only emits data if I set a trigger:

Nb_items = lines | 'window' >> beam.WindowInto(window.GlobalWindows(),
            trigger=trigger.AfterProcessingTime(10),
            accumulation_mode=trigger.AccumulationMode.DISCARDING) \
        | 'CountGlobally' >> beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults() \
        | 'print' >> beam.ParDo(PrintFn())

How can I observe the default behavior without setting a trigger?

Is the problem in the combine transform?

According to the documentation: If your input PCollection uses the default global windowing, the default behavior is to return a PCollection containing one item. That item’s value comes from the accumulator in the combine function that you specified when applying Combine.

Are you running in streaming mode? - guillaume blaquiere
Yes, but at a certain moment I get all the data - Rim

2 Answers


The issue is that, with an unbounded source, the watermark never reaches the end of the global window, so the default trigger never fires. To see the default trigger in action, use any window whose end the watermark can reach, e.g. 'window' >> beam.WindowInto(window.FixedWindows(10))

As Guillaume rightly asks, check whether you are running in batch mode: there, triggers are basically ignored.
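
To make the reasoning above concrete, here is a minimal toy model in plain Python. It is not Beam code and not how Beam is implemented; it only mimics the rule "emit a window's result once the watermark passes the end of the window". The names simulate_default_trigger, fixed_window_end and GLOBAL_WINDOW_END are made up for this illustration, the global window's end is approximated by infinity, and late data is ignored (all elements are assumed to arrive before the watermark advances).

```python
import math

# Toy model only: NOT Beam's implementation, just the firing rule
# "emit a window's result once the watermark passes the end of the window".
GLOBAL_WINDOW_END = math.inf  # hypothetical stand-in for the global window's far-future end


def fixed_window_end(timestamp, size):
    """End of the fixed window of length `size` that contains `timestamp`."""
    return (timestamp // size + 1) * size


def simulate_default_trigger(events, watermarks, window_end_fn):
    """Count elements per window; fire a window once the watermark reaches its end.

    events: event timestamps in seconds
    watermarks: successive watermark values seen by the (imaginary) runner
    window_end_fn: maps an event timestamp to the end of its window
    Returns a list of (window_end, count) pairs in firing order.
    """
    # Assign every element to its window up front (no late data in this sketch).
    counts = {}
    for ts in events:
        end = window_end_fn(ts)
        counts[end] = counts.get(end, 0) + 1

    fired = []
    for wm in watermarks:
        # sorted() returns a new list, so popping from the dict here is safe.
        for end in sorted(counts):
            if wm >= end:
                fired.append((end, counts.pop(end)))
    return fired


events = [1, 3, 12, 14, 15, 27]
streaming_watermarks = [5, 10, 20, 30]  # finite: the stream never "ends"

# Global window in streaming: the watermark never reaches the end, nothing fires.
print(simulate_default_trigger(events, streaming_watermarks, lambda ts: GLOBAL_WINDOW_END))

# Fixed 10-second windows in streaming: each window fires as the watermark passes it.
print(simulate_default_trigger(events, streaming_watermarks, lambda ts: fixed_window_end(ts, 10)))

# Batch: the watermark jumps to the end of time once all input is read,
# so even the global window fires exactly once.
print(simulate_default_trigger(events, [math.inf], lambda ts: GLOBAL_WINDOW_END))
```

The first call prints an empty list, which matches what the question describes: with GlobalWindows and no explicit trigger, an unbounded pipeline has nothing to emit. Switching to fixed windows gives one count per window, and the batch case shows why the same code does produce a single result when the input is bounded.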