
I am trying to create aggregators that count the values satisfying a condition across all input data. I looked through the documentation and found the following class for creating them:

https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Aggregator ..

I am using google-cloud-dataflow-java-sdk-all 2.4.0 (Apache Beam based).

However, I cannot find the corresponding class in the new Beam API. I looked in the org.apache.beam.sdk.transforms package.

Can you please tell me how to use aggregators with the Dataflow runner in the new API?


2 Answers


The link you have is for the old SDK (1.x).

In SDK 2.x, you should refer to the Apache Beam SDK. If I understand correctly, the Aggregators you mention are for adding counters during processing. I believe the corresponding package is org.apache.beam.sdk.metrics:

Package org.apache.beam.sdk.metrics: Metrics allow exporting information about the execution of a pipeline.

and org.apache.beam.sdk.metrics.Counter interface:

A metric that reports a single long value and can be incremented or decremented.
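As a rough illustration of how a Counter from this package might stand in for an old Aggregator, here is a minimal sketch. The class name `CountMatchingFn`, the namespace and counter names, and the `value > 10` condition are all made up for the example and are not from the question.

```java
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical DoFn: passes every element through and bumps a counter
// for each element that satisfies the condition.
class CountMatchingFn extends DoFn<Integer, Integer> {
  // "my-namespace" and "matching-values" are illustrative names (assumptions).
  private final Counter matching =
      Metrics.counter("my-namespace", "matching-values");

  // Placeholder condition (assumption); replace with the real predicate.
  static boolean satisfiesCondition(int value) {
    return value > 10;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    Integer value = c.element();
    if (satisfiesCondition(value)) {
      matching.inc(); // increment the counter by one
    }
    c.output(value);
  }
}
```

It would be applied with `ParDo.of(new CountMatchingFn())`. After the pipeline runs, the counter value should be queryable through `PipelineResult.metrics()`, and with the Dataflow runner custom counters are normally shown in the job's monitoring UI as well.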


As of now, there seems to be no replacement for the Aggregator class in Apache Beam SDK 2.x. An alternative way to count values that satisfy a condition is to use transforms: collect the data meeting the condition with a grouping transform such as GroupByKey, then apply the Combine transform to get a count of the matching input data.
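A transform-based count could look roughly like the sketch below. Note that it swaps in `Filter` followed by `Count.globally()` (which is built on Combine) instead of an explicit grouping step, since that is the shortest way I know to express it. The class names and the `value > 10` condition are assumptions for illustration only.

```java
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical helper: counts the elements of a PCollection that satisfy a condition.
class ConditionalCount {

  // Placeholder predicate (assumption); replace with the real condition.
  static class IsLarge implements SerializableFunction<Integer, Boolean> {
    @Override
    public Boolean apply(Integer value) {
      return value > 10;
    }
  }

  // Returns a PCollection holding the number of matching elements.
  static PCollection<Long> countMatching(PCollection<Integer> input) {
    return input
        .apply("KeepMatching", Filter.by(new IsLarge()))
        .apply("CountMatching", Count.<Integer>globally());
  }
}
```

Unlike a metric, the result here is ordinary pipeline data, so it can be written to a sink or fed into later steps.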