0
votes

I am trying to create aggregators to count values that satisfy a condition across all input data . I looked into documentation and found the below for creation .

https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Aggregator ..

I am using : google-cloud-dataflow-java-sdk-all - 2.4.0 (apache beam based)

However I am not able to find the corresponding class in the new beam api.. I looked into org.apache.beam.sdk.transforms package .

Can you please let me know how can I use aggregators with dataflow runner in new api . ?

2

2 Answers

1
votes

The link you have is for the old SDK (1.x).

In SDK 2.x, you should refer to apache-beam SDK. For the Aggregators you mentioned, if I understand correctly, it's for adding counters during processing. I guess the corresponding package should be org.apache.beam.sdk.metrics.

Package org.apache.beam.sdk.metrics Metrics allow exporting information about the execution of a pipeline.

and org.apache.beam.sdk.metrics.Counter interface:

A metric that reports a single long value and can be incremented or decremented.

0
votes

As of now, there seem to be no replacement for the Aggregator class in Apache Beam SDK 2.X. An alternate solution to count values respecting a condition would be Transforms. By using the GroupBy transform to collect data meeting a condition and then the Combine transform, you can get a count of the input data respecting the condition.