3
votes

Is there a way in Flink (batch/streaming) to compute the average and sum of a field at the same time? Using the aggregate method I can compute the sum of a field on a groupBy result, but how do I calculate the average also at the same time? Example code below.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple3<String,Integer,Double>> source = 
         env.readCsvFile(PathConfig.LINEITEM_1)
         .fieldDelimiter("|")
         types(String.class, Integer.class, Double.class);

source.groupBy(0,1).aggregate(Aggregations.SUM, 2);
//average of field 2???
1
Could you use map/reduce instead of aggregate ? - ImbaBalboa
I could use reduceGroup to calculate both sum and average manually, but since there is already a nice aggregate sum function, I thought maybe there is also a way to calculate the average automatically. - Eli

1 Answers

2
votes

For simple tasks like CSV parsing, grouping, and aggregating I would recommend to use Flink's Table API.

If you rather want to use more low-level APIs, you can implement a GroupReduce function that sums/counts (until the iterator has no more elements) and produces a final average at the end.