6
votes

Is there a difference between ParDo and FlatMap in Dataflow / Apache Beam?

I think both apply a function to each element of the incoming PCollection and return an iterable of results, but I imagine there must be some difference?

1 Answer

13
votes

FlatMap is a simpler operation that is built on top of ParDo, as you might expect. If it fits your needs, it is a good choice.

ParDo is a lower-level building block for element-wise computation. It has additional capabilities such as side inputs, multiple output collections, access to the current window, some really low-level callbacks for starting and committing a bundle of elements, and more.

In practice, many uses of FlatMap and ParDo end up with a similar amount of code, but in my opinion it is most readable to use the simplest (highest-level) transform available.