Is there a difference between ParDo and FlatMap in Dataflow / Apache Beam?
I think both apply a function to each element of the incoming PCollection and return the iterable, but I imagine there must be some difference?
FlatMap is a simpler operation that is built, as you might expect, on top of ParDo. If it fits your needs, it is a good choice.
ParDo is a lower-level building block for element-wise computation. It has additional capabilities such as side inputs, multiple output collections, access to the current window, some really low-level callbacks for starting and committing a bundle of elements, and more.
In practice, many uses of FlatMap and ParDo end up with a similar code bulk, but in my opinion it is most readable to use the simplest (highest level) transform available.