In the Apache Beam Python SDK, I often see the '>>' operator in pipeline code, for example in the programming guide:

https://beam.apache.org/documentation/programming-guide/#pipeline-io

lines = p | 'ReadFromText' >> beam.io.ReadFromText('path/to/input-*.csv')

What does this mean?

1 Answer

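>> is the bitwise right-shift operator in Python. Its dunder (double underscore) method is __rshift__(). The reflected version, __rrshift__(), is called on the right-hand operand when the left-hand operand does not support the operation, which is the case here because a str does not define >> at all.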

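The Python implementation of Apache Beam defines __rrshift__() on the PTransform class so that a string on the left-hand side of >> becomes the name (label) of the transform. It's just special syntax. In your example, "ReadFromText" is the name of the transform. Because >> binds more tightly than |, the line is evaluated as p | ('ReadFromText' >> beam.io.ReadFromText(...)): the transform is named first and then applied to the pipeline with |.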

Reference: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L445
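
To see the mechanism without installing Beam, here is a minimal sketch in plain Python. ToyTransform and ToyPipeline are hypothetical stand-ins invented for illustration; they are not Beam classes and only imitate the operator overloading described above.

```python
class ToyTransform:
    """Hypothetical stand-in for a Beam PTransform (not Beam's actual class)."""

    def __init__(self, fn, label=None):
        self.fn = fn
        self.label = label

    def __rrshift__(self, label):
        # Called for `label >> transform`: str does not implement >>,
        # so Python falls back to the right operand's __rrshift__.
        return ToyTransform(self.fn, label)


class ToyPipeline:
    """Hypothetical stand-in for a pipeline / collection of elements."""

    def __init__(self, data, applied=None):
        self.data = data
        self.applied = applied or []

    def __or__(self, transform):
        # Called for `pipeline | transform`: apply the transform to every
        # element and record the transform's label.
        new_data = [transform.fn(x) for x in self.data]
        return ToyPipeline(new_data, self.applied + [transform.label])


p = ToyPipeline([1, 2, 3])
# `>>` binds tighter than `|`, so this is p | ('Double' >> ToyTransform(...)).
doubled = p | 'Double' >> ToyTransform(lambda x: x * 2)
print(doubled.applied, doubled.data)  # ['Double'] [2, 4, 6]
```

Writing the label first keeps each pipeline step readable as "name >> what it does", which is presumably why Beam chose the reflected operator rather than a keyword argument.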