I am using apache-beam 2.5.0 python SDK
Attaching the code snippet, in a pipeline, I am taking i/p from pubsub topic parsing it and want to perform some operation on it, when I ran it with DataflowRunner it runs fine but it seems that "data processing fun1", "data processing fun2" "data processing fun3" are running in sequential, I need it to run in parallel. I am new to dataflow.
Is there a way to parallelize it?
def run():
parser = argparse.ArgumentParser()
args, pipeline_args = parser.parse_known_args()
options = PipelineOptions(pipeline_args)
with beam.Pipeline(options=options) as p:
data = (p | "Read Pubsub Messages" >>
beam.io.ReadFromPubSub(topic=config.pub_sub_topic)
| "Parse messages " >> beam.Map(parse_pub_sub_message_with_bq_data)
)
data | "data processing fun1 " >> beam.ParDo(Fun1())
data | "data processing fun2" >> beam.ParDo(Fun2())
data | "data processing fun3" >> beam.ParDo(Fun3())
if __name__ == '__main__':
run()