I've been trying to debug a pipeline that takes an input parameter which drives the subsequent ParDo operations. For reasons I cannot fathom, the pipeline will not scale beyond a single worker, even though I've disabled autoscaling and set the number of workers explicitly. Sadly, the abysmal Dataflow interface on GCP does little to illuminate why it won't scale. Can anyone advise on what the issue might be, or how to debug this effectively?
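For context, here's roughly how I'm building the pipeline options (the project, region, and bucket values below are placeholders, not my real ones):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Autoscaling disabled and worker count pinned; identifiers are placeholders.
opts = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',                # placeholder
    region='us-central1',                # placeholder
    temp_location='gs://my-bucket/tmp',  # placeholder
    autoscaling_algorithm='NONE',
    num_workers=8,
    max_num_workers=8,
)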
with beam.Pipeline(options=opts) as p:
    result = (
        p
        | "Initialize Pipeline" >> beam.Create(['gs://data/'])  # single seed element
        # scanner and worker are my own modules containing the DoFns.
        | "Scan for extraction tasks" >> beam.ParDo(scanner.ScanForTasks())  # fans out to many tasks
        | "Extract data" >> beam.ParDo(worker.TaskToData())
    )