
I have been attempting to run an Apache Beam job on Dataflow, but I'm getting an error from GCP with the following message:

The job graph is too large. Please try again with a smaller job graph, or split your job into two or more smaller jobs.

I have run jobs with larger graphs in the past without problems. The job also runs fine locally with DirectRunner. There are about 12 nodes in the graph, including a read-from-BigQuery step, a WriteToText step, and a CoGroupByKey step.

Is there a way to increase the graph size Dataflow is willing to accept?


1 Answer


With a pipeline this small, the most likely cause is accidentally serializing extra data into your DoFns (or other serialized code). Do you have any large objects in your main class that are being automatically captured in closures? If so, the easiest fix is to build up your pipeline in a static function instead.

It's not possible to raise the graph size limit.