I created an Apache Beam pipeline that I am running on Dataflow. The pipeline contains 4 steps:
- read file contents
- convert file contents to JSON
- transform the JSON entries
- save transformed JSON entries into GCS
The problem is that steps 3 and 4 are blocked until steps 1 and 2 have finished reading all the files. Is there an explanation for why the later steps don't process each file's data as it becomes available?
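
For reference, a pipeline with the four steps listed above could be sketched roughly as follows. This is only an illustration, not the actual code: it assumes the Beam Python SDK and newline-delimited JSON input, and the names `parse_json_lines`, `transform_entry`, `build_pipeline`, `input_pattern`, and `output_prefix` are hypothetical placeholders.

```python
import json


def parse_json_lines(contents: str) -> list:
    """Step 2: parse file contents into dicts (assumes one JSON object per line)."""
    return [json.loads(line) for line in contents.splitlines() if line.strip()]


def transform_entry(entry: dict) -> dict:
    """Step 3: placeholder transform; the real logic is not part of this sketch."""
    return {**entry, "processed": True}


def build_pipeline(pipeline, input_pattern: str, output_prefix: str):
    """Attach the four steps to an existing beam.Pipeline object."""
    # Imported here so the helpers above can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    return (
        pipeline
        | "MatchFiles" >> fileio.MatchFiles(input_pattern)
        | "ReadMatches" >> fileio.ReadMatches()
        # Step 1: read each matched file's contents as one string.
        | "ReadContents" >> beam.Map(lambda f: f.read_utf8())
        # Step 2: one file's contents become many JSON entries.
        | "ToJson" >> beam.FlatMap(parse_json_lines)
        # Step 3: transform each entry independently.
        | "Transform" >> beam.Map(transform_entry)
        | "Serialize" >> beam.Map(json.dumps)
        # Step 4: write the entries under a GCS prefix such as gs://bucket/path/out.
        | "WriteToGCS" >> beam.io.WriteToText(output_prefix, file_name_suffix=".json")
    )
```

With a shape like this, each transform is written per element, which is why I expected the later steps to start as soon as the first file had been read.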