I want to run my Glue job in parallel. I start the Glue job from a Step Functions state machine, and it depends on the previous state finishing: a Lambda function that puts messages on an SQS queue. The Glue job then takes the messages from SQS one by one. I want to speed up this processing on the Glue side by running the job in parallel.
In Step Functions I can see two ways to achieve parallelism:
- "Map" state
- "Parallel" state
According to the AWS documentation: "While the Parallel state executes multiple branches of steps using the same input, a Map state will execute the same steps for multiple entries of an array in the state input."
But in my case the state input inside Step Functions is useless, because the work items come from SQS. With the "Parallel" state, I would need to duplicate the same step in the state machine (code duplication), and with the "Map" state, I would need to create some kind of artificial array just to force parallelism. I'm not sure whether I understand this correctly, or whether there is another way. Please suggest and help!
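To make the two options concrete, here is a rough sketch of what I think each one would look like, written as Python that builds the state machine definition. The job name `my-sqs-consumer-job`, the worker count, and the state names are placeholders I made up for illustration. I am also assuming the Glue job's maximum concurrent runs setting would have to be raised to at least the worker count, otherwise the extra runs would be rejected.

```python
import copy
import json

# Hypothetical values, for illustration only.
GLUE_JOB_NAME = "my-sqs-consumer-job"
WORKER_COUNT = 4


def glue_task():
    """Task state that starts the Glue job and waits for it to finish."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": GLUE_JOB_NAME},
        "End": True,
    }


def parallel_variant(n):
    """Option 1: a Parallel state with the same branch repeated n times."""
    branch = {"StartAt": "RunGlueJob", "States": {"RunGlueJob": glue_task()}}
    return {
        "Type": "Parallel",
        "Branches": [copy.deepcopy(branch) for _ in range(n)],
        "End": True,
    }


def map_variant(n):
    """Option 2: a Pass state injects a dummy array, and Map fans out over it."""
    return {
        "StartAt": "MakeWorkers",
        "States": {
            "MakeWorkers": {
                "Type": "Pass",
                # The array content is ignored; only its length matters.
                "Result": list(range(n)),
                "ResultPath": "$.workers",
                "Next": "FanOut",
            },
            "FanOut": {
                "Type": "Map",
                "ItemsPath": "$.workers",
                "MaxConcurrency": n,
                "ItemProcessor": {
                    "StartAt": "RunGlueJob",
                    "States": {"RunGlueJob": glue_task()},
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(map_variant(WORKER_COUNT), indent=2))
```

Generating the definition from code would at least hide the duplication in the "Parallel" variant, but the "Map" variant still needs the dummy `workers` array, which is the part that feels artificial to me.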