When I submit multiple Hadoop jobs to an EMR cluster, they run one after the other rather than concurrently (I can watch the progress with yarn application -list).

  1. Is there a way to run all of these Hadoop jobs in parallel?
  2. Would passing multiple Hadoop jobs in a single step solve this? If so, how do I pass multiple jobs within a single step?

1 Answer

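If you use HadoopActivity in AWS Data Pipeline and configure the EMR cluster with either the Fair Scheduler or the Capacity Scheduler, you can run multiple Hadoop jobs on the cluster in parallel instead of one after the other.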

https://aws.amazon.com/about-aws/whats-new/2015/06/run-parallel-hadoop-jobs-on-your-amazon-emr-cluster-using-aws-data-pipeline/
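
As a rough sketch of what that could look like, the Python snippet below builds a fragment of a Data Pipeline definition with one shared EmrCluster object and one HadoopActivity per job. The field names (hadoopSchedulerType, runsOn, jarUri, mainClass, argument) are the ones I recall from the Data Pipeline object reference, so check them against the current documentation. The bucket, JAR paths, class names, and object ids are hypothetical, and a real definition would also need the default and schedule objects, which are omitted here.

```python
import json


def build_pipeline_definition(jobs, scheduler="PARALLEL_FAIR_SCHEDULING"):
    """Return a partial Data Pipeline definition with one HadoopActivity per job.

    Assumption: "PARALLEL_FAIR_SCHEDULING" and "PARALLEL_CAPACITY_SCHEDULING"
    are the accepted hadoopSchedulerType values for parallel execution.
    """
    # One cluster shared by every activity; the scheduler type is what
    # allows the submitted jobs to run concurrently.
    cluster = {
        "id": "SharedEmrCluster",  # hypothetical object id
        "type": "EmrCluster",
        "hadoopSchedulerType": scheduler,
    }
    # One HadoopActivity per job, all pointing at the same cluster.
    activities = [
        {
            "id": f"HadoopJob{i}",  # hypothetical object id
            "type": "HadoopActivity",
            "runsOn": {"ref": cluster["id"]},
            "jarUri": job["jar"],
            "mainClass": job["main_class"],
            "argument": job.get("args", []),
        }
        for i, job in enumerate(jobs, start=1)
    ]
    return {"objects": [cluster] + activities}


# Hypothetical jobs, for illustration only.
example_jobs = [
    {"jar": "s3://example-bucket/jobs/wordcount.jar",
     "main_class": "com.example.WordCount",
     "args": ["s3://example-bucket/in", "s3://example-bucket/out1"]},
    {"jar": "s3://example-bucket/jobs/sessionize.jar",
     "main_class": "com.example.Sessionize"},
]

definition = build_pipeline_definition(example_jobs)
print(json.dumps(definition, indent=2))
```

Because both activities reference the same cluster and do not depend on each other, the scheduler is free to run them at the same time.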