
I installed the following Spark benchmark: https://github.com/BBVA/spark-benchmarks. I run Spark on top of YARN with 8 workers, but I only get 2 running executors during the job (TestDFSIO). I also set executor-cores to 9, but still only 2 executors are running. Why would that happen?

I think the problem is coming from YARN, because I get an almost identical issue with TestDFSIO on Hadoop. There, at the beginning of the job only two nodes run, but then all the nodes execute the application in parallel.

Note that I am not using HDFS for storage!

How can you run TestDFSIO without HDFS? - tk421
I use another file system - user9332151

1 Answer


I solved this issue. I set the number of cores per executor to 5 (--executor-cores) and the total number of executors to 23 (--num-executors), which had previously been left at its default of 2.
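
For reference, here is a sketch of the kind of spark-submit invocation this corresponds to. The --num-executors and --executor-cores flags are the ones mentioned above; the JAR path, main class, memory setting, and benchmark arguments are placeholders, not taken from the benchmark's documentation:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 23 \
      --executor-cores 5 \
      --executor-memory 4g \        # illustrative value, tune to your nodes
      --class your.benchmark.MainClass \
      path/to/spark-benchmarks.jar [benchmark arguments]

When --num-executors is not given (and dynamic allocation is disabled), Spark on YARN falls back to spark.executor.instances, which defaults to 2; that matches the behavior described in the question.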