When using TensorFlow Java for inference, the amount of memory needed to make the job run on YARN is abnormally large. The job runs perfectly with Spark on my computer (2 cores, 16 GB of RAM) and takes 35 minutes to complete. But when I try to run it on YARN with 10 executors, each with 16 GB memory and 16 GB memoryOverhead, the executors are killed for using too much memory.
The prediction runs on a Hortonworks cluster with YARN 2.7.3 and Spark 2.2.1. Previously we used DL4J for inference and everything ran in under 3 minutes. Tensors are correctly closed after use and we use mapPartitions to do the prediction. Each task contains approximately 20,000 records (1 MB), so this gives an input tensor of 2,000,000 x 14 and an output tensor of 2,000,000 (5 MB).
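For reference, here is a minimal sketch of what such a mapPartitions-based prediction can look like with the TensorFlow Java 1.x API, assuming a SavedModel and float features. The Predictor class, the model path, the "features"/"predictions" node names, the float[14] row type and the 1-D output shape are illustrative assumptions, not our actual code.

// Hypothetical sketch of mapPartitions-based inference with TensorFlow Java 1.x.
// Model path, node names, row type and output shape are assumptions.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;

public class Predictor {

  static JavaRDD<Float> predict(JavaRDD<float[]> features) {
    return features.mapPartitions((Iterator<float[]> rows) -> {
      // Load the SavedModel once per partition; the directory must be reachable
      // from every executor (shipped with --files/--archives or on a shared FS).
      try (SavedModelBundle model = SavedModelBundle.load("/path/to/saved_model", "serve")) {
        // Materialize the partition into a single [n x 14] batch.
        List<float[]> batch = new ArrayList<>();
        rows.forEachRemaining(batch::add);
        float[][] input = batch.toArray(new float[0][]);

        List<Float> results = new ArrayList<>(batch.size());
        // try-with-resources closes both tensors, releasing their native memory
        // even if the session run throws.
        try (Tensor<?> in = Tensor.create(input);
             Tensor<?> out = model.session().runner()
                 .feed("features", in)
                 .fetch("predictions")
                 .run()
                 .get(0)) {
          float[] scores = new float[batch.size()];
          out.copyTo(scores); // assumes a 1-D float output of length n
          for (float s : scores) {
            results.add(s);
          }
        }
        return results.iterator();
      }
    });
  }
}

Closing tensors with try-with-resources matters here because TensorFlow Java allocates tensor data off-heap, and that native memory is what YARN's memoryOverhead has to cover.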
Options passed to Spark when running on YARN:
--master yarn --deploy-mode cluster --driver-memory 16G --num-executors 10 --executor-memory 16G --executor-cores 2 --conf spark.driver.memoryOverhead=16G --conf spark.yarn.executor.memoryOverhead=16G --conf spark.sql.shuffle.partitions=200 --conf spark.task.cpus=2
This configuration may work if we set spark.sql.shuffle.partitions=2000, but then the job takes 3 hours.
UPDATE:
The difference between local and cluster was in fact due to a missing filter: we were actually running the prediction on more data than we thought.