A simple Spark Streaming app with no heavy in-memory computation starts consuming 17 GB of memory as soon as its state changes to RUNNING.
Cluster setup:
- 1x master (2 vCPU, 13.0 GB memory)
- 2x workers (2 vCPU, 13.0 GB memory)
The YARN ResourceManager shows: Memory Total - 18 GB, VCores Total - 4
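For reproducibility, the cluster was created with something roughly like the following (the cluster name is a placeholder, and I'm assuming n1-highmem-2 machines, which match the 2 vCPU / 13 GB shape above):

gcloud dataproc clusters create my-cluster \
  --master-machine-type n1-highmem-2 \
  --num-workers 2 \
  --worker-machine-type n1-highmem-2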
The Spark Streaming app's source code can be found here; as you can see, it doesn't do much.
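In case the link goes stale, here's a minimal sketch of what the app does, modeled on Spark's bundled JavaCustomReceiver example (a receiver that reads lines from a socket; the host and port are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.receiver.Receiver;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class JavaCustomReceiver extends Receiver<String> {
    private final String host;
    private final int port;

    public JavaCustomReceiver(String host, int port) {
        super(StorageLevel.MEMORY_AND_DISK_2());
        this.host = host;
        this.port = port;
    }

    @Override
    public void onStart() {
        // Receive on a separate thread so onStart() returns immediately
        new Thread(this::receive).start();
    }

    @Override
    public void onStop() {
        // receive() exits on its own once isStopped() returns true
    }

    private void receive() {
        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            String line;
            while (!isStopped() && (line = in.readLine()) != null) {
                store(line); // hand each line to Spark
            }
            restart("Trying to connect again");
        } catch (Exception e) {
            restart("Error receiving data", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("JavaCustomReceiver");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));
        JavaDStream<String> lines = ssc.receiverStream(new JavaCustomReceiver("localhost", 9999));
        lines.print(); // no heavy computation, just print each micro-batch
        ssc.start();
        ssc.awaitTermination();
    }
}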
Spark submit command (run over SSH, not via the gcloud SDK):
spark-submit --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --driver-cores 1 \
  --executor-memory 1g \
  --driver-memory 512m \
  --executor-cores 1 \
  --class JavaCustomReceiver my_project.jar
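To double-check what YARN actually granted to the application, I used the standard YARN CLI on the master node:

# list running apps to find the application ID
yarn application -list -appStates RUNNING
# live view of per-application container and memory allocation
yarn top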
Why would such a simple app allocate that much memory?
I'm using the default GCP Dataproc configuration; is there any YARN setting that should be amended?