
A simple Spark Streaming app with no heavy in-memory computation starts consuming 17 GB of memory as soon as its state changes to RUNNING.

Cluster setup:

  • 1x master (2 vCPU, 13.0 GB memory)
  • 2x workers (2 vCPU, 13.0 GB memory)

The YARN ResourceManager shows: Memory Total 18 GB, VCores Total 4.

The Spark Streaming app's source code can be found here, and as you can see it doesn't do much:
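(The linked source isn't reproduced here. For reference, a minimal sketch of what such an app typically looks like, modeled on the JavaCustomReceiver example from the Spark Streaming documentation; the host/port socket source and all details are assumptions, and the actual linked code may differ:)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// Sketch of a custom receiver that reads text lines from a socket.
public class JavaCustomReceiver extends Receiver<String> {
  private final String host; // placeholder source host
  private final int port;    // placeholder source port

  public JavaCustomReceiver(String host, int port) {
    super(StorageLevel.MEMORY_AND_DISK_2());
    this.host = host;
    this.port = port;
  }

  @Override
  public void onStart() {
    // Start a background thread that connects and reads until stopped.
    new Thread(this::receive).start();
  }

  @Override
  public void onStop() {
    // Nothing to do: receive() checks isStopped() and exits on its own.
  }

  private void receive() {
    try (Socket socket = new Socket(host, port);
         BufferedReader reader = new BufferedReader(
             new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while (!isStopped() && (line = reader.readLine()) != null) {
        store(line); // hand each received line to Spark
      }
      restart("Trying to connect again");
    } catch (Throwable t) {
      restart("Error receiving data", t);
    }
  }
}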

Spark submit command (run over SSH, not via the gcloud SDK):

spark-submit --master yarn \
             --deploy-mode cluster \
             --num-executors 1 \
             --driver-cores 1 \
             --executor-memory 1g \
             --driver-memory 512m \
             --executor-cores 1 \
             --class JavaCustomReceiver my_project.jar

Why would such a simple app allocate that much memory?

I'm using the GCP Dataproc default configuration. Is there any YARN config that should be amended?


1 Answer


How many tasks does your application require? Note that Dataproc has dynamic allocation turned on by default, which will request more executors from YARN as necessary, regardless of your --num-executors setting.
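If you want to rule dynamic allocation out, you can disable it for a single job with the standard Spark property (a sketch extending your existing command; verify the property against your Dataproc image's Spark version):

spark-submit --master yarn \
             --deploy-mode cluster \
             --conf spark.dynamicAllocation.enabled=false \
             --num-executors 1 \
             --driver-cores 1 \
             --executor-memory 1g \
             --driver-memory 512m \
             --executor-cores 1 \
             --class JavaCustomReceiver my_project.jar

With dynamic allocation disabled, Spark should honor --num-executors 1 instead of requesting additional executors from YARN as tasks queue up.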