I have 16 receivers in a Spark Streaming 2.2.1 job. After a while, some of the receivers process fewer and fewer records, eventually handling only one record per second. The behaviour can be seen in the screenshot:
While I understand the root cause can be hard to find and may not be obvious, is there a way I could debug this problem further? Currently I have no idea where to start digging. Could it be related to back-pressure?
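If it helps, this is roughly how I was planning to instrument the job to see which receivers slow down and whether batches queue up (a minimal sketch, assuming a `StreamingContext` named `ssc`; the listener class name is my own):

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Diagnostic listener: logs per-batch and per-receiver record counts plus
// delays, to see whether specific input streams are being throttled.
class RateDiagnosticsListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    println(s"batch=${info.batchTime} totalRecords=${info.numRecords} " +
      s"schedulingDelay=${info.schedulingDelay.getOrElse(-1L)}ms " +
      s"processingDelay=${info.processingDelay.getOrElse(-1L)}ms")
    // Per-input-stream counts: one entry per receiver
    info.streamIdToInputInfo.foreach { case (id, input) =>
      println(s"  stream=$id records=${input.numRecords}")
    }
  }
}

// ssc is the StreamingContext of the job
ssc.addStreamingListener(new RateDiagnosticsListener)
```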
Spark Streaming properties:
spark.app.id application_1599135282140_1222
spark.cores.max 64
spark.driver.cores 4
spark.driver.extraJavaOptions -XX:+PrintFlagsFinal -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dump/ -Dlog4j.configuration=file:///tmp/4f892127ad794245aef295c97ccbc5c9/driver_log4j.properties
spark.driver.maxResultSize 3840m
spark.driver.memory 4g
spark.driver.port 36201
spark.dynamicAllocation.enabled false
spark.dynamicAllocation.maxExecutors 10000
spark.dynamicAllocation.minExecutors 1
spark.eventLog.enabled false
spark.executor.cores 4
spark.executor.extraJavaOptions -XX:+PrintFlagsFinal -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dump/
spark.executor.id driver
spark.executor.instances 16
spark.executor.memory 4g
spark.jars file:/tmp/4f892127ad794245aef295c97ccbc5c9/main-e41d1cc.jar
spark.master yarn
spark.rpc.message.maxSize 512
spark.scheduler.maxRegisteredResourcesWaitingTime 300s
spark.scheduler.minRegisteredResourcesRatio 1.0
spark.scheduler.mode FAIR
spark.shuffle.service.enabled true
spark.sql.cbo.enabled true
spark.streaming.backpressure.enabled true
spark.streaming.backpressure.initialRate 25
spark.streaming.backpressure.pid.minRate 1
spark.streaming.concurrentJobs 1
spark.streaming.receiver.maxRate 100
spark.submit.deployMode client
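One detail that may be relevant: with spark.streaming.backpressure.enabled true and spark.streaming.backpressure.pid.minRate 1, the PID rate estimator is allowed to throttle a receiver all the way down to 1 record/second, which matches what I'm seeing. To check whether that is what actually happens, I was thinking of raising the log level of the estimator (a sketch, assuming log4j 1.x as shipped with Spark 2.2, and assuming the estimator logs its computed rate at TRACE level):

```scala
import org.apache.log4j.{Level, Logger}

// Surface the rate computed by the backpressure PID estimator on each batch.
// The TRACE level is an assumption about where PIDRateEstimator emits its output.
Logger.getLogger("org.apache.spark.streaming.scheduler.rate.PIDRateEstimator")
  .setLevel(Level.TRACE)
```

If the logged rate sits at minRate while the receivers slow down, the throttling would come from back-pressure rather than from the source.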