I am trying to run a Spark Kafka job written in Java that produces around 10K records per batch to a Kafka topic. It is a Spark batch job that reads 100 HDFS part files (1 million records in total) sequentially in a loop and produces each part file of 10K records as one batch. I am using the org.apache.kafka.clients.producer.KafkaProducer API.
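For context, the batch loop is structured roughly like the sketch below (simplified: the broker list, topic name, and the readPartFile helper are placeholders, not the actual job code; the sketch constructs a producer per batch, which is where the stack trace points):

    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PartFileProducerJob {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder broker list
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            // 100 part files x 10K records each = 1 million records in total.
            for (int i = 0; i < 100; i++) {
                // Read one part file of ~10K records from HDFS.
                List<String> records = readPartFile("/data/input/part-" + i);

                // One producer per batch of 10K records.
                KafkaProducer<String, String> producer = new KafkaProducer<>(props);
                for (String value : records) {
                    producer.send(new ProducerRecord<>("my-topic", value)); // placeholder topic
                }
                producer.flush();
                producer.close();
            }
        }

        // Placeholder for the HDFS read of one part file.
        private static List<String> readPartFile(String path) {
            return java.util.Collections.emptyList(); // real job reads from HDFS here
        }
    }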
I am getting the exception below:
    org.apache.kafka.common.KafkaException: Failed to construct kafka producer
    ....
    Caused by: org.apache.kafka.common.KafkaException: java.io.IOException: Too many open files
    ....
    Caused by: java.io.IOException: Too many open files
Below are the configurations:
Cluster resource availability:
---------------------------------
The cluster has more than 500 nodes, 150 TB of total memory, and more than 30K cores.
Spark Application configuration:
------------------------------------
--driver-memory: 24GB
--executor-cores: 5
--num-executors: 24
--executor-memory: 24GB
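The job is submitted with flags along these lines (the master, class name, and jar are placeholders):

    spark-submit \
      --master yarn \
      --class com.example.PartFileProducerJob \
      --driver-memory 24G \
      --executor-cores 5 \
      --num-executors 24 \
      --executor-memory 24G \
      producer-job.jar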
Topic Configuration:
--------------------
Partitions: 16
Replication: 3
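For reference, a topic with this layout would be created along these lines (topic name and broker address are placeholders; older Kafka versions take --zookeeper instead of --bootstrap-server):

    kafka-topics.sh --create \
      --topic my-topic \
      --partitions 16 \
      --replication-factor 3 \
      --bootstrap-server broker1:9092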
Data size
----------
Each part file has 10K records
Total records: 1 million (100 part files x 10K records)
Each batch produces 10K records
Please suggest some solutions, as this is a critical issue.
Thanks in advance.