I use Apache Spark 2.1 and Apache Kafka 0.9.
I have a Spark Streaming application that runs with 20 executors and reads from a Kafka topic that has 20 partitions. The application performs only map and flatMap operations.
Here is what the Spark application does (a rough sketch follows the list):
- Create a direct stream from Kafka with a batch interval of 15 seconds
- Perform data validations
- Execute transformations using Drools, which are map-only; there are no reduce transformations
- Write to HBase using check-and-put
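
For reference, here is a minimal sketch of the pipeline as I understand it, assuming the spark-streaming-kafka-0-8 integration (the one usable with a 0.9 broker). The topic name, broker list, HBase table and column names, and the validation and Drools steps are placeholders, not the real application code:

```scala
import kafka.serializer.StringDecoder
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHBase {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hbase")
    // 15-second batch interval
    val ssc = new StreamingContext(conf, Seconds(15))

    // Direct stream: one Kafka partition maps to one RDD partition per batch
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events")) // topic name is a placeholder

    // Stubs standing in for the real validation and Drools logic
    val isValid = (record: String) => record.nonEmpty
    val applyDroolsRules = (record: String) => record

    // Map-only transformations; nothing here triggers a shuffle
    val transformed = stream
      .map(_._2)            // keep the message value
      .filter(isValid)      // data validation
      .map(applyDroolsRules)

    // Write each partition to HBase with check-and-put
    transformed.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("events_table")) // placeholder table
        try {
          records.foreach { record =>
            val rowKey = Bytes.toBytes(record.hashCode.toString) // placeholder row key
            val put = new Put(rowKey)
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(record))
            // Put only if the cell does not already exist (expected value = null)
            table.checkAndPut(rowKey, Bytes.toBytes("cf"), Bytes.toBytes("payload"), null, put)
          }
        } finally {
          table.close()
          connection.close()
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```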
If executors and partitions are mapped 1-1, will every executor perform the above steps and write to HBase independently, or will data be shuffled across executors, with operations coordinated between the driver and the executors?