
I use Apache Spark 2.1 and Apache Kafka 0.9.

I have a Spark Streaming application that runs with 20 executors and reads from a Kafka topic that has 20 partitions. The application performs only map and flatMap operations.

Here is what the Spark application does:

  1. Create a direct stream from Kafka with a batch interval of 15 seconds
  2. Perform data validations
  3. Execute map-only transformations using Drools (no reduce transformations)
  4. Write to HBase using check-and-put
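The steps above could be sketched roughly as follows. This is only an illustration: the topic name, broker address, and the helper functions (`parseAndValidate`, `applyDroolsRules`, `writeWithCheckAndPut`, `openHBaseTable`) are hypothetical placeholders, not part of the question, and it assumes the spark-streaming-kafka-0-8 direct-stream integration:

```scala
// Sketch only: helper names, topic, and broker address are illustrative.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object Pipeline {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hbase")
    val ssc  = new StreamingContext(conf, Seconds(15))   // step 1: 15-second batch interval

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    // Direct stream: 20 topic partitions -> 20 RDD partitions, one task per partition
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    stream
      .map { case (_, value) => parseAndValidate(value) }  // step 2: data validation
      .flatMap(applyDroolsRules)                           // step 3: map-only Drools transforms
      .foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          // step 4: each task opens its own HBase connection and writes its
          // partition with check-and-put; no data crosses executor boundaries
          val table = openHBaseTable("my_table")
          records.foreach(r => writeWithCheckAndPut(table, r))
          table.close()
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note the `foreachPartition` in step 4: opening the HBase connection once per partition (rather than once per record) keeps the connection on the executor where that partition's task runs.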

Given that executors and partitions are mapped 1:1, will every executor independently perform the steps above and write to HBase on its own, or will data be shuffled across executors, with operations coordinated between the driver and the executors?


1 Answer


Spark jobs submit tasks that can be executed only on executors. In other words, executors are the only place where tasks run; the driver's role is to coordinate the tasks and schedule them accordingly.

With that said, I'd say the following is true:

will every executor independently perform above steps and write to HBase independently
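Since the pipeline uses only map and flatMap (narrow transformations), no shuffle stage is ever created, so each partition's data stays on the executor that read it. One way to convince yourself of this is to inspect the RDD lineage: a sketch with core Spark (an existing `SparkContext` named `sc` is assumed):

```scala
// Narrow transformations only: map and flatMap preserve partitioning,
// so the lineage contains no ShuffledRDD and no stage boundary.
val rdd = sc.parallelize(1 to 100, 20)   // 20 partitions, like the Kafka topic
  .map(_ * 2)
  .flatMap(x => Seq(x, x + 1))

// toDebugString prints the lineage; a shuffle would show up as a ShuffledRDD.
println(rdd.toDebugString)
```

A `reduceByKey` or `groupByKey`, by contrast, would introduce a `ShuffledRDD` and move data between executors; the question's pipeline has none of these.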


By the way, the answer does not depend on the Spark version in use. It has always worked this way (and I don't see any reason why it would, or even should, change).