Recently I've been running performance tests on Spark Streaming, and a few things still puzzle me.
In Spark Streaming, receivers are scheduled to run in executors on worker nodes.
- How many receivers are there in a cluster? Can I control the number of receivers, e.g. through the way I set up the input streams (as in the sketch after this list)?
- If not all workers run a receiver, will the other worker nodes receive any data at all? If they do, how can task scheduling still honor data locality? Is the data copied over from the nodes that run the receivers?
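To make the first question concrete, here is a minimal sketch of what I mean by "controlling the number of receivers". It assumes a receiver-based source (socketTextStream); the host name and ports are placeholders, not my real setup. As far as I understand, each receiver-based input DStream starts exactly one receiver on one executor, so the receiver count should equal the number of input DStreams created:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MultiReceiverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("multi-receiver-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Each receiver-based input DStream runs one receiver on one executor,
    // so creating N input DStreams should give N receivers in the cluster.
    // "stream-host" and the ports are placeholders.
    val numReceivers = 3
    val streams = (1 to numReceivers).map { i =>
      ssc.socketTextStream("stream-host", 9000 + i)
    }

    // Union the per-receiver streams into a single DStream for processing.
    val unified = ssc.union(streams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Is this the intended way to scale out receiving, and does it interact with how tasks are later scheduled for locality?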