I am using Spark 1.5.2 and need to run a Spark Streaming job with Kafka as the streaming source. I need to read from multiple Kafka topics and process each topic differently.
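For context, here is roughly how I am wiring things up today. This is a minimal sketch using the Kafka 0.8 direct API that ships with Spark 1.5.2 (`spark-streaming-kafka`); the broker list, topic names, and batch interval are placeholders, not my real values:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MultiTopicJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("multi-topic-direct-stream")
    val ssc  = new StreamingContext(conf, Seconds(10))   // placeholder batch interval

    // Placeholder broker list for the direct (receiver-less) API
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

    // A single direct stream subscribed to several topics at once
    val topics = Set("topicA", "topicB")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Values only; with this overload the topic name is not attached to each record
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```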
- Is it a good idea to do this in the same job? If so, should I keep a single stream subscribed to all topics (as in the sketch above), or create a separate stream for each topic (sketched after this list)?
- I am using the Kafka direct stream. As far as I know, Spark launches long-running receivers for each partition. I have a relatively small cluster: 6 nodes with 4 cores each. If I have many topics, each with many partitions, would efficiency suffer because most executor cores are occupied by long-running receivers? Please correct me if my understanding is wrong here.
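For the first question, the per-topic alternative I am considering looks roughly like this: one direct stream per topic, each with its own processing pipeline. It reuses the `ssc` and `kafkaParams` from the sketch above; the topic names ("orders", "clicks") and the per-topic logic are made up just for illustration:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

object PerTopicStreams {
  def build(ssc: StreamingContext, kafkaParams: Map[String, String]): Unit = {
    // One direct stream per topic, so each topic can be processed differently
    val orders = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("orders"))
    val clicks = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("clicks"))

    // Placeholder per-topic processing; the real logic would differ per topic
    orders.map(_._2).foreachRDD(rdd => println(s"orders batch: ${rdd.count()} records"))
    clicks.map(_._2).foreachRDD(rdd => println(s"clicks batch: ${rdd.count()} records"))
  }
}
```

I am not sure whether several direct streams in one StreamingContext is better or worse for my cluster size than one stream over all topics, which is really what the two bullets above are asking.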