1 vote

I am using a Spark Streaming application (Spark 2.1) to consume data from Kafka (0.10.1) topics. I want to subscribe to a new topic without restarting the streaming context. Is there any way to achieve this?

I can see a JIRA ticket in the Apache Spark project for this (https://issues.apache.org/jira/browse/SPARK-10320). Even though it was resolved in version 2.0, I couldn't find any documentation or example of how to do it. If any of you are familiar with this, please share a documentation link or an example. Thanks in advance.


3 Answers

2 votes

The integration between Spark 2.0.x and Kafka 0.10.x supports a subscription pattern. From the documentation:

SubscribePattern allows you to use a regex to specify topics of interest. Note that unlike the 0.8 integration, using Subscribe or SubscribePattern should respond to adding partitions during a running stream.

You can use a regex pattern to subscribe to all the topics you wish:

class SubscribePattern[K, V](
    pattern: java.util.regex.Pattern,
    kafkaParams: java.util.Map[String, Object],
    offsets: java.util.Map[TopicPartition, java.util.Long]
) extends ConsumerStrategy[K, V]
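
For example, here is a minimal sketch (assuming an existing StreamingContext named ssc; the topic pattern topic.* and the group id are placeholders):

import java.util.regex.Pattern

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.ConsumerStrategies.SubscribePattern
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "host1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group"
)

// Every topic matching the regex is consumed; topics created later that
// match the pattern should be picked up while the stream is running.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  SubscribePattern[String, String](Pattern.compile("topic.*"), kafkaParams)
)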
0 votes

You can subscribe to multiple topics by passing a comma-separated list, e.g. topic1,topic2:

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1,topic2")
  .load()
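
The Kafka source also accepts a subscribePattern option, which matches the question's use case more closely. A minimal variation of the snippet above (the regex topic.* is a placeholder):

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribePattern", "topic.*")
  .load()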

For more information, see the Kafka integration guide.

-1 votes

I found this solution more suitable for my purpose. We can share a StreamingContext instance across different DStreams. For better management, we can create a separate DStream instance for each topic using the same streaming context, and store each DStream in a map keyed by its topic name, so that later you can stop or unsubscribe from that particular topic. Please see the code below for clarity.

https://gist.github.com/shemeemsp7/01d21588347b94204c71a14005be8afa
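
For reference, a minimal sketch of the bookkeeping described above (assuming an existing StreamingContext ssc and kafkaParams as in the first answer; the topic names are placeholders):

import scala.collection.mutable

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

// One DStream per topic, all sharing the same StreamingContext,
// kept in a map keyed by topic name.
val streamsByTopic = mutable.Map.empty[String, InputDStream[ConsumerRecord[String, String]]]

Seq("topic1", "topic2").foreach { topic =>
  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    PreferConsistent,
    Subscribe[String, String](Seq(topic), kafkaParams)
  )
  stream.foreachRDD { rdd =>
    // per-topic processing goes here
  }
  streamsByTopic(topic) = stream
}

ssc.start()

// Keeping the handle is what allows stopping an individual topic's stream
// later, e.g. streamsByTopic("topic1").stop(), which is presumably how the
// gist unsubscribes from a single topic.

Note that Spark Streaming does not allow adding new DStreams after the context has started, so an approach along these lines helps with unsubscribing from topics, not with subscribing to topics created after startup.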