
I am trying out Spark SQL Structured Streaming with Kafka. I am looking at the mandatory subscription option for the Kafka source, for example subscribePattern [Java regex string]. Apparently only three values are possible: "assign", "subscribe" or "subscribePattern".

When I googled this option, the most useful piece of information that came up was the following: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-streaming/spark-streaming-kafka-ConsumerStrategy.html

Can anyone explain in layman's terms the most distinct differences among the three options, and what different behaviour each will produce in Spark SQL?


1 Answer


I am not familiar with Spark; however, for the Kafka consumer, there are three options:

  1. assign: assign topic-partitions manually (i.e., you can do any partition assignment you want). This disables consumer group management; thus, if you have multiple consumers and want to balance the load, you need to take care yourself not to assign the same partition twice.
  2. subscribe: specify a set of topics you want to read from. Consumer group management will do the actual assignment of partitions (i.e., if you have multiple consumers in a group, partitions will be distributed over all consumers within the group).
  3. pattern: similar to (2); however, you specify a regex and subscribe to all topics that match the regex.
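
To make the three options above concrete on the Spark side, here is a minimal sketch in Python of the option values the Kafka source expects: "assign" takes a JSON string mapping topics to partition lists, "subscribe" takes a comma-separated topic list, and "subscribePattern" takes a regex string. The helper names (`kafka_source_options`, `topics_matching`), the broker address and the topic names are all hypothetical, made up for illustration. The pattern demo uses Python's `re` as a stand-in for a Java regex and assumes the whole topic name must match, so treat it as an approximation rather than the exact Kafka behaviour.

```python
import json
import re


def kafka_source_options(bootstrap_servers, *, assign=None, subscribe=None,
                         subscribe_pattern=None):
    """Build an option map for a Kafka source (hypothetical helper).

    Exactly one of the three subscription strategies must be given.
    """
    candidates = {
        # assign: explicit topic -> partitions mapping, passed as a JSON string
        "assign": json.dumps(assign) if assign is not None else None,
        # subscribe: fixed list of topics, passed as a comma-separated string
        "subscribe": ",".join(subscribe) if subscribe is not None else None,
        # subscribePattern: regex; every matching topic is subscribed to
        "subscribePattern": subscribe_pattern,
    }
    chosen = {k: v for k, v in candidates.items() if v is not None}
    if len(chosen) != 1:
        raise ValueError(
            "set exactly one of assign, subscribe, subscribe_pattern")
    return {"kafka.bootstrap.servers": bootstrap_servers, **chosen}


def topics_matching(pattern, topics):
    """Approximate which topics a pattern subscription would select.

    Assumption: the whole topic name has to match the regex.
    """
    return [t for t in topics if re.fullmatch(pattern, t)]


# Hypothetical broker address and topic names.
by_partition = kafka_source_options("localhost:9092",
                                    assign={"orders": [0, 1]})
by_topic = kafka_source_options("localhost:9092",
                                subscribe=["orders", "payments"])
by_pattern = kafka_source_options("localhost:9092",
                                  subscribe_pattern="orders-.*")

selected = topics_matching("orders-.*",
                           ["orders-eu", "orders-us", "payments"])

# With PySpark and the Kafka connector on the classpath, one of these maps
# would be passed to the reader, for example:
# df = spark.readStream.format("kafka").options(**by_pattern).load()
```

In this sketch only `by_partition` pins specific partitions; the other two name topics (directly or by regex) and leave the choice of partitions to the source.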