5
votes

I might need to work with Kafka and I am absolutely new to it. I understand that there are Kafka producers which publish the logs (called events, messages, or records in Kafka) to Kafka topics.

I will need to work on reading from Kafka topics via a consumer. Do I need to set up a consumer with the Kafka consumer API first and then stream using a Spark Streaming context (PySpark), or can I use the KafkaUtils module directly to read from Kafka topics?

If I do need to set up a Kafka consumer application, how do I do that? Could you please share links to the right docs?

Thanks in advance!


2 Answers

5
votes

Spark provides built-in Kafka streaming, so you don't need to create a custom consumer. There are two approaches to connect with Kafka: 1. the receiver-based approach, and 2. the direct approach. For more detail, go through this link: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
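
For illustration, here is a minimal PySpark sketch of the direct approach using the DStream-based KafkaUtils API from that link. The broker address localhost:9092 and the topic name my-topic are placeholder assumptions; substitute your own.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    # Streaming context with 10-second batches
    sc = SparkContext(appName="KafkaDirectExample")
    ssc = StreamingContext(sc, 10)

    # Direct approach: Spark queries the Kafka brokers itself and tracks
    # offsets internally; no receiver or separate consumer app is needed.
    # "localhost:9092" and "my-topic" are placeholders.
    directStream = KafkaUtils.createDirectStream(
        ssc,
        topics=["my-topic"],
        kafkaParams={"metadata.broker.list": "localhost:9092"})

    # Each record arrives as a (key, value) pair; print just the values
    directStream.map(lambda record: record[1]).pprint()

    ssc.start()
    ssc.awaitTermination()

(You would typically need to submit this with the spark-streaming-kafka integration package matching your Spark version, e.g. via spark-submit --packages, as described in the linked docs.)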

1
votes

There's no need to set up a Kafka consumer application; Spark itself creates the consumer, with two approaches. One is the receiver-based approach, which uses the KafkaUtils.createStream method, and the other is the direct approach, which uses the KafkaUtils.createDirectStream method. With the direct approach, a failure in Spark Streaming causes no data loss, since Spark tracks the offsets itself and restarts from where it left off; the receiver-based approach needs the write-ahead log enabled to get the same guarantee.
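
As an illustration of the receiver-based approach, here is a minimal PySpark sketch using KafkaUtils.createStream. The ZooKeeper address localhost:2181, the group id my-consumer-group, and the topic my-topic are placeholder assumptions.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaReceiverExample")
    ssc = StreamingContext(sc, 10)

    # Receiver-based approach: a receiver running in an executor consumes
    # from Kafka via ZooKeeper using the high-level consumer API.
    # "localhost:2181", "my-consumer-group", and "my-topic" are placeholders;
    # the dict value is the number of consumer threads for the topic.
    receiverStream = KafkaUtils.createStream(
        ssc,
        zkQuorum="localhost:2181",
        groupId="my-consumer-group",
        topics={"my-topic": 1})

    # Each record is a (key, value) pair; print just the values
    receiverStream.map(lambda record: record[1]).pprint()

    ssc.start()
    ssc.awaitTermination()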

For more details, see this link: http://spark.apache.org/docs/latest/streaming-kafka-integration.html