4
votes

I want to read only the latest messages in Spark Streaming from Kafka, but it also fetches past data

How do I set auto.offset.reset in KafkaUtils for Spark?

JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, args[0], args[1], topicMap);

How do I set the configuration so that only current messages are fetched? Please give an example.

There is also another thread on this, but it was not sufficient. Please help me out. Thanks in advance.


1 Answer

7
votes

You need to use this method from the KafkaUtils object, which accepts a kafkaParams map:

    def createStream[K, V, U <: Decoder[_], T <: Decoder[_]](
        jssc: JavaStreamingContext,
        keyTypeClass: Class[K],
        valueTypeClass: Class[V],
        keyDecoderClass: Class[U],
        valueDecoderClass: Class[T],
        kafkaParams: JMap[String, String],
        topics: JMap[String, JInt],
        storageLevel: StorageLevel
    )

Depending on the Spark version, you may not be able to call this method from Java, because there is a bug.

If you are using Spark 1.1.0, you need to add this property to the kafkaParams parameter:

"auto.offset.reset", "largest"
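As a rough sketch of what that map could look like in Java: the helper name latestOnlyParams, the class name, and the sample ZooKeeper address and group id are made up for illustration. The keys "zookeeper.connect" and "group.id" are the usual settings for the ZooKeeper-based Kafka consumer that this createStream overload wraps, and they correspond to args[0] and args[1] in the question. The createStream call itself is only shown as a comment, since it needs the Spark and Kafka jars and, as noted above, may not work from Java on every Spark version.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaParamsSketch {

    // Hypothetical helper: builds the consumer properties passed as kafkaParams.
    static Map<String, String> latestOnlyParams(String zkQuorum, String groupId) {
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("zookeeper.connect", zkQuorum);   // args[0] in the question
        kafkaParams.put("group.id", groupId);             // args[1] in the question
        kafkaParams.put("auto.offset.reset", "largest");  // start from the newest offset
        return kafkaParams;
    }

    public static void main(String[] args) {
        // Sample values only; replace with your own quorum and group id.
        Map<String, String> kafkaParams =
                latestOnlyParams("localhost:2181", "my-consumer-group");
        System.out.println(kafkaParams);

        // Assumed usage with the overload quoted above (needs Spark on the classpath):
        // KafkaUtils.createStream(jssc, String.class, String.class,
        //         StringDecoder.class, StringDecoder.class,
        //         kafkaParams, topicMap, StorageLevel.MEMORY_AND_DISK_SER_2());
    }
}
```

Here topicMap is the same map already used in the question, and StringDecoder is assumed to be kafka.serializer.StringDecoder.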

Another workaround is to generate a random groupId prefix, so that every run starts as a new consumer group with no stored offsets, but this is a crude hack.
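For completeness, the random-prefix idea might look roughly like this. The class and method names are invented for the example, and UUID is just one convenient source of randomness. Keep in mind that every run then leaves behind a consumer group that is never reused, which is part of why this is not a clean solution.

```java
import java.util.UUID;

public class RandomGroupIdSketch {

    // Hypothetical helper: prepends a random prefix so each run uses a fresh group id.
    static String randomGroupId(String baseGroupId) {
        return UUID.randomUUID() + "-" + baseGroupId;
    }

    public static void main(String[] args) {
        // Pass the result where the group id (args[1] in the question) is expected.
        String groupId = randomGroupId("my-consumer-group");
        System.out.println(groupId);
    }
}
```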