2 votes

I have a Spring Cloud Stream Kafka Streams application that reads from a topic (event) and performs some simple processing:

@Configuration
class EventKStreamConfiguration {

    private val logger = LoggerFactory.getLogger(javaClass)

    @StreamListener
    fun process(@Input("event") eventStream: KStream<String, EventReceived>) {

        eventStream.foreach { key, value ->
            logger.info("--------> Processing Event {}", value)
            // Save in DB
        }
    }
}

This application uses a Kafka environment from Confluent Cloud, with an event topic that has 6 partitions. The full configuration is:

spring:
  application:
    name: events-processor
  cloud:
    stream:
      schema-registry-client:
        endpoint: ${schema-registry-url:http://localhost:8081}
      kafka:
        streams:
          binder:
            brokers: ${kafka-brokers:localhost}
            configuration:
              application:
                id: ${spring.application.name}
              default:
                key:
                  serde: org.apache.kafka.common.serialization.Serdes$StringSerde
              schema:
                registry:
                  url: ${spring.cloud.stream.schema-registry-client.endpoint}
              value:
                subject:
                  name:
                    strategy: io.confluent.kafka.serializers.subject.RecordNameStrategy
              processing:
                guarantee: exactly_once
          bindings:
            event:
              consumer:
                valueSerde: io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde
      bindings:
        event:
          destination: event

  data:
    mongodb:
      uri: ${mongodb-uri:mongodb://localhost/test}

server:
  port: 8085

logging:
  level:
    org.springframework.kafka.config: debug

---

spring:
  profiles: confluent-cloud
  cloud:
    stream:
      kafka:
        streams:
          binder:
            autoCreateTopics: false
            configuration:
              retry:
                backoff:
                  ms: 500
              security:
                protocol: SASL_SSL
              sasl:
                mechanism: PLAIN
                jaas:
                  config: xxx
              basic:
                auth:
                  credentials:
                    source: USER_INFO
              schema:
                registry:
                  basic:
                    auth:
                      user:
                        info: yyy

Messages are being correctly processed by the KStream. If I restart the application they are not reprocessed. Note: I don’t want them to be reprocessed, so this behaviour is ok.

However, the startup logs show some strange things:

  1. First it shows the creation of a restore consumer client, with auto.offset.reset = none:
2019-07-19 10:20:17.120  INFO 82473 --- [           main] o.a.k.s.p.internals.StreamThread         : stream-thread [events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1] Creating restore consumer client
2019-07-19 10:20:17.123  INFO 82473 --- [           main] o.a.k.clients.consumer.ConsumerConfig    : ConsumerConfig values: 
    auto.commit.interval.ms = 5000
    auto.offset.reset = none
  2. Then it creates a consumer client, with auto.offset.reset = earliest:
2019-07-19 10:20:17.235  INFO 82473 --- [           main] o.a.k.s.p.internals.StreamThread         : stream-thread [events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1] Creating consumer client
2019-07-19 10:20:17.241  INFO 82473 --- [           main] o.a.k.clients.consumer.ConsumerConfig    : ConsumerConfig values: 
    auto.commit.interval.ms = 5000
    auto.offset.reset = earliest
  3. The final traces of the startup log show an offset reset to 0. This happens on every restart of the application:
2019-07-19 10:20:31.577  INFO 82473 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2019-07-19 10:20:31.578  INFO 82473 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams    : stream-client [events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f] State transition from REBALANCING to RUNNING
2019-07-19 10:20:31.669  INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher       : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-3 to offset 0.
2019-07-19 10:20:31.669  INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher       : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-0 to offset 0.
2019-07-19 10:20:31.669  INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher       : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-1 to offset 0.
2019-07-19 10:20:31.669  INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher       : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-5 to offset 0.
2019-07-19 10:20:31.670  INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher       : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-4 to offset 0.
  1. What's the reason why there are two consumers configured?

  2. Why does the second one have auto.offset.reset = earliest when I haven't configured it explicitly and the Kafka default is latest?

  3. I want the default (auto.offset.reset = latest) behaviour and it seems to be working fine. However, doesn't it contradict what I see in the logs?

UPDATE:

I would rephrase the third question like this: why do the logs show that the partitions are being reset to 0 on every restart, and why, despite that, are no messages redelivered to the KStream?

UPDATE 2:

I've simplified the scenario, this time with a native Kafka Streams application. The behaviour is exactly the same as with Spring Cloud Stream. However, after inspecting the consumer group and the partitions, I've found that it actually makes sense.

KStream:

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.KafkaStreams
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.StreamsConfig
import java.util.Properties
import java.util.concurrent.CountDownLatch
import kotlin.system.exitProcess

fun main() {

    val props = Properties()
    props[StreamsConfig.APPLICATION_ID_CONFIG] = "streams-wordcount"
    props[StreamsConfig.BOOTSTRAP_SERVERS_CONFIG] = "localhost:9092"
    props[StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG] = 0
    props[StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG] = Serdes.String().javaClass.name
    props[StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG] = Serdes.String().javaClass.name

    val builder = StreamsBuilder()

    val source = builder.stream<String, String>("streams-plaintext-input")

    source.foreach { key, value -> println("$key $value") }

    val streams = KafkaStreams(builder.build(), props)
    val latch = CountDownLatch(1)

    // attach shutdown handler to catch control-c
    Runtime.getRuntime().addShutdownHook(object : Thread("streams-wordcount-shutdown-hook") {
        override fun run() {
            streams.close()
            latch.countDown()
        }
    })

    try {
        streams.start()
        latch.await()
    } catch (e: Throwable) {
        exitProcess(1)
    }

    exitProcess(0)
}

This is what I've seen:

1) With an empty topic, the startup shows a resetting of all partitions to offset 0:

07:55:03.885 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-2 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-3 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-0 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-1 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-4 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-5 to offset 0

2) I put one message in the topic and inspect the consumer group, seeing that the record is in partition 4:

TOPIC                   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                                                                         HOST            CLIENT-ID
streams-plaintext-input 0          -               0               -               streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1      streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 5          -               0               -               streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1      streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 1          -               0               -               streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1      streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 2          -               0               -               streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1      streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 3          -               0               -               streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1      streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 4          1               1               0               streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1      streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer

3) I restart the application. Now the resetting only affects the empty partitions (0, 1, 2, 3, 5):

07:57:39.477 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-2 to offset 0.
07:57:39.478 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-3 to offset 0.
07:57:39.478 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-0 to offset 0.
07:57:39.479 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-1 to offset 0.
07:57:39.479 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-5 to offset 0.

4) I insert another message and inspect the consumer group state; the same thing happens: the record is in partition 2 and, when restarting the application, it only resets the empty partitions (0, 1, 3, 5):

TOPIC                   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                                                                         HOST            CLIENT-ID
streams-plaintext-input 0          -               0               -               streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1      streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 5          -               0               -               streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1      streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 1          -               0               -               streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1      streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 2          1               1               0               streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1      streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 3          -               0               -               streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1      streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 4          1               1               0               streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1      streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
08:00:42.313 [streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-3 to offset 0.
08:00:42.314 [streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-0 to offset 0.
08:00:42.314 [streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-1 to offset 0.
08:00:42.314 [streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-5 to offset 0.
Spring Cloud by default wipes out local storage on restart, and this triggers resetting the restore consumer offset to zero in order to recreate the state -- there is a similar question on SO that should also explain how to disable the cleanup. – Matthias J. Sax
I've added UPDATE 2 to the question. It also happens with native Kafka Streams applications. Apparently the KStream resets to offset 0 the partitions that haven't received any message yet... – codependent
That makes sense, because the reset triggers if no offset is committed for a partition -- and Streams would not commit any offset if it did not process any data for the corresponding partition. – Matthias J. Sax

1 Answer

1 vote
  1. What's the reason why there are two consumers configured?

The restore consumer client is a dedicated consumer used for fault tolerance and state management. It is responsible for restoring state from the changelog topics, and it is displayed separately from the application consumer client. You can find more information here: https://docs.confluent.io/current/streams/monitoring.html#kafka-restore-consumer-client-id
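For illustration only (not part of the question's topology, which is stateless and therefore has nothing to restore), here is a minimal sketch of a stateful operation that would actually give the restore consumer work to do. The store name "word-counts" is made up; the state store behind count() is backed by a changelog topic (named roughly <application.id>-word-counts-changelog) that the restore consumer replays on startup to rebuild local state:

import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.kstream.Materialized
import org.apache.kafka.streams.state.KeyValueStore

fun statefulTopology(): StreamsBuilder {
    val builder = StreamsBuilder()

    // count() is materialized into a local state store; the store's changelog
    // topic is what the restore consumer reads after a restart or failover.
    builder.stream<String, String>("streams-plaintext-input")
        .groupByKey()
        .count(Materialized.`as`<String, Long, KeyValueStore<Bytes, ByteArray>>("word-counts"))

    return builder
}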

  2. Why does the second one have auto.offset.reset = earliest when I haven't configured it explicitly and the Kafka default is latest?

You are right: the default value of auto.offset.reset is latest for a Kafka consumer. But in Spring Cloud Stream, the default value for the consumer startOffset is earliest, which is why the second consumer shows earliest. It also depends on the spring.cloud.stream.bindings.<channelName>.group binding: if the group is set explicitly, startOffset is set to earliest; otherwise it is set to latest for an anonymous consumer.

Reference: Spring Cloud Stream Kafka Consumer Properties
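For illustration, a minimal sketch of how a group and an explicit startOffset could be set on the binding. The group name is made up, and the property paths follow the Kafka binder reference above, so they are worth double-checking against the docs for the binder flavour and version in use:

spring:
  cloud:
    stream:
      bindings:
        event:
          destination: event
          group: events-processor-group      # hypothetical; makes the consumer non-anonymous
      kafka:
        bindings:
          event:
            consumer:
              startOffset: latest            # explicit start offset used when no committed offset exists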

  3. I want the default (auto.offset.reset = latest) behaviour and it seems to be working fine. However, doesn't it contradict what I see in the logs?

In the case of an anonymous consumer group, the default value for startOffset is latest.