We are using Apache Kafka for communication between microservices. We have an Avro schema that needs to be expanded by adding a new field. To keep the schemas compatible, we have to give the new field a default value. However, from the application-logic perspective, a default value makes no sense. New consumers need to consume only the schema where the new field is required from the start.
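To illustrate the constraint (the field name and type here are hypothetical): under the registry's default BACKWARD compatibility mode, only the first variant of the new field would be accepted, even though the default is meaningless for us, while the second variant, which is what we actually want, would be rejected:

    {"name": "newField", "type": "string", "default": ""}   <- accepted, but meaningless default
    {"name": "newField", "type": "string"}                  <- rejected as incompatible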
We would like to deprecate the old schema but still leave a short period for old consumers to adjust, while having new consumers ignore messages without the required field. I have read that Kafka generally allows two completely separate schemas to be sent to the same topic, with each consumer choosing which schema to consume, but I guess this is not possible if you are using Avro? Is there a way to have these two incompatible schemas on the same topic for a time without adding any default values? The old schema would be removed eventually, and no producers would produce it once the deprecation period passes.
A workaround would be to use separate topics, but I would like to avoid that if possible.
EDIT: Possibly I was not clear enough with the question.
Is it possible to have two completely unconnected schemas, with no matching fields, on the same topic?
For example, let's say user & purchase tracking. Service-specific application logic would handle all the implications of such a system. Some services do not care about users and only record purchases for financial reporting; others just track users, perhaps for user management or notifications; a third group cares about both types and needs to know for sure that a user existed and was active at the time they wanted to make a purchase. Here I would need two loosely related schemas (related only in the sense that a purchase needs a foreign key to the user, but that is an application-logic relation, not a schema relation). Obviously these are two separate objects, and no defaults would make sense just to force the schemas to be compatible. Each microservice would decide which schemas to consume from the one topic, but the order of messages would still be crucial.
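Roughly, the two record definitions would look like this (names and fields are hypothetical); the only link between them is that Purchase.userId carries a User.id value, which is purely an application-level relation:

    {
      "type": "record",
      "name": "User",
      "namespace": "com.example",
      "fields": [
        {"name": "id", "type": "string"},
        {"name": "active", "type": "boolean"}
      ]
    }

    {
      "type": "record",
      "name": "Purchase",
      "namespace": "com.example",
      "fields": [
        {"name": "id", "type": "string"},
        {"name": "userId", "type": "string"},
        {"name": "amount", "type": "double"}
      ]
    }

Neither schema can evolve into the other without defaults, which is exactly why I would want two independent subjects on the same topic.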
Further EDIT:
After further research I believe io.confluent.kafka.serializers.subject.strategy.TopicRecordNameStrategy should do the trick, as OneCricketeer suggested; I just need to figure out how to configure it. As far as I understand, this strategy registers each record type under a subject named <topic>-<fully qualified record name>, so multiple record types can coexist on the same topic, each with its own compatibility history.
This is the relevant section of my application.yml so far:
spring:
  ...
  cloud:
    stream:
      schemaRegistryClient:
        endpoint: http://localhost:8081
      default:
        producer:
          useNativeEncoding: true
        consumer:
          useNativeEncoding: true
      kafka:
        binder:
          brokers: http://localhost:9092
          producer-properties:
            key.serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
            value.serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
            schema.registry.url: http://localhost:8081
          consumer-properties:
            key.deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
            value.deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
            schema.registry.url: http://localhost:8081
            specific.avro.reader: true
      bindings:
        foo:
          destination: foo
          content-type: application/*+avro
        ...
So where do I set the subject strategy?
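If the subject strategy is just another Confluent serializer property (that is my assumption; the keys value.subject.name.strategy and key.subject.name.strategy come from the Confluent serializer configuration, and the exact package of the concrete class may vary between Confluent versions, e.g. io.confluent.kafka.serializers.subject.TopicRecordNameStrategy in some releases), then I would expect something like this to work:

    spring:
      cloud:
        stream:
          kafka:
            binder:
              producer-properties:
                # assumption: the strategy is passed through like schema.registry.url above
                value.subject.name.strategy: io.confluent.kafka.serializers.subject.TopicRecordNameStrategy
              consumer-properties:
                value.subject.name.strategy: io.confluent.kafka.serializers.subject.TopicRecordNameStrategy

If the keys were Avro records too, key.subject.name.strategy would presumably be set the same way.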