0
votes

I am working with Kafka and, as a beginner, the following question popped into my mind.

Every time we design a schema for Avro, we generate a Java class from it using the Avro jars.

We then populate an object of that class with data and send it from the producer.

To consume the message, we generate the class again in the consumer. The classes generated in both places, producer and consumer, contain a field
"public static final org.apache.avro.Schema SCHEMA$" which actually stores the schema as a String.

If that is the case, why should Kafka use a schema registry at all? The schema is already available as part of the Avro objects.

I hope my question is clear. If someone can answer it, that would be a great help.


2 Answers

1
votes

Schema Registry is the repository that stores the schemas of all the records sent to Kafka. When a Kafka producer sends a record using KafkaAvroSerializer, the record's schema is extracted and stored in Schema Registry, and the actual record in Kafka contains only the schema ID alongside the serialized data.

When deserializing a record, the consumer reads the schema ID and uses it to fetch the actual schema from Schema Registry. The record is then deserialized using the fetched schema.

So, in short, Kafka does not keep a copy of the schema in every record; instead, the schema is stored in Schema Registry and referenced via the schema ID.

This saves space when storing records and also helps detect schema compatibility issues between clients.
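To make the "record only carries a schema ID" idea concrete, here is a minimal sketch of the framing, assuming the documented Confluent wire format (one zero "magic byte", then a 4-byte big-endian schema ID, then the Avro-encoded payload). The class and method names are hypothetical and this is not the actual KafkaAvroSerializer code; the payload bytes are placeholders rather than real Avro output.

```java
import java.nio.ByteBuffer;

class WireFormatSketch {
    // Assumption: the first byte of every framed message is a zero "magic byte".
    static final byte MAGIC_BYTE = 0x0;

    // Prefix the Avro-encoded payload with the magic byte and the schema ID.
    // ByteBuffer writes ints in big-endian order by default.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buf.put(MAGIC_BYTE);
        buf.putInt(schemaId);
        buf.put(avroPayload);
        return buf.array();
    }

    // Read the schema ID back; a consumer would use it to look up the schema.
    static int schemaIdOf(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buf.getInt();
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};          // placeholder for Avro binary data
        byte[] framed = frame(42, payload);  // 42 is an arbitrary example ID
        System.out.println("schema id = " + schemaIdOf(framed)
                + ", total bytes = " + framed.length);
    }
}
```

The overhead per record is five bytes regardless of how large the schema is, which is where the space saving comes from compared with shipping the full schema JSON in every message.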

https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html

0
votes

Schema Registry is a central repository for all the schemas and helps enforce schema compatibility rules when new schemas are registered; without it, schema evolution would be difficult. Based on the configured compatibility (backward, forward, full, etc.), Schema Registry will reject a new schema that doesn't conform to the configured compatibility.
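As a rough illustration of what a backward-compatibility check means, here is a toy sketch. It assumes a heavily simplified model in which a schema is just a map from field name to "has a default value", and it only covers the rule that a field added in the new schema needs a default so that data written with the old schema can still be read. The class and method names are hypothetical; the real check also considers types, unions, aliases and more.

```java
import java.util.Map;

class BackwardCompatSketch {
    // Toy model: field name -> whether the field declares a default value.
    // Backward compatible here means: a reader using newSchema can read
    // data written with oldSchema.
    static boolean isBackwardCompatible(Map<String, Boolean> oldSchema,
                                        Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean addedField = !oldSchema.containsKey(field.getKey());
            // An added field without a default cannot be filled in
            // when reading old data, so the change is rejected.
            if (addedField && !field.getValue()) {
                return false;
            }
        }
        return true; // removed fields are simply ignored by the new reader
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = Map.of("id", false, "name", false);
        Map<String, Boolean> v2WithDefault = Map.of("id", false, "name", false, "email", true);
        Map<String, Boolean> v2NoDefault = Map.of("id", false, "name", false, "email", false);

        System.out.println(isBackwardCompatible(v1, v2WithDefault)); // true
        System.out.println(isBackwardCompatible(v1, v2NoDefault));   // false
    }
}
```

With only the `SCHEMA$` field compiled into each client, nothing performs this kind of check before a producer starts writing data that existing consumers cannot read.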