
I have been wondering about the need for an Avro Schema Registry when consuming messages from a Kafka topic using a statically typed language like Java. I'm consuming messages from a Kafka topic with a consumer set up like this:

    Properties props = new Properties();
    props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, String.join(",", kafkaProperties.getServers()));
    props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
    props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class.getName());

    props.setProperty(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG, kafkaProperties.getSchemaRegistryUrl());
    KafkaConsumer<byte[], FooClass> kafkaConsumer = new KafkaConsumer<>(props);

In my project I have .avsc files that define the schema for the FooClass class. I have also configured the avro-maven-plugin to generate FooClass for me at build time.

Why do I still need to specify a Schema Registry URL? Isn't my consumer able to deserialize the values of my Kafka messages using the .avsc file in my project?


3 Answers

1 vote

You're using Confluent libraries (io.confluent.kafka.serializers.KafkaAvroDeserializer) which define their own Confluent Avro format and mandate the use of the Confluent Schema Registry.

Technically, you don't need a registry for Apache Avro.

Avro needs the writer's schema to decode a message, and while that schema is embedded in Avro files (making them self-describing), it is not included in the streaming format or in Confluent Avro.
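
As a rough illustration (not code from the question; recordValueBytes stands in for the raw bytes of a record value), this is all a Confluent-Avro-encoded message carries:

    import java.nio.ByteBuffer;

    // Confluent wire format: one magic byte (0), a 4-byte schema id, then the
    // Avro binary payload. The writer schema itself is not in the message.
    ByteBuffer buf = ByteBuffer.wrap(recordValueBytes);
    byte magic = buf.get();        // always 0 for the Confluent format
    int schemaId = buf.getInt();   // id used to fetch the writer schema from the registry
    // remaining bytes: the Avro payload, decodable only once the writer schema is known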

So, the client needs some way to look up the schema. For the Confluent Avro format this is solved by the Confluent Schema Registry; for the Apache Avro format it can be solved by your own org.apache.avro.message.SchemaStore. See this example, where I use a SchemaStore.Cache pre-filled with known schemata.
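
A minimal sketch of that approach (assuming the question's generated FooClass, a ByteArrayDeserializer for the record value, and a writer schema already known at build time, here called knownWriterSchema) could look like this:

    import org.apache.avro.message.BinaryMessageDecoder;
    import org.apache.avro.message.SchemaStore;
    import org.apache.avro.specific.SpecificData;

    // Pre-fill a local schema store with every writer schema known at build time.
    SchemaStore.Cache schemaStore = new SchemaStore.Cache();
    schemaStore.addSchema(knownWriterSchema); // e.g. parsed from a bundled .avsc file

    // Decoder for Avro's single-object encoding; the generated class supplies the reader schema.
    BinaryMessageDecoder<FooClass> decoder =
            new BinaryMessageDecoder<>(SpecificData.get(), FooClass.getClassSchema(), schemaStore);

    FooClass value = decoder.decode(record.value()); // throws IOException; record value is byte[]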

Note that the example uses the Apache Avro format, which is incompatible with Confluent Avro.

The Confluent Avro deserializer needs a Confluent Schema Registry and has no API for "run with known schemata".

0 votes

The purpose of Schema Registry is to make schemas available to all producers and consumers, without them needing to be tied together by the distribution and management of something like an .avsc file. A file like this is fine within a standalone project, but Kafka is frequently used by multiple applications, perhaps across teams or even organisational units - so being able to share a schema in a more loosely coupled way is important.

Ref: https://docs.confluent.io/current/schema-registry/index.html

0 votes

After learning more about the Avro format and the role of a schema registry, I realized why a schema registry is needed even for a statically typed language like Java. The short answer is "schema evolution".

So let's say you build an app today that consumes messages of type A, written with schema SA. At build time you may have an "a.avsc" file that you use to generate the classes you deserialize messages into. At this point there seems to be no need to contact a schema registry to get SA; it would make sense to point your deserializer at the "a.avsc" file you built your application with. But the Avro deserializer doesn't let you do that (it needs a registry), which makes you wonder: why?

A week later, the producer of type-A messages decides to add a new field to A. When this happens, an Avro deserializer that only had the schema you built your application with (if that were possible) would not be able to deserialize these new messages when it encounters one. At the same time, the code generated from the old schema still works (as long as the schema change is backward compatible). But for your deserializer to read messages written with the new schema, it needs that new schema.

In effect, your Java application code generated from the old schema will still work with messages written using an evolved schema, but without the new schema (provided by the registry) your Avro deserializer cannot decode those new messages.
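
For completeness, here is a sketch of how this plays out with the Confluent deserializer (the bootstrap servers and registry URL below are placeholders, and FooClass is the class generated from your old schema):

    // The generated FooClass (old schema) acts as the reader schema. The writer
    // schema of each message is fetched from the registry via the schema id
    // embedded in the message, and Avro schema resolution bridges the two.
    Properties props = new Properties();
    props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
    props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class.getName());
    props.setProperty(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
    props.setProperty(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true"); // deserialize into FooClass
    KafkaConsumer<byte[], FooClass> consumer = new KafkaConsumer<>(props);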

So, in theory, if schemas never changed you would be able to get away with providing the schema at build time.