4
votes

I have been trying to connect kafka-avro-console-consumer from Confluent to our legacy Kafka cluster, which was deployed without Confluent Schema Registry. I provided the schemas explicitly using properties like:

kafka-console-consumer --bootstrap-server kafka02.internal:9092 \
    --topic test \
    --from-beginning \
    --property key.schema='{"type":"long"}' \
    --property value.schema='{"type":"long"}'

but I am getting an 'Unknown magic byte!' error with org.apache.kafka.common.errors.SerializationException.

Is it possible to use Confluent's kafka-avro-console-consumer to consume Avro messages from Kafka that were not serialized with Confluent's AvroSerializer and Schema Registry?


2 Answers

10
votes

The Confluent Schema Registry serializer/deserializer uses a wire format that prefixes each message with a magic byte and the schema ID, ahead of the Avro-encoded payload.

If your message was not serialized using the Schema Registry serializer, you won't be able to deserialize it with the matching deserializer, and you will get the Unknown magic byte! error.
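To make the failure concrete, here is a minimal sketch of the check the deserializer performs, assuming the documented Confluent framing of one magic byte (0) followed by a 4-byte big-endian schema ID. The function name split_confluent_frame is made up for illustration, and rejecting payloads shorter than five bytes with the same message is a simplification of mine, not necessarily what the real deserializer does.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of every Confluent-framed message


def split_confluent_frame(payload: bytes) -> Tuple[int, bytes]:
    """Split a Confluent-framed message into (schema_id, avro_body).

    Hypothetical helper: raises ValueError when the payload does not
    start with the magic byte, mirroring the error in the question.
    """
    if len(payload) < 5 or payload[0] != MAGIC_BYTE:
        raise ValueError("Unknown magic byte!")
    # Bytes 1..4 hold the schema ID as an unsigned big-endian integer.
    (schema_id,) = struct.unpack(">I", payload[1:5])
    return schema_id, payload[5:]
```

A plain Avro long such as 1 is encoded as the single byte 0x02, so it has no header and is rejected, whereas the same byte framed with a (made-up) schema ID 7 is split cleanly into the ID and the Avro body.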

So you'll need to write a consumer that pulls the messages, deserializes them using your Avro avsc schemas, and then, assuming you want to preserve the data, re-serializes them using the Schema Registry serializer.
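As a rough, library-free sketch of the two byte-level steps involved, assuming the {"type":"long"} schema from the question: decoding a raw Avro long (a zigzag varint) and re-framing the same Avro bytes with the Confluent header. The helper names and the schema ID 42 are hypothetical. In practice you would read from Kafka with a real client, use an Avro library for anything beyond a bare long, and get the ID by registering the schema with Schema Registry.

```python
import struct

MAGIC_BYTE = 0  # Confluent wire-format marker


def decode_avro_long(data: bytes) -> int:
    """Decode one Avro long (variable-length zigzag) from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    else:
        raise ValueError("truncated varint")
    # Undo zigzag encoding: 0, 1, 2, 3 ... map to 0, -1, 1, -2 ...
    return (result >> 1) ^ -(result & 1)


def to_confluent_frame(schema_id: int, avro_body: bytes) -> bytes:
    """Prefix raw Avro bytes with the magic byte and a big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_body


raw = b"\x80\x01"                      # Avro encoding of the long 64
value = decode_avro_long(raw)          # what the legacy message contains
framed = to_confluent_frame(42, raw)   # 42 is a placeholder schema ID
```

The Avro body itself is unchanged by the re-framing. Only the five-byte header is new, which is why the legacy messages cannot be read by the Schema Registry deserializer until they are rewritten.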

Edit: I wrote an article recently that explains this whole thing in more depth: https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained

2
votes

kafka-console-consumer has no knowledge of key.schema or value.schema; only the Avro producer does. Source code here

The regular console consumer doesn't care about the format of the data; it will just print the bytes as UTF-8 text.

The property that kafka-avro-console-consumer accepts is only schema.registry.url. So, to answer the question, yes, it needs to be serialized using the Confluent serializers.