Let's say we have a setup as follows.
Schema evolution compatibility is set to BACKWARD.
The JDBC Source Connector polls data from the DB and writes it to a Kafka topic. The HDFS Sink Connector reads messages from the Kafka topic and writes them to HDFS in Avro format.
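For reference, the compatibility level was set through the Schema Registry REST API, roughly like this (the host and subject name below are placeholders, not my actual ones):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SetCompatibility {
    public static void main(String[] args) throws Exception {
        // Placeholder Schema Registry host and subject name (usually "<topic>-value").
        String url = "http://localhost:8081/config/my-topic-value";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .PUT(HttpRequest.BodyPublishers.ofString("{\"compatibility\": \"BACKWARD\"}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // e.g. {"compatibility":"BACKWARD"}
    }
}
```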
Following is the flow as I understand it:
- The JDBC Source Connector queries the DB and generates schema V1 from the JDBC metadata of the ResultSet. V1 has col1, col2, col3. Schema V1 is registered in the Schema Registry.
- The source connector polls data from the DB and writes messages to the Kafka topic with the V1 schema.
- (Question 1) When the HDFS Sink connector reads messages from the topic, does it validate the messages against the V1 schema from the Schema Registry? (My rough understanding of how the schema is resolved is sketched just below.)
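My working assumption for Question 1 is that each serialized message carries the ID of the schema it was written with (the Confluent wire format: one magic byte, a 4-byte schema ID, then the Avro payload), and the sink side resolves the schema from that ID rather than always taking "the latest" version. A minimal sketch of that lookup, where fetchSchemaById is a hypothetical stand-in for a Schema Registry call:

```java
import java.nio.ByteBuffer;
import org.apache.avro.Schema;

// Sketch: how a consumer could work out which registered schema (V1 or V2)
// a given Kafka record was written with, assuming the Confluent wire format:
// [magic byte 0][4-byte schema id][Avro binary payload].
public abstract class WriterSchemaResolver {

    public Schema resolveWriterSchema(byte[] kafkaValue) {
        ByteBuffer buffer = ByteBuffer.wrap(kafkaValue);
        byte magic = buffer.get();
        if (magic != 0) {
            throw new IllegalArgumentException("Record is not in the Confluent wire format");
        }
        int schemaId = buffer.getInt();   // ID assigned by Schema Registry at registration time
        return fetchSchemaById(schemaId); // hypothetical lookup, e.g. GET /schemas/ids/{id}
    }

    // Placeholder for a Schema Registry client call; not a real API of any specific library.
    protected abstract Schema fetchSchemaById(int schemaId);
}
```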
Next, the DB schema is changed: column "col3" is removed from the table.
- The next time the JDBC Source connector polls the DB, it sees that the schema has changed, generates a new schema V2 (col1, col2), and registers V2 in the Schema Registry.
- The source connector continues polling data and writes to the Kafka topic with the V2 schema.
- Now the Kafka topic can contain messages with both the V1 and V2 schemas.
- (Question 2) When the HDFS Sink connector reads a message, does it now validate messages against schema V2? (See the resolution sketch after this list.)
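To make Question 2 concrete, my understanding of Avro schema resolution is that a reader using V2 can still decode records written with V1 and will simply skip col3. A self-contained sketch with simplified schemas (the real ones generated by the JDBC connector would have different names and types):

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaResolutionDemo {
    public static void main(String[] args) throws Exception {
        // Simplified stand-ins for what the JDBC source connector might generate.
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"row\",\"fields\":["
          + "{\"name\":\"col1\",\"type\":\"string\"},"
          + "{\"name\":\"col2\",\"type\":\"string\"},"
          + "{\"name\":\"col3\",\"type\":\"string\"}]}");
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"row\",\"fields\":["
          + "{\"name\":\"col1\",\"type\":\"string\"},"
          + "{\"name\":\"col2\",\"type\":\"string\"}]}");

        // Encode a record with the old writer schema (V1).
        GenericRecord oldRecord = new GenericData.Record(v1);
        oldRecord.put("col1", "a");
        oldRecord.put("col2", "b");
        oldRecord.put("col3", "c");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(v1).write(oldRecord, encoder);
        encoder.flush();

        // Decode it with the new reader schema (V2): col3 is simply skipped.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(v1, v2);
        GenericRecord decoded = reader.read(null,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        System.out.println(decoded); // {"col1": "a", "col2": "b"}
    }
}
```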
Is this the case addressed in the Confluent documentation under Backward Compatibility? [https://docs.confluent.io/current/schema-registry/avro.html#schema-evolution-and-compatibility]
An example of a backward compatible change is a removal of a field. A consumer that was developed to process events without this field will be able to process events written with the old schema and contain the field – the consumer will just ignore that field.
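For what it's worth, Avro itself seems to agree with that quote: checking the same simplified V1/V2 schemas as above with org.apache.avro.SchemaCompatibility (reader = V2, writer = V1, which is what BACKWARD means) reports them as compatible.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class CompatibilityCheck {
    public static void main(String[] args) {
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"row\",\"fields\":["
          + "{\"name\":\"col1\",\"type\":\"string\"},"
          + "{\"name\":\"col2\",\"type\":\"string\"},"
          + "{\"name\":\"col3\",\"type\":\"string\"}]}");
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"row\",\"fields\":["
          + "{\"name\":\"col1\",\"type\":\"string\"},"
          + "{\"name\":\"col2\",\"type\":\"string\"}]}");

        // BACKWARD compatibility: a consumer using the new schema (reader = V2)
        // must be able to read data written with the previous schema (writer = V1).
        SchemaCompatibility.SchemaPairCompatibility result =
            SchemaCompatibility.checkReaderWriterCompatibility(v2, v1);
        System.out.println(result.getType()); // COMPATIBLE
    }
}
```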