2 votes

To be clear, I am not trying to use Kafka as the data store for event sourcing, merely to replicate events.

The Confluent Schema Registry for Kafka seems very interesting in that it can validate the schema of messages sent by producers to a topic. However, from what I understand, it treats each topic like a container file - one schema per topic.

This restriction doesn't work for an event-sourced stream, where a single aggregate like File will have multiple message schemas: FileCreated, FileMoved, FileCopied, FileDeleted. Putting each of these on a separate topic would be complicated and error-prone.

Does there exist a tool like Schema Registry which supports multiple schemas for the same topic?

Update

To clarify, each of the messages above would have a different schema. For example:

FileCreated:

{
  "type": "record",
  "name": "FileCreated",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "name", "type": "string" },
    { "name": "path", "type": "string" },
    { "name": "size", "type": "string" },
    { "name": "mimeType", "type": "string" },
    { "name": "user", "type": "string" },
    { "name": "date", "type": "long" }
  ]
}

FileMoved:

{
  "type": "record",
  "name": "FileMoved",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "from", "type": "string" },
    { "name": "to", "type": "string" },
    { "name": "date", "type": "long" },
    { "name": "user", "type": "string" }
  ]
}

FileDeleted:

{
  "type": "record",
  "name": "FileDeleted",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "date", "type": "long" },
    { "name": "user", "type": "string" }
  ]
}
Comment: same question, is there a way to specify "oneOf" in JSON for an Avro schema? - aasthetic

1 Answer

3 votes

Confluent Schema Registry does in fact support multiple schemas for the same topic.

That said, the best practice is not to use the same topic for unrelated types of data; for example, you typically shouldn't write page-view events and user-profile updates to the same topic.

A common reason to use multiple schemas for the same topic is schema evolution: you might start with a basic schema for user profiles (e.g. just username and age), which is subsequently enhanced into a fuller schema (username, age, geo-region, preferred language, date of last visit, ...).
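For instance, the enhanced schema might look like the following hypothetical sketch (written as Python dicts to match the REST examples further down); in Avro, giving the newly added fields defaults is what keeps v2 BACKWARD compatible, i.e. able to read records written with v1:

# v1: the basic user-profile schema.
user_profile_v1 = {
    "type": "record",
    "name": "UserProfile",
    "fields": [
        {"name": "username", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}

# v2: the enhanced schema. Each new field declares a default, so a
# v2 reader can still decode records that were written with v1.
user_profile_v2 = {
    "type": "record",
    "name": "UserProfile",
    "fields": [
        {"name": "username", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "geo_region", "type": "string", "default": ""},
        {"name": "preferred_language", "type": "string", "default": "en"},
        {"name": "last_visit", "type": ["null", "long"], "default": None},
    ],
}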

Whether or not you want to store FileCreated, FileMoved, FileCopied, and FileDeleted in the same topic is up to you. Either way, Confluent Schema Registry allows you to manage the corresponding schemas (see the docs).

More specific docs pointers:

  • Register a new schema. To register new/multiple schemas under the same subject you simply need to, well, register them via the corresponding API call (see the sketch after the quote below). Note that registering a new schema (i.e. when an initial schema was already registered with a subject) may fail depending on the Avro compatibility settings; see the next point.
  • Defining Avro compatibility settings for schemas (globally, or for schemas that are registered for the same subject/topic). See e.g. GET /config/(string: subject), which returns the (Avro schema) compatibility level for a subject.

Quoting:

A schema should be compatible with the previously registered schemas (if there are any) as per the configured compatibility level. The configured compatibility level can be obtained by issuing a GET /config/(string: subject). If that returns null, then GET /config.
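
To make the two pointers concrete, here is a minimal sketch of both calls against the Schema Registry REST API. It assumes a registry running at http://localhost:8081 and a hypothetical subject name file-events-value (i.e. the value schemas of a "file-events" topic); both are placeholders for your own setup.

import json
import requests

REGISTRY = "http://localhost:8081"   # assumed registry location
SUBJECT = "file-events-value"        # hypothetical subject for a "file-events" topic
HEADERS = {"Content-Type": "application/vnd.schemaregistry.v1+json"}

# The FileCreated schema from the question.
file_created = {
    "type": "record",
    "name": "FileCreated",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "path", "type": "string"},
        {"name": "size", "type": "string"},
        {"name": "mimeType", "type": "string"},
        {"name": "user", "type": "string"},
        {"name": "date", "type": "long"},
    ],
}

# Register the schema under the subject. The registry expects the Avro
# schema itself as an escaped JSON string inside the request body.
resp = requests.post(
    f"{REGISTRY}/subjects/{SUBJECT}/versions",
    headers=HEADERS,
    data=json.dumps({"schema": json.dumps(file_created)}),
)
print(resp.json())  # e.g. {"id": 1}

# Look up the compatibility level for the subject; if no subject-level
# setting exists, fall back to the global config as the docs describe.
resp = requests.get(f"{REGISTRY}/config/{SUBJECT}")
if resp.status_code != 200:
    resp = requests.get(f"{REGISTRY}/config")
print(resp.json())  # e.g. {"compatibilityLevel": "BACKWARD"}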

Also, valid (Avro schema) compatibility settings are: NONE, FULL, FORWARD, BACKWARD. So if you really wanted to store, say, completely different data types in the same Kafka topic, you should (a) set the Avro schema compatibility for the corresponding subject/topic to NONE and (b) register the relevant Avro schema(s) for each data type under that subject/topic.
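
Putting (a) and (b) together, a minimal sketch (same assumed registry and placeholder subject as above):

import json
import requests

REGISTRY = "http://localhost:8081"   # assumed registry location
SUBJECT = "file-events-value"        # hypothetical subject, as above
HEADERS = {"Content-Type": "application/vnd.schemaregistry.v1+json"}

# (a) Set the subject's compatibility level to NONE so that schemas
# with completely different shapes can coexist under it.
resp = requests.put(
    f"{REGISTRY}/config/{SUBJECT}",
    headers=HEADERS,
    data=json.dumps({"compatibility": "NONE"}),
)
print(resp.json())  # {"compatibility": "NONE"}

# (b) Register a second, structurally unrelated schema under the same
# subject. With NONE in place this succeeds rather than being rejected
# by the compatibility check.
file_deleted = {
    "type": "record",
    "name": "FileDeleted",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "date", "type": "long"},
        {"name": "user", "type": "string"},
    ],
}
resp = requests.post(
    f"{REGISTRY}/subjects/{SUBJECT}/versions",
    headers=HEADERS,
    data=json.dumps({"schema": json.dumps(file_deleted)}),
)
print(resp.json())  # e.g. {"id": 2}

One caveat worth noting: with compatibility set to NONE you give up the registry's evolution guarantees for that subject, so consumers must be prepared to handle every schema that may appear on the topic.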