0
votes

I am using Kafka Connect with JSONSchema and am in a situation where I need to convert the JSON schema manually (to "Schema") within a Kafka Connect plugin. I can successfully retrieve the JSON Schema from the Schema Registry and am successful converting with simple JSON Schemas but I am having difficulties with ones that are complex and have valid "$ref" tags referencing components within a single JSON Schema definition.

I have several questions:

  1. The JsonConverter.java does not appear to handle "$ref". Am I correct, or does it handle it in another way elsewhere?
  2. Does the Schema Registry handle the referencing of sub-definitions? If yes, is there code that shows how the dereferencing is handled?
  3. Should the JSON Schema be resolved to a string without references (ie. inline the references) before submitting to the Schema Registry and thereby remove the "$ref" issue?

I am looking at the Kafka Source code module JsonConverter.java below:

https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java#L428

An example of the complex schema (taken from the JSON Schema site) is shown below (notice the "$ref": "#/$defs/veggie" tag the references a later sub-definition)

{
  "$id": "https://example.com/arrays.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "A representation of a person, company, organization, or place",
  "title": "complex-schema",
  "type": "object",
  "properties": {
    "fruits": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "vegetables": {
      "type": "array",
      "items": { "$ref": "#/$defs/veggie" }
    }
  },
  "$defs": {
    "veggie": {
      "type": "object",
      "required": [ "veggieName", "veggieLike" ],
      "properties": {
        "veggieName": {
          "type": "string",
          "description": "The name of the vegetable."
        },
        "veggieLike": {
          "type": "boolean",
          "description": "Do I like this vegetable?"
        }
      }
    }
  }
}

Below is the actual schema returned from the Schema Registry after it the schema was successfully registered:

[
  {
    "subject": "complex-schema",
    "version": 1,
    "id": 1,
    "schemaType": "JSON",
    "schema": "{\"$id\":\"https://example.com/arrays.schema.json\",\"$schema\":\"https://json-schema.org/draft/2020-12/schema\",\"description\":\"A representation of a person, company, organization, or place\",\"title\":\"complex-schema\",\"type\":\"object\",\"properties\":{\"fruits\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}},\"vegetables\":{\"type\":\"array\",\"items\":{\"$ref\":\"#/$defs/veggie\"}}},\"$defs\":{\"veggie\":{\"type\":\"object\",\"required\":[\"veggieName\",\"veggieLike\"],\"properties\":{\"veggieName\":{\"type\":\"string\",\"description\":\"The name of the vegetable.\"},\"veggieLike\":{\"type\":\"boolean\",\"description\":\"Do I like this vegetable?\"}}}}}"
  }
]

The actual schema is embedded in the above returned string (the contents of the "schema" field) and contains the $ref references:

{\"$id\":\"https://example.com/arrays.schema.json\",\"$schema\":\"https://json-schema.org/draft/2020-12/schema\",\"description\":\"A representation of a person, company, organization, or place\",\"title\":\"complex-schema\",\"type\":\"object\",\"properties\":{\"fruits\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}},\"vegetables\":{\"type\":\"array\",\"items\":{\"$ref\":\"#/$defs/veggie\"}}},\"$defs\":{\"veggie\":{\"type\":\"object\",\"required\":[\"veggieName\",\"veggieLike\"],\"properties\":{\"veggieName\":{\"type\":\"string\",\"description\":\"The name of the vegetable.\"},\"veggieLike\":{\"type\":\"boolean\",\"description\":\"Do I like this vegetable?\"}}}}}
1
Why would you expect the response to NOT include the $ref, as per the schema you submitted?Relequestual
Yes, the "$ref" is expected, but the question is really about how/where to handle and resolve the "$ref" tags.Eric Broda
I wouldn't expect you to pre-process them at all. They are part of the JSON Schema standard and so should be understood by JSON Schema implementations. $ref is not like some sort of template command. The resolution should be done by the implementation which uses the schema when it's validating your data.Relequestual

1 Answers

0
votes

Again, the JsonConverter in the Apache Kafka source code has no notion of JSONSchema, therefore, no, $ref doesn't work and it also doesn't integrate with the Registry.

You seem to be looking for the io.confluent.connect.json.JsonSchemaConverter class + logic