I am trying to understand Avro schemas and am stuck on complex types (records). The problem is simple: create a schema whose top-level record contains one field that is itself a record with two primitive fields (a string and a timestamp). I see two options for the schema:

option 1

{
    "type": "record",
    "name": "cool_subject",
    "namespace": "com.example",  
    "fields": [
        {
            "name": "field_1",
            "type": "record",
            "fields": [
                {"name": "operation", "type": "string"},
                {"name": "timestamp", "type": "long", "logical_type": "timestamp_millis"}
            ]
        }
    ]
}

option 2

{
    "type": "record",
    "name": "cool_subject",
    "namespace": "com.example",  
    "fields": [
        {
            "name": "field_1",
            "type": {
                "type": "record",
                "name": "field_1_type",
                "fields": [
                    {"name": "operation", "type": "string"},
                    {"name": "timestamp", "type": {"type": "long", "logical_type": "timestamp_millis"}}
                ]
            }
        }
    ]
}

The difference is in the "type" attribute.

As far as I know, option 2 is the correct way. Am I right? Is option 1 valid at all?

1 Answer

The second one is correct. The first one is not valid.

A record schema is something that looks like this:

{
    "type": "record",
    "name": <Name of the record>,
    "fields": [...]
}

Each entry in "fields" should look like this:

[
    {
        "name": <name of field>,
        "type": <type of field>
    },
    ...
]

So in the case of a field which contains a record, it should always look like this:

[
    {
        "name": <name of field>,
        "type": {
            "type": "record",
            "name": <Name of the record>,
            "fields": [...]
        }
    },
    ...
]
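
To make the rule concrete, here is a minimal sketch in plain Python (standard library only) that walks a schema and flags any place where a complex type such as "record" is given as a bare string instead of a full object. The function name find_problems and the two constants are made up for illustration; this is not how any Avro library validates schemas, and it only covers records, unions and plain type names.

```python
# Hypothetical helper: a simplified structural check, not a real Avro validator.
COMPLEX_TYPES = {"record", "enum", "array", "map", "fixed"}


def find_problems(schema, path="$"):
    """Return a list of structural problems found in an Avro-style schema."""
    problems = []
    if isinstance(schema, str):
        # A bare string may name a primitive (or an already defined named type),
        # but never a complex type, which needs its own object.
        if schema in COMPLEX_TYPES:
            problems.append(
                f"{path}: complex type '{schema}' must be an object, not a bare string"
            )
    elif isinstance(schema, list):
        # A union: check every branch.
        for i, branch in enumerate(schema):
            problems += find_problems(branch, f"{path}[{i}]")
    elif isinstance(schema, dict) and schema.get("type") == "record":
        if "name" not in schema:
            problems.append(f"{path}: record has no name")
        for field in schema.get("fields", []):
            field_path = f"{path}.{field.get('name')}"
            if "type" not in field:
                problems.append(f"{field_path}: field has no type")
            else:
                problems += find_problems(field["type"], field_path)
    return problems


# Option 1 from the question: the nested record is declared on the field itself.
OPTION_1 = {
    "type": "record",
    "name": "cool_subject",
    "namespace": "com.example",
    "fields": [
        {
            "name": "field_1",
            "type": "record",
            "fields": [{"name": "operation", "type": "string"}],
        }
    ],
}

# Option 2 from the question: the field's type is a full, named record object.
OPTION_2 = {
    "type": "record",
    "name": "cool_subject",
    "namespace": "com.example",
    "fields": [
        {
            "name": "field_1",
            "type": {
                "type": "record",
                "name": "field_1_type",
                "fields": [{"name": "operation", "type": "string"}],
            },
        }
    ],
}

if __name__ == "__main__":
    print(find_problems(OPTION_1))
    print(find_problems(OPTION_2))
```

With these inputs the check reports one problem for option 1 (at field_1) and none for option 2.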

With the format in the first example, it would be ambiguous whether "field_1" is the name of the field or the name of the record. A record type needs a name of its own, separate from the name of the field that uses it.
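
One more detail, separate from the record question: the Avro specification spells the logical type attribute "logicalType", places it inside the type object, and uses hyphenated values such as "timestamp-millis". The "logical_type": "timestamp_millis" spelling in the question would, as far as I can tell, be treated as an unknown attribute and ignored rather than rejected. The sketch below is a hypothetical helper (logical_type_warnings is not part of any Avro library) that flags those spellings on a single field definition.

```python
def logical_type_warnings(field):
    """Return warnings about how a field declares its logical type (simplified)."""
    warnings = []
    name = field.get("name")
    field_type = field.get("type")

    # 'logical_type' is not the attribute name used by the Avro specification.
    for holder, where in ((field, "field"), (field_type, "type")):
        if isinstance(holder, dict) and "logical_type" in holder:
            warnings.append(
                f"{name}: 'logical_type' on the {where} should be 'logicalType'"
            )

    # The attribute annotates the type, so it belongs inside the type object.
    if "logicalType" in field:
        warnings.append(f"{name}: 'logicalType' belongs inside the type object")

    # Spec-defined values use hyphens, e.g. 'timestamp-millis'.
    if isinstance(field_type, dict) and "_" in str(field_type.get("logicalType", "")):
        warnings.append(f"{name}: logical type values use hyphens, not underscores")

    return warnings


# The timestamp field as written in option 2 of the question.
AS_WRITTEN = {"name": "timestamp", "type": {"type": "long", "logical_type": "timestamp_millis"}}

# The same field using the spelling from the Avro specification.
CORRECTED = {"name": "timestamp", "type": {"type": "long", "logicalType": "timestamp-millis"}}

if __name__ == "__main__":
    print(logical_type_warnings(AS_WRITTEN))
    print(logical_type_warnings(CORRECTED))
```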