How do I COPY a nested Avro field to Redshift as a single field?

Question

I have the following Avro schema for a record, and I'd like to issue a COPY to Redshift:

"fields": [{
    "name": "id",
    "type": "long"
}, {
    "name": "date",
    "type": {
        "type": "record",
        "name": "MyDateTime",
        "namespace": "com.mynamespace",
        "fields": [{
            "name": "year",
            "type": "int"
        }, {
            "name": "monthOfYear",
            "type": "int"
        }, {
            "name": "dayOfMonth",
            "type": "int"
        }, {
            "name": "hourOfDay",
            "type": "int"
        }, {
            "name": "minuteOfHour",
            "type": "int"
        }, {
            "name": "secondOfMinute",
            "type": "int"
        }, {
            "name": "millisOfSecond",
            "type": ["int", "null"],
            "default": 0
        }, {
            "name": "zone",
            "type": {
                "type": "string",
                "avro.java.string": "String"
            },
            "default": "America/New_York"
        }],
        "noregistry": []
    }
}]

I want to condense the object in MyDateTime during the COPY to a single column in Redshift. I saw that you can map nested JSON data to a top-level column: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json-jsonpaths , but I haven't figured out a way to concatenate the fields directly in the COPY command.

In other words, is there a way to convert the following record (originally in Avro format)

{
    "id": 6,
    "date": {
        "year": 2010,
        "monthOfYear": 10,
        "dayOfMonth": 12,
        "hourOfDay": 14,
        "minuteOfHour": 26,
        "secondOfMinute": 42,
        "millisOfSecond": {
            "int": 0
        },
        "zone": "America/New_York"
    }
}

Into a row in Redshift that looks like:

id | date
---------------------------------------------
6  | 2010-10-12 14:26:42:000 America/New_York

I'd like to do this directly with COPY

Joe Harris Joe Harris · Accepted Answer · 2018-10-03T13:00:42

You would need to declare the Avro file(s) as an Redshift Spectrum external table and then use a query over that to insert the data into the local Redshift table.

https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html

How do I COPY a nested Avro field to Redshift as a single field?

1 Answers