0
votes

I was trying to load an Avro file with nested records. One of the records has a field whose type is a union of record schemas. When loaded into BigQuery, each union element got a very long field name such as com_mycompany_data_nestedClassname_value. Is there a way to specify the name without the full package name as a prefix?

For example, the following Avro schema

{
    "type": "record",
    "name": "EventRecording",
    "namespace": "com.something.event",
    "fields": [
        {
            "name": "eventName",
            "type": "string"
        },
        {
            "name": "eventTime",
            "type": "long"
        },
        {
            "name": "userId",
            "type": "string"
        },
        {
            "name": "eventDetail",
            "type": [
                {
                    "type": "record",
                    "name": "Network",
                    "namespace": "com.something.event",
                    "fields": [
                        {
                            "name": "hostName",
                            "type": "string"
                        },
                        {
                            "name": "ipAddress",
                            "type": "string"
                        }
                    ]
                },
                {
                    "type": "record",
                    "name": "DiskIO",
                    "namespace": "com.something.event",
                    "fields": [
                        {
                            "name": "path",
                            "type": "string"
                        },
                        {
                            "name": "bytesRead",
                            "type": "long"
                        }
                    ]
                }
            ]
        }
    ]
}

resulted in this BigQuery table schema: [image]

Is it possible to turn a long field name like eventDetail.com_something_event_Network_value into something like eventDetail.Network?


1 Answer

1
votes

Avro loading is not as flexible as it should be in BigQuery (a basic example is that it does not support loading a subset of the fields via a reader schema). Also, renaming columns is not supported in BigQuery today (refer here). The only option is to recreate the table with the proper names, that is, create a new table from your existing table.
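
As a rough illustration of the "create a new table from your existing table" route, here is a minimal Python sketch that derives the long names from the schema in the question and generates a query rebuilding eventDetail with short names. The naming rule (namespace with dots replaced by underscores, then the record name, then `_value`) is inferred only from the example in the question, the table name `mydataset.events_raw` and both helper functions are hypothetical, and `SELECT * REPLACE` with `SELECT AS STRUCT` is one possible way to write the query, not necessarily what the answer had in mind.

```python
# Trimmed copy of the schema from the question: only the union field is kept.
SCHEMA = {
    "type": "record",
    "name": "EventRecording",
    "namespace": "com.something.event",
    "fields": [
        {
            "name": "eventDetail",
            "type": [
                {"type": "record", "name": "Network",
                 "namespace": "com.something.event", "fields": []},
                {"type": "record", "name": "DiskIO",
                 "namespace": "com.something.event", "fields": []},
            ],
        }
    ],
}


def union_field_names(schema, field_name):
    """Map the long BigQuery name of each union branch to its short record name.

    Assumption: the long name is '<namespace with dots as underscores>_<name>_value',
    as seen in the question (com_something_event_Network_value).
    """
    field = next(f for f in schema["fields"] if f["name"] == field_name)
    mapping = {}
    for branch in field["type"]:
        # Skip non-record branches such as "null" or primitive types.
        if not isinstance(branch, dict) or branch.get("type") != "record":
            continue
        namespace = branch.get("namespace", schema.get("namespace", ""))
        full_name = f"{namespace}.{branch['name']}" if namespace else branch["name"]
        mapping[full_name.replace(".", "_") + "_value"] = branch["name"]
    return mapping


def build_rename_query(table, field_name, mapping):
    """Build a query that rebuilds the struct column with short member names."""
    members = ",\n    ".join(
        f"{field_name}.{long_name} AS {short_name}"
        for long_name, short_name in mapping.items()
    )
    return (
        "SELECT * REPLACE ((\n"
        "  SELECT AS STRUCT\n"
        f"    {members}\n"
        f") AS {field_name})\n"
        f"FROM `{table}`"
    )


mapping = union_field_names(SCHEMA, "eventDetail")
# "mydataset.events_raw" is a hypothetical name for the table loaded from Avro.
query = build_rename_query("mydataset.events_raw", "eventDetail", mapping)
print(query)
```

The generated query would then be run with a destination table set (or wrapped in a CREATE TABLE ... AS statement), so the renamed copy replaces the table that came straight from the Avro load.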