What does a REPEATED field in Google BigQuery mean?

Question

Please check my understanding of the REPEATED field mode in the following examples:

{
    "title": "History of Alphabet",
    "author": [
        {
            "name": "Larry"
        }
    ]
}

This JSON has schema:

[
    {
        "name": "title",
        "type": "STRING"
    },
    {
        "name": "author",
        "type": "RECORD",
        "fields": [
            {
                "name": "name",
                "type": "STRING"
            }
        ]
    }
]

But the following JSON

{
    "title": "History of Alphabet",
    "author": ["Larry", "Steve", "Eric"]
}

has schema:

[
    {
        "name": "title",
        "type": "STRING"
    },
    {
        "name": "author",
        "type": "STRING",
        "mode": "REPEATED"
    }
]

Is this correct?

NB: I went through the documentation but couldn't find any explanation of this.

Jeremy Condit · Accepted Answer · 2015-08-15T04:12:01

Close. In your first example, author is an array of objects, which corresponds to a repeated record in BQ. So the schema would be:

[
    {
        "name": "title",
        "type": "STRING"
    },
    {
        "name": "author",
        "type": "RECORD",
        "mode": "REPEATED",   <--- NOTE!
        "fields": [
            {
                "name": "name",
                "type": "STRING"
            }
        ]
    }
]

Your second data/schema pair looks good (but note that the overall schema is an array, not an object, and it needs commas between elements).
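To make the mapping rule concrete, here is a minimal sketch in plain Python that derives a schema from a single JSON object: a JSON array becomes mode REPEATED, and the type is taken from the array's elements (RECORD for objects, STRING for strings). The helper name `infer_schema` is hypothetical and not part of any BigQuery API, and the sketch assumes arrays are non-empty and homogeneous and handles only strings and objects.

```python
import json


def infer_schema(obj):
    """Derive a BigQuery-style schema (a list of field dicts) from one JSON object.

    Hypothetical helper for illustration only; not a BigQuery API.
    """
    fields = []
    for name, value in obj.items():
        # A JSON array means the field is REPEATED; its type comes from the elements.
        repeated = isinstance(value, list)
        if repeated:
            if not value:
                raise ValueError(f"cannot infer element type of empty array {name!r}")
            # Assumption: all elements share the type of the first one.
            value = value[0]

        field = {"name": name}
        if isinstance(value, dict):
            field["type"] = "RECORD"
        elif isinstance(value, str):
            field["type"] = "STRING"
        else:
            raise TypeError(f"unsupported value for {name!r}: {value!r}")

        if repeated:
            field["mode"] = "REPEATED"
        if isinstance(value, dict):
            # Nested objects get their own list of sub-fields.
            field["fields"] = infer_schema(value)
        fields.append(field)
    return fields


# Array of objects -> repeated RECORD (the first example in the question).
repeated_record = infer_schema(
    {"title": "History of Alphabet", "author": [{"name": "Larry"}]}
)
# Array of strings -> repeated STRING (the second example in the question).
repeated_string = infer_schema(
    {"title": "History of Alphabet", "author": ["Larry", "Steve", "Eric"]}
)

print(json.dumps(repeated_record, indent=4))
print(json.dumps(repeated_string, indent=4))
```

Both calls mark author as REPEATED; the only difference is whether the element type is RECORD with nested fields or a plain STRING.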

There is some discussion of nested and repeated fields here: https://cloud.google.com/bigquery/docs/data?hl=en#nested

There are also some sample JSON data objects here: https://cloud.google.com/bigquery/preparing-data-for-bigquery#dataformats

But I agree we don't do a good job of explaining how those objects map to BQ schemas. Sorry about that!
