Please check my understanding of the REPEATED field mode in the following examples:

{
    "title": "History of Alphabet",
    "author": [
        {
            "name": "Larry"
        }
    ]
}

This JSON has the schema:

[
    {
        "name": "title",
        "type": "STRING"
    },
    {
        "name": "author",
        "type": "RECORD",
        "fields": [
            {
                "name": "name",
                "type": "STRING"
            }
        ]
    }
]

But the following JSON

{
    "title": "History of Alphabet",
    "author": ["Larry", "Steve", "Eric"]
}

has the schema:

[
    {
        "name": "title",
        "type": "STRING"
    },
    {
        "name": "author",
        "type": "STRING",
        "mode": "REPEATED"
    }
]

Is this correct?

N.B.: I went through the documentation, but couldn't find any explanation of this.

1 Answer

Close. In your first example, author is an array of objects, which corresponds to a repeated record in BQ. So the schema would be:

[
    {
        "name": "title",
        "type": "STRING"
    },
    {
        "name": "author",
        "type": "RECORD",
        "mode": "REPEATED",   <--- NOTE!
        "fields": [
            {
                "name": "name",
                "type": "STRING"
            }
        ]
    }
]

Your second data/schema pair looks good (but note that the overall schema is an array, not an object, and it needs commas between elements).
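
To make the mapping rule concrete, here is a minimal Python sketch that derives a schema from one sample JSON object. The helper name infer_schema is hypothetical and is not part of any BigQuery library or tool; it only handles strings, objects, and arrays of those (which covers both examples in the question), and it assumes every element of an array has the same shape as the first. It leaves out "mode" for non-array fields, as the schemas above do.

```python
import json


def infer_schema(obj):
    """Hypothetical helper: map one JSON object to a BigQuery-style schema list.

    Rules illustrated (a sketch, not BigQuery's own logic):
      JSON string -> type STRING
      JSON object -> type RECORD with nested "fields"
      JSON array  -> same type as its elements, plus mode REPEATED
    """
    fields = []
    for name, value in obj.items():
        repeated = isinstance(value, list)
        if repeated:
            if not value:
                raise ValueError(f"cannot infer element type of empty array {name!r}")
            # Assumption: all elements share the shape of the first one.
            value = value[0]

        if isinstance(value, dict):
            field = {"name": name, "type": "RECORD"}
        elif isinstance(value, str):
            field = {"name": name, "type": "STRING"}
        else:
            raise TypeError(f"unsupported value for {name!r}: {type(value).__name__}")

        if repeated:
            field["mode"] = "REPEATED"
        if isinstance(value, dict):
            # Nested objects are described recursively.
            field["fields"] = infer_schema(value)
        fields.append(field)
    return fields


# The two documents from the question.
array_of_objects = {"title": "History of Alphabet", "author": [{"name": "Larry"}]}
array_of_strings = {"title": "History of Alphabet", "author": ["Larry", "Steve", "Eric"]}

print(json.dumps(infer_schema(array_of_objects), indent=4))
print(json.dumps(infer_schema(array_of_strings), indent=4))
```

The first call yields a RECORD field with mode REPEATED and the second a STRING field with mode REPEATED, so the square brackets in the data are what trigger REPEATED in both cases; only the element type differs.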

There is some discussion of nested and repeated fields here: https://cloud.google.com/bigquery/docs/data?hl=en#nested

There are also some sample JSON data objects here: https://cloud.google.com/bigquery/preparing-data-for-bigquery#dataformats

But I agree we don't do a good job of explaining how those objects map to BQ schemas. Sorry about that!