This question is a more complex variant of this one I am trying to load event data into Bigquery where the JSON has the following structure:
{
"simpleKV1String": "Foo",
"simpleKV2String": "Bar",
"simpleKV3String": "qux",
"complexKV": {
"subKey1WithArrayInt": [1, 2, 3],
"subkey2": {
"subkey2Subkey1String": "corge",
"subkey2Subkey2String": "grault",
"subkey2Subkey3Int": 666
}
}
}
The idea is that the simpleKV column simply map 1 on 1 to a Bigquery String column For the complexKV 'column' however, we have 3 options:
- Keep the nested value of the complexKV key as a JSON blob in a BigQuery String field.
- Completely normalize it.
- Map it to a complex BiqQuery datatype like (again) is done in the other Stackoverflow example.
Our requirements:
- We already know the nested schema of complexKV is going to evolve: a subkey3 (and maybe 4) that has nested data itself will be added in the future.
- We want to minimize the changes on the BigQuery table (rooting for option 1).
- We want using the nested data to be as simple as possible (rooting for option 2 or 3).
As BigQuery does not support schema evolution AFAIK, I think we are left with option 1, which unfortunately makes using the data more complex...
Am I right on this or is there a smarter way to do this?