I'm working in the BigQuery Standard SQL dialect.
I have a column that I know to be an array of JSON dictionaries.
Array length is variable from row to row.
I'd like to flatten this so I can access the JSON elements for each dictionary in the arrays.
For example, let's say I have two records. The first has an id of 1, and in the JSON column, has this
[
{"key1":"val1a", "key2": "val1b"},
{"key1":"val1c", "key2": "val1d"}
]
The second has an id of 2, and in the JSON column has
[{"key1":"val2a", "key2":"val2b"}]
My goal is
id | key1 | key2 | offset
---------------------------
1 | val1a | val1b | 1
1 | val1c | val1d | 2
2 | val2a | val2b | 1
(Though I could live without the offset column)
It seems like something like this could work...
WITH table AS (
SELECT 1 as id,['{"key1":"val1a", "key2": "val1b"}','{"key1":"val1c", "key2": "val1d"}'] as array_column
UNION ALL
SELECT 2 as id,['{"key1":"val2a", "key2":"val2b"}'] as array_column)
SELECT id,
json_extract_scalar(flattened_array, '$.key1') as key1,
json_extract_scalar(flattened_array, '$.key2') as key2
FROM table t
CROSS JOIN UNNEST(t.array_column) AS flattened_array
And in fact, that query returns the table I expect (minus the offset column, which is trivial to add)
The problem is that BigQuery doesn't understand that this thing is an array of JSON-like strings. It thinks the whole thing is one big string, and I don't know how to convince it otherwise. Editing my example to simulate this type confusion demonstrates the problem:
WITH table AS (
SELECT 1 as id,'[{"key1":"val1a", "key2": "val1b"},{"key1":"val1c", "key2": "val1d"}]' as array_column
UNION ALL
SELECT 2 as id,'[{"key1":"val2a", "key2":"val2b"}]' as array_column)
SELECT id,
json_extract_scalar(flattened_array, '$.key1') as key1,
json_extract_scalar(flattened_array, '$.key2') as key2
FROM table t
CROSS JOIN UNNEST(t.array_column) AS flattened_array
Here, the validator complains because Values referenced in UNNEST must be arrays. UNNEST contains expression of type STRING at [29:23]
And now we're at the heart of the issue. Is there some obvious way to get BigQuery to understand that this string is a valid array of JSON dictionaries? Maybe some JSON_* function I've overlooked that will flatten the array? Or some way to CAST this thing to an array?
