I'm trying to unnest multiple nested arrays in Bigquery, filter on them and put the new arrays back together. My problem is, that I'm ending up with duplicated values.
Example data:
The image has two example rows of data with an array "vendor" that contains the two arrays "topic" and "categories"
I want to filter on vendor.topic.score >= 0.8, vendor.categories.score >= 0.8 and also get rid of the vendor.topic.position column.
The result should look like this:
First I've tried to solve it with multiple unnest for each array, but this gives me duplicated values in the newly created arrays:
SELECT
id,
ARRAY_AGG(STRUCT(vendor_topics.label AS topics_label,
vendor_topics.score AS topics_score)),
ARRAY_AGG(STRUCT(vendor_categories.label AS category_label,
vendor_categories.score AS category_score))
FROM
`source_table`,
UNNEST(vendor.topics) vendor_topics,
UNNEST(vendor.categories) vendor_categories
WHERE
AND vendor_categories.score >= 0.8
AND vendor_topics.score >= 0.8
GROUP BY
1
LIMIT
10
Next I've tried using subqueries, which ended up in "API limit exceeded: Unable to return a row that exceeds the API limits. To retrieve the row, export the table."
SELECT
id,
(
SELECT
ARRAY_AGG(STRUCT(vendor_topics.label AS topics_label,
vendor_topics.score AS topics_score))
FROM
`source_table` articles,
UNNEST(vendor.topics) vendor_topics
WHERE
vendor_topics.score >= 0.8),
(
SELECT
ARRAY_AGG(STRUCT(vendor_categories.label AS category_label,
vendor_categories.score AS category_score))
FROM
`source_table`,
UNNEST(vendor.categories) vendor_categories
WHERE
vendor_categories.score >= 0.8)
FROM
`source_table`
GROUP BY
1
Now I'm out of ideas and hoping someone can help me solve this please.