Is there a simple way to delete a STRUCT from the nested and repeated field in the BigQuery (BQ table column Type: RECORD, Mode: REPEATED).
Let's say I have the following tables:
- wishlist
name toy.id toy.priority
Alice 1 high
2 medium
3 low
Kazik 3 high
1 medium
- toys
id name available
1 car 0
2 doll 1
3 bike 1
I'd like to DELETE from wishlist toys that are not available (toys.available==0). In this case, it's toy.id==1.
As a result, the wishlist would look like this:
name toy.id toy.priority
Alice 2 medium
3 low
Kazik 3 high
I know how to select it:
WITH `project.dataset.wishlist` AS
(
SELECT 'Alice' name, [STRUCT<id INT64, priority STRING>(1, 'high'), (2, 'medium'), (3, 'low')] toy UNION ALL
SELECT 'Kazik' name, [STRUCT<id INT64, priority STRING>(3, 'high'), (1, 'medium')]
), toys AS (
SELECT 1 id, 'car' name, 0 available UNION ALL
SELECT 2 id, 'doll' name, 1 available UNION ALL
SELECT 3 id, 'bike' name, 1 available
)
SELECT wl.name, ARRAY_AGG(STRUCT(unnested_toy.id, unnested_toy.priority)) as toy
FROM `project.dataset.wishlist` wl, UNNEST (toy) as unnested_toy
LEFT JOIN toys t ON unnested_toy.id=t.id
WHERE t.available != 0
GROUP BY name
But I don't know how to remove structs <toy.id, toy.priority> from wishlist when toys.available==0.
There are very similar questions like How to delete/update nested data in bigquery or How to Delete rows from Structure in bigquery but the answers are either unclear to me in terms of deletion or suggest copying the whole wishlist to the new table using the selection statement. My 'wishlist' is huge and 'toys.availabililty' changes often. Copying it seems to me very inefficient.
Could you please suggest a solution aligned with BQ best practices?
Thank you!
wishlist
I have predictions based on different ML models. Structs contain references to prediction groups/descriptions (that are in thetoys
list) that a sample belongs to. But some models are refreshed in certain conditions and a as result some of those ID's assignments are invalid. So certain Id should be removed from structs beforehand. They are incorrect, slowing down queries, and has a non-zero carbon footprint :) Thanks – Lukiz