1
votes

Is there a simple way to delete a STRUCT from the nested and repeated field in the BigQuery (BQ table column Type: RECORD, Mode: REPEATED).

Let's say I have the following tables:

  1. wishlist
name    toy.id  toy.priority
Alice   1       high
        2       medium
        3       low
Kazik   3       high
        1       medium
  1. toys
id  name    available
1   car     0
2   doll    1
3   bike    1

I'd like to DELETE from wishlist toys that are not available (toys.available==0). In this case, it's toy.id==1.

As a result, the wishlist would look like this:

name    toy.id  toy.priority
Alice   2       medium
        3       low
Kazik   3       high

I know how to select it:

WITH `project.dataset.wishlist` AS
(
  SELECT 'Alice' name, [STRUCT<id INT64, priority STRING>(1, 'high'), (2, 'medium'), (3, 'low')] toy UNION ALL
  SELECT 'Kazik' name, [STRUCT<id INT64, priority STRING>(3, 'high'), (1, 'medium')]
), toys AS (
  SELECT 1 id, 'car' name, 0 available UNION ALL
  SELECT 2 id, 'doll' name, 1 available UNION ALL
  SELECT 3 id, 'bike' name, 1 available
)
SELECT wl.name, ARRAY_AGG(STRUCT(unnested_toy.id, unnested_toy.priority)) as toy
FROM `project.dataset.wishlist` wl, UNNEST (toy) as unnested_toy
LEFT JOIN toys t ON unnested_toy.id=t.id
WHERE t.available != 0
GROUP BY name

But I don't know how to remove structs <toy.id, toy.priority> from wishlist when toys.available==0.

There are very similar questions like How to delete/update nested data in bigquery or How to Delete rows from Structure in bigquery but the answers are either unclear to me in terms of deletion or suggest copying the whole wishlist to the new table using the selection statement. My 'wishlist' is huge and 'toys.availabililty' changes often. Copying it seems to me very inefficient.

Could you please suggest a solution aligned with BQ best practices?

Thank you!

2
BigQuery is not very efficient with frequently changing data. It is mostly designed for append-only data tables. Maybe a better fit would be Bigtable instead.Sergey Geron
the query in the question does exactly what you ask for - what the problem here? please clarify!! If you meant physically delete those structs in source table - then it does not make much sense what if next time that removed from wishlist toy will become available?! anyway - please clarify what exactly you want to accomplishMikhail Berlyant
@MikhailBerlyant please do not hold onto the data that much. Knowing how to delete certain structs would help me a lot. It's because in the real scenario in the wishlist I have predictions based on different ML models. Structs contain references to prediction groups/descriptions (that are in the toys list) that a sample belongs to. But some models are refreshed in certain conditions and a as result some of those ID's assignments are invalid. So certain Id should be removed from structs beforehand. They are incorrect, slowing down queries, and has a non-zero carbon footprint :) ThanksLukiz
Thanks @SergeyGeron. I was afraid that it will be the conclusion since indeed It's one of the BQ design principles. However since row Deletion was implemented in BQ, I thought that STRUCT deletion inside a row is also possible.Lukiz

2 Answers

1
votes

... since row Deletion was implemented in BQ, I thought that STRUCT deletion inside a row is also possible.

You can use UPDATE DML for this (not DELETE as it is used for deletion of whole row(s), while UPDATE can be used to modify the row)

update `project.dataset.wishlist` wl
set toy = ((
  select array_agg(struct(unnested_toy.id, unnested_toy.priority)) 
  from unnest(toy) as unnested_toy
  left join `project.dataset.toys` t on unnested_toy.id=t.id
  where t.available != 0
))
where true;   
0
votes

You can UNNEST() and reaggregate:

SELECT wl.name, 
       (SELECT ARRAY_AGG(t)
        FROM UNNEST(wl.toy) t JOIN
             toys
             ON toys.id = t.id
        WHERE toys.available <> 0
       ) as available_toys
FROM `project.dataset.wishlist` wl;