BigQuery - Summing on arrays of structs

Question

I have a table that looks like this:

word   nb_by_date.date    nb_by_date.nb
---------------------------------------
abc    2020-01-01         17
       2020-01-06         43
abc    2020-01-01         33
       2020-01-05         12
       2020-01-06         5
def    2020-01-02         11
       2020-01-05         8
def    2020-01-02         1

You can use the following to obtain this example:

WITH t AS (
SELECT "abc" AS word, [STRUCT('2020-01-01' AS date, 17 AS nb), STRUCT('2020-01-06' AS date, 43 AS nb)]
UNION ALL SELECT "abc" AS word, [STRUCT('2020-01-01' AS date, 33 AS nb), STRUCT('2020-01-05' AS date, 12 AS nb), STRUCT('2020-01-06' AS date, 5 AS nb)]
UNION ALL SELECT "def" AS word, [STRUCT('2020-01-02' AS date, 11 AS nb), STRUCT('2020-01-05' AS date, 8 AS nb)]
UNION ALL SELECT "def" AS word, [STRUCT('2020-01-02' AS date, 1 AS nb)]
)

My goal is to get:

word   nb_by_date.date    nb_by_date.nb
---------------------------------------
abc    2020-01-01         50
       2020-01-05         12
       2020-01-06         55
def    2020-01-02         22
       2020-01-05         8

And here is my attempt:

SELECT
  word,
  ARRAY(
  SELECT STRUCT(date, SUM(nb))
  FROM UNNEST(nb_by_date)
  GROUP BY date
  ORDER BY date) nb_by_date
FROM (
  SELECT word, ARRAY_CONCAT_AGG(nb_by_date) nb_by_date
  FROM t
  GROUP BY word
)

It works for this toy example. However, I have a high amount of data and the use of ARRAY_CONCAT_AGG(nb_by_date) creates a row that goes beyond the 100MB limit (Cannot query rows larger than 100MB limit.). How could I adapt the query to make it work even with a lot of data?

Gordon Linoff Gordon Linoff · Accepted Answer · 2020-07-20T15:15:32

You can use two levels of aggregation:

WITH t AS (
      SELECT 'abc' AS word, [STRUCT('2020-01-01' AS date, 17 AS nb), STRUCT('2020-01-06' AS date, 43 AS nb)] as ar UNION ALL
      SELECT 'abc' AS word, [STRUCT('2020-01-01' AS date, 33 AS nb), STRUCT('2020-01-05' AS date, 12 AS nb), STRUCT('2020-01-06' AS date, 5 AS nb)] UNION ALL
      SELECT 'def' AS word, [STRUCT('2020-01-02' AS date, 11 AS nb), STRUCT('2020-01-05' AS date, 8 AS nb)] UNION ALL
      SELECT 'def' AS word, [STRUCT('2020-01-02' AS date, 1 AS nb)]
     )
select t.word, array_agg(struct( date, nb) order by date) as ar
from (select t.word, el.date, sum(el.nb) as nb
      from t cross join
           unnest(t.ar) el
      group by t.word, el.date
     ) t
group by word

BigQuery - Summing on arrays of structs

1 Answers