I have a table that looks like this:
word nb_by_date.date nb_by_date.nb
---------------------------------------
abc 2020-01-01 17
2020-01-06 43
abc 2020-01-01 33
2020-01-05 12
2020-01-06 5
def 2020-01-02 11
2020-01-05 8
def 2020-01-02 1
You can use the following to obtain this example:
WITH t AS (
SELECT "abc" AS word, [STRUCT('2020-01-01' AS date, 17 AS nb), STRUCT('2020-01-06' AS date, 43 AS nb)]
UNION ALL SELECT "abc" AS word, [STRUCT('2020-01-01' AS date, 33 AS nb), STRUCT('2020-01-05' AS date, 12 AS nb), STRUCT('2020-01-06' AS date, 5 AS nb)]
UNION ALL SELECT "def" AS word, [STRUCT('2020-01-02' AS date, 11 AS nb), STRUCT('2020-01-05' AS date, 8 AS nb)]
UNION ALL SELECT "def" AS word, [STRUCT('2020-01-02' AS date, 1 AS nb)]
)
My goal is to get:
word nb_by_date.date nb_by_date.nb
---------------------------------------
abc 2020-01-01 50
2020-01-05 12
2020-01-06 55
def 2020-01-02 22
2020-01-05 8
And here is my attempt:
SELECT
word,
ARRAY(
SELECT STRUCT(date, SUM(nb))
FROM UNNEST(nb_by_date)
GROUP BY date
ORDER BY date) nb_by_date
FROM (
SELECT word, ARRAY_CONCAT_AGG(nb_by_date) nb_by_date
FROM t
GROUP BY word
)
It works for this toy example. However, I have a high amount of data and the use of ARRAY_CONCAT_AGG(nb_by_date)
creates a row that goes beyond the 100MB limit (Cannot query rows larger than 100MB limit.).
How could I adapt the query to make it work even with a lot of data?