Whats the best method of making arrays of joined tables in Big Query?

Question

I'm trying to do some data transformation inside of Big Query, with SQL.

Let's say I have three tables:

Customer - data about the customer, like age, etc Subscriptions - data about what subscriptions the user have Engagements - data about how the customers have interacted with digital products.

I'd like to collect this inside of one table, using nested fields.

I can join all these tables, but I'd like to aggregate them to arrays.

So, instead of three tables I get this:

id:123,
name:David,
age:30,
subscritions: [{
  name:sub1
  price:10
},
{
  name:sub2
  price:20
}],
engagment: [{
  event:visited_product_x
  time:2020-06-10
},
{
  event:visited_product_y
  time:2020-06-10
}]

Of course I've used array_agg in SELECT. And that works great, when adding only one table. However, when adding another one, I get duplicate rows, which I don't want. So, I guess I shouldn't use array_agg in SELECT, but rather somewhere else.

But whats the best way of solving this?

Gordon Linoff Gordon Linoff · Accepted Answer · 2020-06-10T10:52:59

You can use subqueries to construct the fields. Something like this:

select c.*,
       (select array_agg(s)
        from substriptions s
        where s.user_id = c.user_id
       ) as subscriptions,
       (select array_agg(e)
        from engagements e
        where e.user_id = c.user_id
       ) as engagements
from customers c

Whats the best method of making arrays of joined tables in Big Query?

1 Answers