Multiple repeated structures in Bigquery

Question

Following up on this- Bigquery multiple unnest in a single select

We are using bigquery as our warehousing solution and are trying to push the limit by trying to consolidate. A simple example is client tracking. Client generates revenue, has several touch points on our site, and independently maintains several accounts with us. For a business user wanting to do behavior analysis on clients, they want to track visits, revenue generated and how their accounts impacT retention, we are trying to evaluate if a nested structure would work for us

Below is an example. I have 3 tables.

Clients (C)

C_Key| C_Name

-----|------

1 | ABC

2 | DEF

Accounts (A)

A_Key | C_Key
11 | 1
12 | 1
21 | 2
22 | 2

23 | 2

Revenue (R)

R_Key | C_Key | Revenue

-------|---------|----------

11 | 1 | $10

12 | 1 | $20

21 | 2 | $10

I used array_agg to combine these three into a single nested table that looks like below:

{Client,
    Accounts:
          [{
            }],
    Revenue:
          [{
              }]
  }

I want to be able to use multiple unnests in a single query like below

 Select client, Count Distinct(Accounts) and SUM(Revenue) from <single nested 
    table>, unnest accounts, unnest revenue

The expected output are 2 rows,

1,2,$30

2,3,$10

However, having multiple unnests in the same query results in a cross join.
The actual output is

1,2,$60

2,3,$30

Sorry, it's not clear what you are asking. Could you provide a specific example of what you want? stackoverflow.com/help/how-to-ask I can't tell if this question is about permissions, queries, or data modeling. — Tim Swast
Sorry, here goes. The question is primarily modeling a nested structure in bigquery. For e.g. 3 tables. Clients (C), Accounts (A), Revenue (R). (C) has one-to-many relationship with (A) and (R). My initial assumption was i could use a nested model to store all the above information and unnest them as needed using a single select. The 1:M relationship causes duplicates when I try Select client, Count Distinct(Accounts) and SUM(Revenue) from <single nested table>, unnest accounts, unnest revenue I get duplicates. I want to build a generic structure to allow adhoc queries. — Subah
Please edit your question (do not use comments for this!) to show a Minimal, Complete, and Verifiable example of the code that you are having problems with, then we can try to help with the specific problem. Provide data examples too. So we can actually help you! — Mikhail Berlyant

Mikhail Berlyant Mikhail Berlyant · Accepted Answer · 2017-08-10T05:28:04

Below is for BigQuery Standard SQL

First let's clarify creation of single nested table

I hope you did something like :

#standardSQL
WITH clients AS (
  SELECT 1 AS c_key, 'abc' AS c_name UNION ALL
  SELECT 2, 'def'
), accounts AS (
  SELECT 11 AS a_key, 1 AS c_key UNION ALL
  SELECT 12, 1 UNION ALL
  SELECT 21, 2 UNION ALL
  SELECT 22, 2 UNION ALL
  SELECT 23, 2
), revenue AS (
  SELECT 11 AS r_key, 1 AS c_key, 10 AS revenue UNION ALL
  SELECT 12, 1, 20 UNION ALL
  SELECT 21, 2, 10
), single_nested_table AS (
  SELECT x.c_key, x.c_name, accounts, revenue 
  FROM (
    SELECT c.c_key, c_name, ARRAY_AGG(a) AS accounts --, array_agg(r) as revenue  
    FROM clients AS c
    LEFT JOIN accounts AS a ON a.c_key = c.c_key
    GROUP BY c.c_key, c_name
  ) x
  JOIN (
    SELECT c.c_key, c_name, ARRAY_AGG(r) AS revenue  
    FROM clients AS c
    LEFT JOIN revenue AS r ON r.c_key = c.c_key
    GROUP BY c.c_key, c_name
  ) y
  ON x.c_key = y.c_key
)
SELECT *
FROM single_nested_table

which creates table as

Row c_key c_name accounts.a_key accounts.c_key revenue.r_key revenue.c_key revenue.revenue
1   1     abc    11             1              11            1             10    
                 12             1              12            1             20    
2   2     def    21             2              21            2             10    
                 22             2                
                 23             2

Not that important what exactly query you used to create that table - but do important to clear the structure / schema!

So now, back to your question

#standardSQL
WITH clients AS (
  SELECT 1 AS c_key, 'abc' AS c_name UNION ALL
  SELECT 2, 'def'
), accounts AS (
  SELECT 11 AS a_key, 1 AS c_key UNION ALL
  SELECT 12, 1 UNION ALL
  SELECT 21, 2 UNION ALL
  SELECT 22, 2 UNION ALL
  SELECT 23, 2
), revenue AS (
  SELECT 11 AS r_key, 1 AS c_key, 10 AS revenue UNION ALL
  SELECT 12, 1, 20 UNION ALL
  SELECT 21, 2, 10
), single_nested_table AS (
  SELECT x.c_key, x.c_name, accounts, revenue 
  FROM (
    SELECT c.c_key, c_name, ARRAY_AGG(a) AS accounts --, array_agg(r) as revenue  
    FROM clients AS c
    LEFT JOIN accounts AS a ON a.c_key = c.c_key
    GROUP BY c.c_key, c_name
  ) x
  JOIN (
    SELECT c.c_key, c_name, ARRAY_AGG(r) AS revenue  
    FROM clients AS c
    LEFT JOIN revenue AS r ON r.c_key = c.c_key
    GROUP BY c.c_key, c_name
  ) y
  ON x.c_key = y.c_key
)
SELECT 
  c_key, c_name, 
  ARRAY_LENGTH(accounts) AS distinct_accounts, 
  (SELECT SUM(revenue) FROM UNNEST(revenue)) AS revenue
FROM single_nested_table

this gives what you asked for:

Row c_key   c_name  distinct_accounts   revenue  
1   1       abc     2                   30   
2   2       def     3                   10

Multiple repeated structures in Bigquery

1 Answers