1
votes

I am trying to join a table of some project data (table1) with a nested array of project id's onto another table with project data (table2) (order is important here)

table1
proj_date     num_proj_per_day     proj_size     proj_id(nested)
1/1/2020      4                    150           a123
                                                 b456
                                                 c789
table2
proj_id(not nested)    proj_loc       lots_of_other_proj_fields....
a123                   Los Angeles
b456                   New York
c798                   Los Angeles
d012                   Denver
....                   ....

desired outcome

proj_date     num_proj_per_day     proj_size     proj_id(unnested)   pro_loc    
1/1/2020      4                    150           a123                Los Angeles
1/1/2020      4                    150           b456                New York
1/1/2020      4                    150           c789                Los Angeles

I have been able to achieve this outcome if I write the sql code with table1 as the from and then cross join unnest(proj_id) and then left join table2. The problem is i need to have table2 in the from statement then join table1 on the unnested(proj_id). Order unfortuantely matters because I have to merge this new dataset(table1) into existing dataset/framework(table2) within Looker

Example of what works to get the correct outcome but does not work for my application

SELECT 
 table1.*,
 table2.proj_loc
FROM table1
CROSS JOIN UNNEST(table1.proj_id) as unnested
LEFT JOIN table2
 ON table2.proj_id = unnested.proj_id

I am looking for something like below but you can not put the unnest into the ON clause - bigquery pops error "Unexpected keyword UNNEST"

SELECT 
 table1.*,
 table2.proj_loc
FROM table2
LEFT JOIN table1
 ON UNNEST(table1.proj_id)=table2.proj_id

Thank you in advance and let me know if you need anymore clarifying information

1

1 Answers

2
votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT proj_date, num_proj_per_day, proj_size, t2.*
FROM `project.dataset.table2` t2
JOIN `project.dataset.table1` t1 
ON t2.proj_id IN UNNEST(t1.proj_id)   

You can test, play with above using sample data from your question as in below example

#standardSQL
WITH `project.dataset.table1` AS (
  SELECT DATE '2020-01-01' proj_date, 4 num_proj_per_day, 150 proj_size, ['a123','b456','c789'] proj_id
),`project.dataset.table2` AS (
  SELECT 'a123' proj_id, 'Los Angeles' proj_loc, 1 proj_field1, 2 proj_field2, 3 proj_field3 UNION ALL
  SELECT 'b456', 'New York', 21, 22, 23 UNION ALL
  SELECT 'c789', 'Los Angeles', 31, 32, 33 UNION ALL
  SELECT 'd012', 'Denver', 41, 42, 43 
)
SELECT proj_date, num_proj_per_day, proj_size, t2.*
FROM `project.dataset.table2` t2
JOIN `project.dataset.table1` t1 
ON t2.proj_id IN UNNEST(t1.proj_id)   

with output

Row proj_date   num_proj_per_day    proj_size   proj_id proj_loc    proj_field1 proj_field2 proj_field3  
1   2020-01-01  4                   150         a123    Los Angeles 1           2           3    
2   2020-01-01  4                   150         b456    New York    21          22          23   
3   2020-01-01  4                   150         c789    Los Angeles 31          32          33