0
votes

I have 2 tables:

To keep problem definition simple, here is the schema for 1st:

student_id int
phones repeated 
- phones.number string
- phones.type string

2nd table:

student_id int
courses repeated 
- courses.id int
- courses.name string

Both tables have same number of rows and same student ids. All I need is to combine both repeated records into one master student table: (keeping 2 distinct repeating fields) something like:

student_id int
phones repeated 
- phones.number string
- phones.type string
courses repeated 
- courses.id int 
- courses.name string

How can i do this in bigquery ? (I tried a bunch of approaches, but all ended up creating repeating rows for repeated fields. It would be good to get a fresh perspective from bigquery masters on stackoverflow). Thanks in advance.

2

2 Answers

2
votes

You'll want to JOIN the two data sets and select the relevant columns from each. It's easier to set up an example using standard SQL (uncheck "Use Legacy SQL" under "Show Options") but a similar idea applies to legacy SQL.

WITH Students AS (
  SELECT
    1 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-5555", "cell")] AS phones
  UNION ALL SELECT
    5 AS student_id,
    ARRAY<STRUCT<number STRING, type STRING>>[
      STRUCT("(555) 555-1234", "home"),
      STRUCT("(555) 555-4321", "cell")] AS phones
),
Courses AS (
  SELECT
    5 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis")] AS courses
  UNION ALL SELECT
    1 AS student_id,
    ARRAY<STRUCT<id INT64, name STRING>>[
      STRUCT(10, "Data Analysis"),
      STRUCT(101, "Algorithms")] AS courses
)
SELECT
  student_id,
  phones,
  courses
FROM Students
JOIN Courses
USING (student_id);

Legacy SQL would use something like:

SELECT
  s.student_id AS student_id,
  s.phones.number,
  s.phones.type,
  c.courses.id,
  c.courses.name
FROM Students s
JOIN Courses c
ON s.student_id = c.student_id;
0
votes

for Legacy SQL

SELECT 
   s.student_id AS student_id,
   phones.number,
   phones.type,
   courses.id,
   courses.name
FROM Students s
JOIN Courses c
ON s.student_id = c.student_id

Note: you will need to check Allow Large Results checkbox and uncheck Flatten Results checkbox with save result into table to preserve schema