
What I want to do should be simple: given a BigQuery schema, I want to select all columns (including nested fields) except a few. The tricky part is that BigQuery has a nested structure, and the few fields I want to exclude are nested within other records.

I found the SELECT * EXCEPT clause in the BigQuery documentation, which seems very promising. The problem is that it doesn't seem to support excluding nested fields.

For example, using the public github_nested sample table, we can write a query like:

#standardSQL
SELECT * except (payload) FROM `bigquery-public-data.samples.github_nested` LIMIT 1000

This works as expected: the payload record is removed from the results. Now suppose we want to remove only payload.comment and keep the rest of the payload record's contents in the response. I tried:

#standardSQL
SELECT * except (payload.comment) FROM `bigquery-public-data.samples.github_nested` LIMIT 1000

However, this fails.

Does anyone know of a way to accomplish this?

Thanks!


2 Answers


The way to think about the problem is that you still want a payload column in the result, but with a different structure: one that excludes comment. SELECT * REPLACE lets you do exactly that. It keeps every column but substitutes a new expression for payload, here a struct rebuilt from payload.* EXCEPT (comment). For example:

#standardSQL
SELECT * REPLACE ((SELECT AS STRUCT payload.* EXCEPT (comment)) AS payload)
FROM `bigquery-public-data.samples.github_nested`
LIMIT 1000;
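The same pattern should extend to fields nested more than one level deep: inside the struct you rebuild, apply REPLACE again to the inner record. The sketch below uses a hypothetical inline table (events, with made-up fields action, body and author) instead of github_nested, so it does not depend on that table's exact schema. It keeps payload.comment but drops payload.comment.author.

```sql
#standardSQL
-- Hypothetical inline data: the table and field names are illustrative only.
WITH events AS (
  SELECT
    1 AS id,
    STRUCT(
      'created' AS action,
      STRUCT('looks good' AS body, 'alice' AS author) AS comment
    ) AS payload
)
-- Keep every top-level column, but rebuild payload so that
-- payload.comment is itself rebuilt without its author field.
SELECT * REPLACE (
  (SELECT AS STRUCT
     payload.* REPLACE (
       (SELECT AS STRUCT payload.comment.* EXCEPT (author)) AS comment
     )
  ) AS payload
)
FROM events;
```

Each level of nesting adds one more SELECT AS STRUCT ... REPLACE wrapper. For repeated records (ARRAY<STRUCT<...>>), the struct has to be rebuilt per element instead, for example with ARRAY(SELECT AS STRUCT c.* EXCEPT (...) FROM UNNEST(...) AS c).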

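This also works, although it produces a different shape: the fields you keep are flattened into top-level columns instead of staying inside a payload record, and you have to list each one explicitly: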

#standardSQL
SELECT 
  * 
  EXCEPT (payload),
  payload.desc AS payload_desc,  
  payload.action AS payload_action
FROM `bigquery-public-data.samples.github_nested`