0
votes

I am trying to remove a column from a BigQuery table and I've followed the instructions as stated here:

https://cloud.google.com/bigquery/docs/manually-changing-schemas#deleting_a_column_from_a_table_schema

This did not work directly as the column I'm trying to remove is nested twice in a struct. The following SO questions are relevant but none of them solve this exact case.

  1. Single nested field: BigQuery select * except nested column

  2. Double nested field (solution has all fields in the schema enumerated, which is not useful for me as my schema is huge): BigQuery: select * replace from multiple nested column

I've tried adapting the above solutions and I think I'm close but can't quite get it to work.

This one will remove the field, but returns only the nested field, not the whole table (for the examples I want to remove a.b.field_name. See the example schema at the end):

SELECT AS STRUCT * EXCEPT(a), a.* REPLACE (
    (SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b
)
FROM `table`

This next attempt gives me an error: Scalar subquery produced more than one element:

WITH a_tmp AS (
    SELECT AS STRUCT a.* REPLACE (
        (SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b
    ) 
    FROM `table`
)

SELECT * REPLACE (
    (SELECT AS STRUCT a.* FROM a_tmp) AS a
)
FROM `table`

Is there a generalised way to solve this? Or am I forced to use the enumerated solution in the 2nd link?

Example Schema:

[
    {
        "name": "a",
        "type": "RECORD",
        "fields": [
            {
                "name": "b",
                "type": "RECORD"
                "fields": [
                    {
                        "name": "field_name",
                        "type": "STRING"
                    },
                    {
                        "name": "other_field_name".
                        "type": "STRING"
                    }
                ]
            },
        ]
    }
]

I would like the final schema to be the same but without field_name.

1
provide simplified example of your input data and desired result so we will be able to helpMikhail Berlyant
@MikhailBerlyant I have added an example schema. Is this what you are looking for?Julian Fell

1 Answers

2
votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT * REPLACE(
  (SELECT AS STRUCT(SELECT AS STRUCT a.b.* EXCEPT (field_name)) b)
  AS a) 
FROM `project.dataset.table`  

you can test, play with it using dummy data as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT STRUCT<b STRUCT<field_name STRING, other_field_name STRING>>(STRUCT('1', '2')) a
)
SELECT * REPLACE(
  (SELECT AS STRUCT(SELECT AS STRUCT a.b.* EXCEPT (field_name)) b)
  AS a) 
FROM `project.dataset.table`