0
votes

I have a table in BigQuery and want to change the mode of a column from NULLABLE to REQUIRED.

According to the Google documentation for modifying BigQuery table schemas the only way to do this goes as follows: "You can manually change a column's mode by exporting your table data to Cloud Storage, and then loading the data into a new table with a schema definition that specifies the correct mode for the column. You can also use the load job to overwrite the existing table." (https://cloud.google.com/bigquery/docs/manually-changing-schemas#changing_a_columns_mode).

I have 2 questions about that.

  1. What does this last sentence "You can also use the load job to overwrite the existing table" mean? Does this mean I can somehow make this schema change in the existing table and don't actually have to make a new table?

  2. I want to avoid creating a new table and just make schema changes to my existing table. Is there a way to change the mode of a column from NULLABLE to REQUIRED in my existing table? For other modifications like deleting a column from a table schema or changing a column's data type it is possible to use a query job for that (option 1 in the doc). Is there also a query job I can do to change a column's mode? BigQuery does not seem to support something like "ALTER TABLE [Table] ALTER COLUMN [Column] INTEGER NOT NULL".

Any help is appreciated!

2

2 Answers

0
votes
  1. Given the documentation no you will have to create a new table in that scenario.
  2. There is no way to change the definition of a column (data type or mode)

What you could alternatively do is add a new column that has the new definition and update the column with the values from the old column.

See Mikhail's answer to this post: Big Query - Convert a int column into float

0
votes

The important part of the documentation you need to read is Modifying Table schemas, more precisely the section Relaxing a column mode. In there you can see this two key lines:

  • Currently, the only supported modification you can make to a column's mode is changing it from REQUIRED to NULLABLE.

  • Caution: Changing a REQUIRED column to NULLABLE cannot be undone.

Meaning, there is no way to change a NULLABLE column to REQUIRED but you can change a REQUIRED column to NULLABLE.

I tried anyway to proceed with the change, all schema changes, as shown in the aforementioned official documentation can be done using bq update, calling table.patch method or with a client library. I described it more extensively in this other post. In any case, when modifying the column in the schema from NULLABLE to REQUIRED and trying to update, one is met with the error:

BigQuery error in update operation: Provided Schema does not match Table <project-id>:<dataset-id>.<table-id>. Field <field-name> has changed mode from NULLABLE to REQUIRED

Confirming this is not supported.

However, you can follow your workaround mentioned in your first question, i.e. loading the data overwriting the same named table.

Basically, the steps are:

  1. Export to GCS
  2. Load the data overwriting the same table

I performed a quick experiment to double-check it works properly.

  1. I exported the data to a CSV file in GCS using the UI.
  2. I overwrote the table following Appending to or overwriting a table with CSV data. Basically I used the command:
bq load \
--source_format=CSV \
--replace \
--max_bad_records=3 \
<dataset-id>.<table-id> \
gs://<MyBucket>/<myexportedfile> \
<mymodifiedschemafile>

As a result I had the same table I had before but with the field changed from NULLABLE to REQUIRED, confirming this workaround works properly!