
I have a BigQuery table with streaming data. The table is being populated by a Dataflow job. Recently I updated my Dataflow pipeline by removing Column_B, one of the two columns shown below:

| Column_A | Column_B  |
|----------|-----------|
| Anna     | Chicago   |
| John     | Houston   |

But my updated table still contains the same number of columns as before, with the new data intended for Column_B replaced by null. Here's what the table looks like after the update:

| Column_A | Column_B  |
|----------|-----------|
| Anna     | Chicago   |
| John     | Houston   |
| Michael  | null      |
| Cecilia  | null      |
| Ronald   | null      |

My table is partitioned on a timestamp. I am wondering if there's a way to completely drop Column_B, and I am looking for suggestions on how (and whether) I should do that. Also, how would that impact my table?

Thanks in advance.


1 Answer


For simplicity, assume your current table is named table_name.

Step 1. In the query settings, select the following option:

Set a destination table for query results

Step 2. Run the following query to save the result set as a new table:

```sql
SELECT *
EXCEPT(Column_B)
FROM table_name
```

Name the table created in Step 2 table_name_modified. This table will act as a backup of your data.
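If you prefer to skip the query settings UI, the same backup table can be created in one statement with CREATE TABLE ... AS SELECT. Since your table is partitioned on a timestamp, include a PARTITION BY clause so the new table keeps its partitioning (otherwise the result is unpartitioned). In this sketch, my_project.my_dataset and event_timestamp are placeholders for your actual project, dataset, and partitioning column:

```sql
-- Placeholder names: replace project, dataset, and timestamp column with your own.
CREATE TABLE `my_project.my_dataset.table_name_modified`
PARTITION BY DATE(event_timestamp)  -- preserve the original partitioning scheme
AS
SELECT * EXCEPT(Column_B)
FROM `my_project.my_dataset.table_name`;
```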

Step 3. Drop table_name. Once you have dropped table_name, rename table_name_modified to table_name. (Pause or drain the Dataflow job first, since streaming inserts against a dropped table will fail.)
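Depending on your BigQuery version, the drop-and-rename can also be done in SQL DDL rather than through the console; ALTER TABLE ... RENAME TO is a relatively recent addition, so check that it is available in your environment. Dataset and project names below are placeholders:

```sql
-- Assumes table_name_modified was created as the backup in Step 2.
DROP TABLE `my_project.my_dataset.table_name`;

ALTER TABLE `my_project.my_dataset.table_name_modified`
RENAME TO table_name;
```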

Now that the table excludes Column_B, Dataflow won't be populating nulls anymore.