5 votes

We are loading CSV files into BigQuery; each file creates a separate table.

When we select from these tables, we mostly do so with a TABLE_QUERY like so:

SELECT foo, bar
FROM TABLE_QUERY(name_stub, 'table_id CONTAINS "_something" and msec_to_timestamp(creation_time) > date_add(current_timestamp(), -90, "day")');

Now we have added new fields to the newer files. So instead of just having fields "foo" and "bar" we now additionally have "baz".

When I run the following query, I get an error saying that the field "baz" does not exist in one of the older tables.

SELECT foo, bar, baz
FROM TABLE_QUERY(name_stub, 'table_id CONTAINS "_something" and msec_to_timestamp(creation_time) > date_add(current_timestamp(), -90, "day")');

Is there a way to select "baz" and just have a default value for tables that don't have the column?


3 Answers

1 vote

Update: This issue has been addressed.

BigQuery now supports TABLE_DATE_RANGE over tables where only a subset of the tables has the wider schema; the missing columns come back as NULL for the tables that don't have them, as in the sketch below.

https://code.google.com/p/google-bigquery/issues/detail?id=439
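
For illustration, and assuming the fix also covers the question's TABLE_QUERY form (the answer only names TABLE_DATE_RANGE explicitly), the original query should now run unchanged, with baz returned as NULL from the older tables; COALESCE can then supply the default the question asks for (0 here, purely as an example):

-- baz reads as NULL for rows from tables that predate the new column;
-- COALESCE turns that NULL into a default value
SELECT foo, bar, COALESCE(baz, 0) AS baz
FROM TABLE_QUERY(name_stub, 'table_id CONTAINS "_something" and msec_to_timestamp(creation_time) > date_add(current_timestamp(), -90, "day")')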

2 votes

Adding new columns to an existing table is possible (all historic rows will automatically have NULLs), but you are creating a new table every day, and TABLE_QUERY is just shortcut syntax for a UNION. The only idea I have is to create a view that adds the column "baz" to each of the old tables (sketched below); then you should be able to use TABLE_QUERY across those views and the newer tables.
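
A minimal sketch of such a view's defining query, with the old table name (mydataset.le_table_20150101) and the default value 0 made up for illustration:

-- Defining query for a per-table view: pad the old schema with a constant "baz"
SELECT foo, bar, 0 AS baz
FROM mydataset.le_table_20150101

Saving one such view per old table keeps the schemas aligned, so that, per this answer, a TABLE_QUERY over the views plus the newer tables should no longer error on "baz".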

1 vote

A workaround for this is described in the answer to this question.

In my case it would have been:

SELECT foo, bar, COALESCE(baz, 0) AS baz
FROM TABLE_DATE_RANGE(mydataset.le_table, <beginning of time>, <day before column add>),
     TABLE_DATE_RANGE(mydataset.le_table, <day of column add>, <today>)