We have a weekly backup process which exports our production Google Appengine Datastore onto Google Cloud Storage, and then into Google BigQuery. Each week, we create a new dataset named like YYYY_MM_DD
that contains a copy of the production tables on that day. Over time, we have collected many datasets, like 2014_05_10
, 2014_05_17
, etc. I want to create a data set Latest_Production_Data
that contains a view for each of the tables in the most recent YYYY_MM_DD
dataset. This will make it easier for downstream reports to write their query once and always retrieve the most recent data.
To do this, I have code that gets the most recent dataset and the names of all the tables that dataset contains from the BigQuery API. Then, for each of these tables, I fire a tables.insert call to create a view that is a SELECT *
from the table I am looking to create a reference to.
This fails for tables that contain a RECORD
field, from what looks to be a pretty benign column-naming rule.
For example, I have this table:
For which I issue this API call:
{
'tableReference': {
'projectId': 'redacted',
'tableId': u'AccountDeletionRequest',
'datasetId': 'Latest_Production_Data'
}
'view': {
'query': u'SELECT * FROM [2014_05_17.AccountDeletionRequest]'
},
}
This results in the following error:
HttpError: https://www.googleapis.com/bigquery/v2/projects//datasets/Latest_Production_Data/tables?alt=json returned "Invalid field name "__key__.namespace". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.">
When I execute this query in the BigQuery web console, the columns are renamed to translate the .
to an _
. I kind of expected the same thing to happen when I issued the create view API call.
Is there an easy way I can programmatically create a view for each of the tables in my dataset, regardless of their underlying schema? The problem I'm encountering now is for record columns, but another problem I anticipate is for tables that have repeated fields. Is there some magic alternative to SELECT *
that will take care of all these intricacies for me?
Another idea I had was doing a table copy, but I would prefer not to duplicate the data if I can at all avoid it.
SELECT *
. Can you try this once with theSELECT
statement having all field names explicitly mentioned? – Rohit