5
votes

I have some code that automatically generates a bunch of different SQL queries that I would like to insert into the bigquery to generate views, though one of the issues that I have is that these views need to be generated dynamically every night because of the changing nature of the data. So what I would like to be able to do is use the google bigquery api for python to be able to make a view. I understand how to do it using the 'bq' command line tool, but I'd like to be able to have this built directly into the code as opposed to using a shell to run bq. I have played with the code provided at

https://cloud.google.com/bigquery/bigquery-api-quickstart

I don't understand how to use this bit of code to create a view instead of just returning the results of a SELECT statement. I can see the documentation about doing table inserts here

https://cloud.google.com/bigquery/docs/reference/v2/tables/insert

but that refers to using the REST API to generate new tables as opposed to the example provided above.

Is it just not possible? Should I just give in and use bq?

Thanks

*** Some additional questions in response to Felipe's comments.

The table resource document indicates that there are a number of required fields, some of which make sense even if I don't fully understand what they're asking for, others do not. For example, externalDataConfiguration.schema. Does this refer to the schema for the database that I'm connecting to (I assume it does), or the schema for storing the data?

What about externalDataConfiguration.sourceFormat? Since I'm trying to make a view of a pre-existing database, I'm not sure I understand how the source format is relevant. Is it the source format of the database I'm making a view from? How would I identify that?

ANd externalDataConfiguration.sourceUris[], I'm not importing new data into the database, so I don't understand how this (or the previous element) are required.

What about schema?

tableReference.datasetId, tableReference.projectId, and tableReference.tableId are self explanatory.

Type would be view, and view.query would be the actual sql query used to make the view. So I get why those are required for making a view, but I don't understand the other parts.

Can you help me understand these details?

Thanks, Brad

4

4 Answers

3
votes

Everything that web UI or the bq tool does is made through the BigQuery API, so don't give up yet :).

Creating a view is akin to creating a table, just be sure to have a table resource that contains a view property when you call tables.insert().

7
votes

Using https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert

Submit something like below, assuming you add the authorization

{
  "view": {
    "query": "select column1, count(1) `project.dataset.someTable` group by 1",
    "useLegacySql": false
  },
  "tableReference": {
    "tableId": "viewName",
    "projectId": "projectName",
    "datasetId": "datasetName"
  }
}

Alternatively in Python using, assuming you have a service key setup and the environmental variable GOOGLE_APPLICATION_CREDENTIALS=/path/to/my/key. The one caveat is that as far as I can tell this can only create views using legacy sql, and as an extension can only be queried using legacy sql, though the straight API method allows legacy or standard.

from google.cloud import bigquery

def create_view(dataset_name, view_name, project, viewSQL):

    bigquery_client = bigquery.Client(project=project)

    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(view_name)

    table.view_query = viewSQL

    try:
        table.create()
        return True
    except Exception as err:
        print(err)
        return False
5
votes

Note: this changed a little bit with 0.28.0 of the library - see the following for further details: Google BigQuery: creating a view via Python google-cloud-bigquery version 0.27.0 vs. 0.28.0

my example function

# create a view via python
def create_view(dataset_name, view_name, sqlQuery, project=None):
    try:

        bigquery_client = bigquery.Client(project=project)
        dataset_ref = bigquery_client.dataset(dataset_name)
        table_ref = dataset_ref.table(view_name)
        table = Table(table_ref)
        table.view_query = sqlQuery
        table.view_use_legacy_sql = False
        bigquery_client.create_table(table)

        return True

    except Exception as e:
        errorStr = 'ERROR (create_view): ' + str(e)
        print(errorStr)
        raise
3
votes

bigquery.version -> '1.10.0'

def create_view(client, dataset_name, view_name, view_query):
    try:
        dataset_ref = client.dataset(dataset_name)
        view = dataset_ref.table(view_name)
        # view.table_type = 'VIEW'
        view.view_query = view_query
        view.view_query_legacy_sql  = False
        client.create_table(view)
        pass
    except Exception as e:
        errorStr = 'ERROR (create_view): ' + str(e)
        print(errorStr)
        raise

create a table not a view !!!!

This is the right code to create a view:

def create_view(client, dataset_name, view_name, view_query):
    try:
        dataset_ref = client.dataset(dataset_name)
        view_ref = dataset_ref.table(view_name)
        table = bigquery.Table(view_ref)
        table.view_query = view_query
        table.view_use_legacy_sql = False
        client.create_table(table)
    except Exception as e:
        errorStr = 'ERROR (create_view): ' + str(e)
        print(errorStr)
        raise

Is necessary

table = bigquery.Table(view_ref)