
I'll preface this by saying I'm fairly new to BigQuery. I'm trying to schedule a query using the Python SDK. I used the example from the documentation page and modified it a bit, but I'm running into errors.

Note that my query does use scripting to set some variables, and it's using a MERGE statement to update one of my tables. I'm not sure if that makes a huge difference.
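For context, the query is roughly of this shape (the table names, variables, and columns below are placeholders, not my real schema):

```python
# Hypothetical sketch of the kind of scripted MERGE query being scheduled;
# all identifiers here are illustrative placeholders.
query = """
DECLARE cutoff_date DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);

MERGE `my_project.my_dataset.target_table` AS target
USING (
  SELECT id, value
  FROM `my_project.my_dataset.source_table`
  WHERE updated_at >= cutoff_date
) AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET target.value = source.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (source.id, source.value);
"""
```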

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()


def create_scheduled_query(dataset_id, project, name, schedule, service_account, query):
    parent = transfer_client.common_project_path(project)

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id=dataset_id,
        display_name=name,
        data_source_id="scheduled_query",
        params={
            "query": query,
        },
        schedule=schedule,
    )

    transfer_config = transfer_client.create_transfer_config(
        bigquery_datatransfer.CreateTransferConfigRequest(
            parent=parent,
            transfer_config=transfer_config,
            service_account_name=service_account,
        )
    )

    print("Created scheduled query '{}'".format(transfer_config.name))

I was able to successfully create a query with the function above. However the query errors out with the following message:

Error code 9 : Dataset specified in the query ('') is not consistent with Destination dataset '{my_dataset_name}'.

I've tried passing in "" as the dataset_id parameter instead, but then I get the following error from the Python SDK:

google.api_core.exceptions.InvalidArgument: 400 Cannot create a transfer with parent projects/{my_project_name} without location info when destination dataset is not specified.

Interestingly enough I was able to successfully create this scheduled query in the GUI; the same query executed without issue.

I saw that the GUI showed the scheduled query's "Resource name" referenced a transferConfig, so I used the following command to see what that transferConfig looked like, to see if I could apply the same parameters using my Python script:

bq show --format=prettyjson --transfer_config {my_transfer_config}

Which gave me the following output:

{
  "dataSourceId": "scheduled_query",
  "datasetRegion": "us",
  "destinationDatasetId": "",
  "displayName": "test_scheduled_query",
  "emailPreferences": {},
  "name": "{REDACTED_TRANSFER_CONFIG_ID}",
  "nextRunTime": "2021-06-18T00:35:00Z",
  "params": {
    "query":  ....

So it looks like the GUI was able to use "" for destinationDatasetId, but for whatever reason the Python SDK won't let me use that value.

Any help would be appreciated, since I prefer to avoid the GUI whenever possible.

UPDATE:

This does appear to be related to the scripting I used in my query. I removed the scripts from the query and it's working. I'm going to leave this open, because I feel this should be possible using the SDK: the same query with scripting works in the console without issue.

1 Answer


This same thing threw me for a loop too, but I managed to figure out what was wrong. The problem is with the

parent = transfer_client.common_project_path(project)

line given in the example. By default, this returns something of the form projects/{project_id}. However, the CreateTransferConfigRequest documentation says of the parent parameter:

The BigQuery project id where the transfer configuration should be created. Must be in the format projects/{project_id}/locations/{location_id} or projects/{project_id}. If specified location and location of the destination bigquery dataset do not match - the request will fail.

Sure enough, if you use the projects/{project_id}/locations/{location_id} format instead, it resolves the error and allows you to pass a null destination_dataset_id.
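For example, you can build the location-qualified parent yourself (the project and location values below are hypothetical; the location must match the region of the datasets your query touches, e.g. "us"):

```python
# Hypothetical project/location values for illustration.
project = "my-project"
location = "us"  # must match the region of the query's datasets

# common_project_path(project) yields "projects/{project}", which carries
# no location info; build the location-qualified parent instead:
parent = f"projects/{project}/locations/{location}"
print(parent)  # projects/my-project/locations/us
```

If I remember correctly, the generated client also exposes a helper, transfer_client.common_location_path(project, location), that produces the same string.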