I'll preface this by saying I'm fairly new to BigQuery. I'm running into an issue when trying to schedule a query using the Python SDK. I used the example on the documentation page and modified it a bit but I'm running into errors.
Note that my query does use scripting to set some variables, and it's using a MERGE statement to update one of my tables. I'm not sure if that makes a huge difference.
def create_scheduled_query(dataset_id, project, name, schedule, service_account, query):
parent = transfer_client.common_project_path(project)
transfer_config = bigquery_datatransfer.TransferConfig(
destination_dataset_id=dataset_id,
display_name=name,
data_source_id="scheduled_query",
params={
"query": query
},
schedule=schedule,
)
transfer_config = transfer_client.create_transfer_config(
bigquery_datatransfer.CreateTransferConfigRequest(
parent=parent,
transfer_config=transfer_config,
service_account_name=service_account,
)
)
print("Created scheduled query '{}'".format(transfer_config.name))
I was able to successfully create a query with the function above. However the query errors out with the following message:
Error code 9 : Dataset specified in the query ('') is not consistent with Destination dataset '{my_dataset_name}'.
I've tried changing passing in "" as the dataset_id parameter, but I get the following error from the Python SDK:
google.api_core.exceptions.InvalidArgument: 400 Cannot create a transfer with parent projects/{my_project_name} without location info when destination dataset is not specified.
Interestingly enough I was able to successfully create this scheduled query in the GUI; the same query executed without issue.
I saw that the GUI showed the scheduled query's "Resource name" referenced a transferConfig, so I used the following command to see what that transferConfig looked like, to see if I could apply the same parameters using my Python script:
bq show --format=prettyjson --transfer_config {my_transfer_config}
Which gave me the following output:
{
"dataSourceId": "scheduled_query",
"datasetRegion": "us",
"destinationDatasetId": "",
"displayName": "test_scheduled_query",
"emailPreferences": {},
"name": "{REDACTED_TRANSFER_CONFIG_ID}",
"nextRunTime": "2021-06-18T00:35:00Z",
"params": {
"query": ....
So it looks like the GUI was able to use "" for destinationDataSetId
but for whatever reason the Python SDK won't let me use that value.
Any help would be appreciated, since I prefer to avoid the GUI whenever possible.
UPDATE:
This does appear to be related to the scripting I used in my query. I removed the scripts from the query and it's working. I'm going to leave this open because I feel like this should be possible using the SDK since the query with scripting works in the console without issue.