
I am running a Python script in Cloud Run on a daily basis with Cloud Scheduler to pull data from BigQuery and upload it to Google Cloud Storage as a CSV file. The Cloud Scheduler job uses an HTTP "Target" with a GET "HTTP method". Cloud Scheduler also authenticates to the HTTPS endpoint using a service account with the "Add OIDC token" option.

When running Cloud Scheduler and Cloud Run with a very small subset of the BigQuery data for a job that takes a few seconds, the "Result" in Cloud Scheduler always shows "Success" and the job completes as intended. However, when running Cloud Scheduler and Cloud Run with the full BigQuery dataset for a job that takes a few minutes, the "Result" in Cloud Scheduler always shows "Failed", even though the CSV file is typically (although not always) uploaded into Google Cloud Storage as intended.

(1) When running Cloud Scheduler and Cloud Run on the full BigQuery dataset, why does the "Result" in Cloud Scheduler always show "Failed", even though the job is typically finishing as intended?

(2) How can I fix Cloud Scheduler and Cloud Run to ensure the job always completes as intended and the "Result" in Cloud Scheduler always shows "Success"?

How long does the full query take? Are you hitting runtime limits? cloud.google.com/run/quotas You have not included any code or details on your deployment. stackoverflow.com/help/how-to-ask - John Hanley
The Python script on the full dataset takes three or four minutes to run, and the CSV file is approximately 250 MB. - compassloire
Show the Stackdriver logs for one of the failed actions (edit your question with these details). - John Hanley
The Stackdriver logs say '@type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"' and 'status: "UNKNOWN"' under the 'jsonPayload' section and 'severity: "ERROR"' under the 'Resource' section. Besides this, there is not much additional detail. - compassloire
Show the actual Stackdriver entries in your question. Go back to my first comment and include your code and deployment details. - John Hanley

1 Answer


It's a common mistake with Cloud Scheduler. I have raised it with Google many times, but nothing has changed so far...

The GUI (the web console) doesn't let you configure the advanced settings, especially the timeout. Your Cloud Scheduler job fails because it considers that it didn't receive a response in time when you scan your full BQ dataset (which can take a few minutes).

To solve this, use the command line (gcloud), in particular the attempt-deadline parameter. You can also have a look at the other parameters: retry, backoff, ... The available customization is useful, but it is not exposed in the GUI!
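
As a rough sketch of what that could look like: since the job here is already driven from Python, the snippet below only assembles the `gcloud scheduler jobs update http` command line and prints it. The job name `daily-bq-export`, the location `us-central1`, and the `10m` deadline are placeholders I made up, and the helper `build_update_command` is hypothetical; substitute your own values. As far as I know, the default attempt deadline for HTTP targets is 3 minutes, which would match a job that runs three or four minutes being reported as failed, but check this against your own job's configuration.

```python
import shlex


def build_update_command(job_name, location, attempt_deadline="10m",
                         max_retry_attempts=None, min_backoff=None,
                         max_backoff=None):
    """Assemble the argv for `gcloud scheduler jobs update http`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "gcloud", "scheduler", "jobs", "update", "http", job_name,
        f"--location={location}",
        # How long Cloud Scheduler waits for the HTTP response
        # before it marks the attempt as failed.
        f"--attempt-deadline={attempt_deadline}",
    ]
    # Optional retry/backoff settings, added only when requested.
    if max_retry_attempts is not None:
        cmd.append(f"--max-retry-attempts={max_retry_attempts}")
    if min_backoff is not None:
        cmd.append(f"--min-backoff={min_backoff}")
    if max_backoff is not None:
        cmd.append(f"--max-backoff={max_backoff}")
    return cmd


# Placeholder job name and location (assumptions, not from the question).
cmd = build_update_command("daily-bq-export", "us-central1",
                           attempt_deadline="10m")
print(shlex.join(cmd))
```

You can paste the printed command into a shell, or run it from Python with `subprocess.run(cmd, check=True)` once gcloud is installed and authenticated. Keep the deadline comfortably above the longest run you expect, and make sure the Cloud Run request timeout is not shorter than the job itself.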