7
votes

I am reading 10 million records from BigQuery, doing some transformations, and creating a .csv file, and I am uploading the same .csv data as a stream to an SFTP server using Node.js.

This job takes approximately 5 to 6 hours to complete locally.

The solution has been deployed on GCP Cloud Run, but after 2 to 3 seconds Cloud Run closes the container with a 503 error.

Please find below the configuration of GCP Cloud Run:

  • Autoscaling: up to 1 container instance
  • CPU allocated: default
  • Memory allocated: 2Gi
  • Concurrency: 10
  • Request timeout: 900 seconds

Is GCP Cloud Run a good option for a long-running background process?

5
You're using the wrong tool. Cloud Run is not a good fit for this. Try Cloud Dataflow instead. – Graham Polley
Is it possible to upload a file in the Cloud Dataflow steps? @graham-polley – mayur nimavat
Upload the file first to Cloud Storage. Cloud Dataflow reads files from Cloud Storage. – John Hanley
Do you want to keep your container? – guillaume blaquiere
@guillaumeblaquiere, yes, I want to keep the container alive for a long period of time to process the request in the background. – mayur nimavat

5 Answers

2
votes

You can try using an Apache Beam pipeline deployed via Cloud Dataflow. Using Python, you can perform the task with the following steps:

Stage 1. Read the data from the BigQuery table.

beam.io.Read(beam.io.BigQuerySource(query=your_query, use_standard_sql=True))

Stage 2. Write the Stage 1 result to a CSV file in a GCS bucket.

beam.io.WriteToText(file_path_prefix="",
                    file_name_suffix='.csv',
                    header='list of csv file headers')

Stage 3. Call a ParDo function which takes the CSV file created in Stage 2 and uploads it to the SFTP server. You can refer to this link.
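Putting the three stages together, here is a minimal sketch of what the pipeline could look like. The query, column list, bucket path, and SFTP host/credentials are placeholders, and for brevity the Stage 3 upload is done with paramiko after the pipeline finishes rather than inside a ParDo:

# A sketch only: placeholder values everywhere, error handling omitted.
import apache_beam as beam
import paramiko
from apache_beam.io.filesystems import FileSystems
from apache_beam.options.pipeline_options import PipelineOptions

QUERY = "SELECT * FROM `your_project.your_dataset.your_table`"   # placeholder
COLUMNS = ["id", "name", "amount"]                                # placeholder
OUTPUT = "gs://your-bucket/exports/records"                       # placeholder

# Pass --runner=DataflowRunner, --project, --temp_location, etc. on the command line.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (p
     # Stage 1: read rows (as dicts) from BigQuery.
     | "ReadFromBQ" >> beam.io.Read(
           beam.io.BigQuerySource(query=QUERY, use_standard_sql=True))
     # Turn each row dict into one CSV line.
     | "ToCsvLine" >> beam.Map(lambda row: ",".join(str(row[c]) for c in COLUMNS))
     # Stage 2: write a single CSV file to the GCS bucket.
     | "WriteCsv" >> beam.io.WriteToText(
           file_path_prefix=OUTPUT,
           file_name_suffix=".csv",
           header=",".join(COLUMNS),
           num_shards=1,
           shard_name_template=""))

# Stage 3: stream the finished file from GCS to the SFTP server.
transport = paramiko.Transport(("sftp.example.com", 22))     # placeholder host
transport.connect(username="user", password="password")      # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)
with FileSystems.open(OUTPUT + ".csv") as csv_file:
    sftp.putfo(csv_file, "/upload/records.csv")               # placeholder remote path
sftp.close()
transport.close()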

4
votes

You can use a VM instance with your container deployed on it and perform your job there. At the end, kill or stop the VM.

But, personally, I prefer a serverless solution and approach, like Cloud Run. However, long-running jobs on Cloud Run will come one day! Until then, you have to deal with the limit of 60 minutes or use another service.

As a workaround, I propose that you use Cloud Build. Yes, Cloud Build for running any container in it. I wrote an article on this: I ran a Terraform container on Cloud Build, but in reality you can run any container.

Set the timeout correctly, take care of the default service account and its assigned roles, and, something not yet available on Cloud Run, choose the number of CPUs (1, 8 or 32) for the processing to speed up your job.
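As an illustration only, a cloudbuild.yaml along these lines might look like the sketch below; the image name and values are placeholders, not a tested configuration:

steps:
  - name: 'gcr.io/$PROJECT_ID/bq-to-sftp'   # placeholder: your long-running container image
timeout: '21600s'                           # 6 hours, well beyond the Cloud Run request timeout
options:
  machineType: 'N1_HIGHCPU_8'               # more vCPUs to speed up the processing

You can then start it with gcloud builds submit or with a build trigger.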

Want a bonus? You have 120 minutes free per day and per billing account (be careful, it's not per project!)

2
votes

Is GCP Cloud Run a good option for a long-running background process?

Not a good option, because your container is 'brought to life' by an incoming HTTP request, and as soon as the container responds (e.g. sends something back), Google assumes the processing of the request is finished and cuts off the CPU.

Which may explain this:

The solution has been deployed on GCP Cloud Run, but after 2 to 3 seconds Cloud Run closes the container with a 503 error.

1
vote

You may consider a serverless, event-driven approach:

  • configure a Google Cloud Storage trigger for a Cloud Function that runs the transformation
  • extract/export the BigQuery data to the Cloud Function's trigger bucket - this is the fastest way to get BigQuery data out

Data exported that way may sometimes be too large to be suitable in that form for Cloud Function processing, due to restrictions like the maximum execution time (currently 9 minutes) or the 2 GB memory limit. In that case, you can split the original data file into smaller pieces and/or push them to Pub/Sub with a storage mirror.
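As a rough sketch of those two pieces (the bucket, dataset, and table names are placeholders, and the transformation itself is left as a stub):

from google.cloud import bigquery


def export_table_to_gcs():
    """Export the BigQuery table as CSV into the bucket that triggers the
    Cloud Function. The wildcard lets BigQuery shard the export into pieces
    small enough for a single function invocation."""
    client = bigquery.Client()
    destination = "gs://your-trigger-bucket/exports/records-*.csv"  # placeholder
    job = client.extract_table(
        "your_project.your_dataset.your_table",  # placeholder table
        destination,
    )
    job.result()  # wait for the export to finish


def process_export(event, context):
    """Background Cloud Function deployed with a google.storage.object.finalize
    trigger on the export bucket; it runs once per exported shard."""
    print("Transforming gs://{}/{}".format(event["bucket"], event["name"]))
    # ... run the transformation here and push the result onward
    #     (e.g. to Pub/Sub or to the SFTP server) ...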

All that said, we've used Cloud Functions to process a billion records, from building Bloom filters to publishing data to Aerospike, in under a few minutes end to end.

0
votes

I will try to use Dataflow to create the .csv file from BigQuery and will upload that file to GCS.