I am uploading a file that is >10MB from App Engine to Google Cloud Storage via the code below.

gcs.bucket(bucket_name).blob(blob_name=file_path).upload_from_string(data, content_type=content_type)

I am using the GCS Python Client Library and not the built-in App Engine library because I am composing multiple >10MB files into a single file in Cloud Storage when the process is complete.
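For reference, here is a minimal sketch of the upload-then-compose flow with the google-cloud-storage client. It is not the asker's code. The helper names `upload_part` and `compose_parts` are hypothetical, and `bucket` is assumed to be a `google.cloud.storage` `Bucket`, for example `storage.Client().bucket(bucket_name)`. GCS accepts at most 32 source objects per compose request, so the sketch checks that limit instead of composing in several rounds.

```python
# Hypothetical helpers; `bucket` is assumed to be a google.cloud.storage Bucket.

# GCS compose accepts between 1 and 32 source objects per request.
MAX_COMPOSE_SOURCES = 32


def upload_part(bucket, part_name, data, content_type="text/csv"):
    """Upload one CSV chunk as its own object and return the blob."""
    blob = bucket.blob(part_name)
    blob.upload_from_string(data, content_type=content_type)
    return blob


def compose_parts(bucket, part_names, dest_name, content_type="text/csv"):
    """Concatenate already-uploaded parts into dest_name, in the given order."""
    if not part_names:
        raise ValueError("nothing to compose")
    if len(part_names) > MAX_COMPOSE_SOURCES:
        raise ValueError("compose accepts at most %d sources" % MAX_COMPOSE_SOURCES)
    sources = [bucket.blob(name) for name in part_names]
    dest = bucket.blob(dest_name)
    # Setting the content type on the destination keeps the composite typed as CSV.
    dest.content_type = content_type
    dest.compose(sources)
    return dest
```

Because `bucket` is duck-typed, the helpers can be exercised with a fake bucket in tests, and real GCS is only touched when a real `Bucket` is passed in.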

The code runs in a task and has 10 minutes to fetch the data and upload it as a CSV to GCS. The data is retrieved and converted into a CSV-formatted string in less than 3 minutes. The code then tries to upload the data to GCS. At that point Stackdriver Logging stops receiving logs. After ~10 minutes I receive a flood of logs in Stackdriver up to the point of failure, and the failure is:
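One way to check where the 10-minute budget actually goes is to time each phase and log the remaining budget before the upload starts. The sketch below assumes Python 3 and a 600-second deadline taken from the description above. The names `build_csv` and `PhaseTimer` are hypothetical and are not the asker's code.

```python
import csv
import io
import logging
import time

# Deadline taken from the question (task requests get 10 minutes); adjust as needed.
TASK_DEADLINE_SECONDS = 10 * 60


def build_csv(rows, fieldnames):
    """Serialize a list of dicts to a CSV string (hypothetical helper)."""
    buf = io.StringIO()
    # extrasaction="ignore" drops keys that are not in fieldnames.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


class PhaseTimer(object):
    """Tracks elapsed time against a fixed budget and logs each phase."""

    def __init__(self, budget=TASK_DEADLINE_SECONDS, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.budget = budget

    def elapsed(self):
        return self._clock() - self._start

    def remaining(self):
        return self.budget - self.elapsed()

    def mark(self, phase):
        # Each log line carries the elapsed time, so the logs show which phase
        # consumed the budget even if they reach Stackdriver late.
        logging.info("%s done: %.1fs elapsed, %.1fs left",
                     phase, self.elapsed(), self.remaining())
```

In the task handler you would create one `PhaseTimer()` at the start and call `timer.mark("fetch")`, `timer.mark("csv")` and `timer.mark("upload")` after each step. The `clock` parameter exists only so the timer can be tested without waiting.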

DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded.

This issue is frustrating for two reasons:

  1. The error is intermittent. During initial development the issue never occurred. It only started appearing recently, and it is becoming more frequent.
  2. Once one file succeeds, they all succeed in seconds. The first >10MB file always takes minutes to either fail or succeed: it fails after 10 minutes, or it succeeds after anywhere from 1 to 9 minutes. After that, every subsequent upload of a >10MB file takes ~5-10 seconds.

My theory is that there is some service that App Engine is using to upload the files to Google Cloud Storage that automatically goes to sleep after a certain time of no usage. When the service is asleep it takes a very long time to wake it back up. Once the service is awake it can upload to GCS without any issues, very quickly.

Has anyone else run into this or have ideas on how to solve it?

UPDATE

Full error:

(/base/alloc/tmpfs/dynamic_runtimes/python27g/3b44e98ed7fbb86b/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py:279)
Traceback (most recent call last):
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/3b44e98ed7fbb86b/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in Handle
    result = handler(dict(self._environ), self._StartResponse)
  File "/base/data/home/apps/s~pg-gx-n-app-200716/worker:20181030t154529.413639922318911836/lib/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/base/data/home/apps/s~pg-gx-n-app-200716/worker:20181030t154529.413639922318911836/lib/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/base/data/home/apps/s~pg-gx-n-app-200716/worker:20181030t154529.413639922318911836/lib/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/base/data/home/apps/s~pg-gx-n-app-200716/worker:20181030t154529.413639922318911836/lib/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/base/data/home/apps/s~pg-gx-n-app-200716/worker:20181030t154529.413639922318911836/worker.py", line 277, in cache_records
    cache_module.save_records(records=records, report_fields=report.report_fields, report_id=report.report_id, header=header_flag)
  File "/base/data/home/apps/s~pg-gx-n-app-200716/worker:20181030t154529.413639922318911836/storage/user/user.py", line 110, in save_records
    user_entry = User.__generate_user_csv(user=user, report_fields=report_fields)
  File "/base/data/home/apps/s~pg-gx-n-app-200716/worker:20181030t154529.413639922318911836/storage/user/user.py", line 55, in __generate_user_csv
    for index, attrib in enumerate(report_fields):
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded.

ANSWER

So that "failing after 10 minutes" behavior sounds very similar to an issue I ran into a while back, where processes on a new instance would sometimes just hang until they hit their timeout and then die:

app engine instance dies instantly, locking up deferred tasks until they hit 10 minute timeout

Can you provide the full traceback? Also, try filtering the logs by instance ID to see whether anything else crashed at the same time.
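To filter by instance, a query string along the following lines could be used. This is a sketch under assumptions. `instance_log_filter` is a hypothetical helper. The field names `protoPayload.instanceId` (App Engine request logs) and `resource.labels.module_id` (the service, `worker` in the traceback paths) should be checked in the Logs Viewer before you rely on them.

```python
def instance_log_filter(instance_id, module_id=None):
    """Build a Cloud Logging query for one App Engine instance (hypothetical helper).

    Assumes request logs expose the instance as protoPayload.instanceId and the
    service name as resource.labels.module_id; verify both in the Logs Viewer.
    """
    clauses = [
        'resource.type="gae_app"',
        'protoPayload.instanceId="{}"'.format(instance_id),
    ]
    if module_id:
        # Optionally narrow to one service, e.g. the "worker" module.
        clauses.append('resource.labels.module_id="{}"'.format(module_id))
    return " AND ".join(clauses)
```

You can paste the resulting string into the Logs Viewer query box, or pass it as `filter_=` to `list_entries()` on a `google.cloud.logging` client.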

Some generic quick fixes to try would be:

  1. Implement warmup requests: https://cloud.google.com/appengine/docs/standard/python/configuring-warmup-requests
  2. Bump up your instance class size: https://cloud.google.com/appengine/docs/standard/#instance_classes
  3. Isolate this task in a separate microservice so that it doesn't have to compete for resources with the rest of your request handlers: https://cloud.google.com/appengine/docs/standard/python/microservices-on-app-engine
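As a sketch of the first suggestion, a warmup handler could create expensive clients once per instance, so the first real task does not pay that cost. It is untested whether this cures the slow first upload. `get_client`, `make_gcs_client` and `warmup` are hypothetical names, and the lazy import assumes the `google-cloud-storage` package is vendored into the app.

```python
import threading

# Hypothetical per-instance cache: expensive clients are built once and reused.
_clients = {}
_lock = threading.Lock()


def get_client(name, factory):
    """Return the cached client for name, creating it with factory() on first use."""
    with _lock:
        if name not in _clients:
            _clients[name] = factory()
        return _clients[name]


def make_gcs_client():
    # Imported lazily so this module loads even where the library is absent.
    from google.cloud import storage
    return storage.Client()


def warmup():
    """View body for App Engine's /_ah/warmup request."""
    get_client("gcs", make_gcs_client)
    return "", 200
```

To wire it up in the Flask app from the question, you would register it with `app.add_url_rule("/_ah/warmup", "warmup", warmup)` and enable `warmup` under `inbound_services` in `app.yaml`, as described on the warmup-requests page linked above. Task code would then call `get_client("gcs", make_gcs_client)` instead of constructing a new client.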