I have a Cloud Function that starts a Compute Engine instance. However, when multiple functions are triggered, the action already running on the Compute Engine instance is interrupted by the incoming functions/commands. The instance runs a PyTorch implementation. Is there a way to send the incoming work to a queue, so that the action currently running on the instance finishes before the next one is picked up, and the machine only shuts down once everything queued has been processed? Any conceptual guidance would be greatly appreciated.
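For context, this is roughly the kind of decoupling I have in mind, although it is untested and the names are made up: instead of starting the instance directly, the function would only publish a work item describing the upload to a Pub/Sub topic, and the instance would work through the items one at a time.

# Untested sketch of the "producer" side I am imagining; "my-project" and
# "gce-work-queue" are placeholder names, not real resources.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "gce-work-queue")


def enqueue(event, context):
    # Same object-id parsing as my current function (see EDIT below).
    user_id, payment_id, name = event["id"].split("/")[1:4]
    if name == "uploadcomplete.txt":
        # Enqueue the batch instead of (re)starting the instance directly.
        publisher.publish(topic_path, b"", userId=user_id, paymentId=payment_id).result()

The part I am unsure about is the other half: how the instance should drain that queue and when it should shut itself down.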
EDIT
My function is triggered by changes (uploads) to a Cloud Storage bucket. In the function, I start a GCE instance and customize its startup behavior with a startup script, as follows (some commands and directories are simplified for brevity):
import os
from googleapiclient.discovery import build


def start(event, context):
    file = event
    print(file["id"])
    # Object id looks like <bucket>/<userId>/<paymentId>/<filename>/...
    string = file["id"]
    newstring = string.split('/')
    userId = newstring[1]
    paymentId = newstring[2]
    name = newstring[3]
    print(name)
    if name == "uploadcomplete.txt":
        # Startup script: run the pipeline for this batch, then power the VM off.
        startup_script = """#! /bin/bash
cd ~ && pwd 1>>/var/log/log.out 2>&1
PATH=$PATH:/usr/local/cuda 1>>/var/log/log.out 2>&1
cd program_directory 1>>/var/log/log.out 2>&1
source /opt/anaconda3/etc/profile.d/conda.sh 1>>/var/log/log.out 2>&1
conda activate env
cd keras-retinanet/ 1>>/var/log/log.out 2>&1
export PYTHONPATH=`pwd` 1>>/var/log/log.out 2>&1
cd tracker 1>>/var/log/log.out 2>&1
python program_name --gcs_input_path gs://input/{userId}/{paymentId} --gcs_output_path gs://output/{userId}/{paymentId} 1>>/var/log/log.out 2>&1
sudo python3 gcs_to_mongo.py {userId} {paymentId} 1>>/var/log/log.out 2>&1
sudo shutdown -P now
""".format(userId=userId, paymentId=paymentId)

        service = build('compute', 'v1', cache_discovery=False)
        print('VM Instance starting')
        project = 'XXXX'
        zone = 'us-east1-c'
        instance = 'YYYY'

        # Fetch the current metadata fingerprint (required by setMetadata).
        metadata = service.instances().get(project=project, zone=zone, instance=instance)
        metares = metadata.execute()
        print(metares)
        fingerprint = metares["metadata"]["fingerprint"]
        print(fingerprint)

        # Overwrite the startup-script metadata with this batch's script.
        bodydata = {"fingerprint": fingerprint,
                    "items": [{"key": "startup-script", "value": startup_script}]}
        print(bodydata)
        meta = service.instances().setMetadata(project=project, zone=zone,
                                               instance=instance, body=bodydata)
        res = meta.execute()

        # Start the instance, which runs the new startup script.
        instanceget = service.instances().get(project=project, zone=zone, instance=instance).execute()
        request = service.instances().start(project=project, zone=zone, instance=instance)
        response = request.execute()
        print('VM Instance started')
        print(instanceget)
        print("New Metadata:", instanceget['metadata'])
The problem occurs when multiple batches are uploaded to Cloud Storage: each new function invocation overwrites the startup script, restarts the GCE instance, and begins work on the new data, leaving the previous batch unfinished.
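Conceptually, what I think I need on the instance side is something like the loop below (launched once by a fixed startup script): pull one queued batch at a time from the Pub/Sub subscription, run the pipeline, acknowledge the message, and only shut the machine down when the queue is empty. This is an untested sketch; "my-project" and "gce-work-queue-sub" are placeholders, and the ack deadline would need to be long enough to cover a whole batch (or be extended while processing).

# Untested sketch of the "consumer" loop I imagine running on the GCE instance.
import subprocess
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "gce-work-queue-sub")


def run_batches():
    while True:
        response = subscriber.pull(
            request={"subscription": subscription_path, "max_messages": 1}
        )
        if not response.received_messages:
            # Treat the queue as drained and let the VM shut down.
            # (I realize pull can occasionally return empty even when messages remain.)
            break
        msg = response.received_messages[0]
        user_id = msg.message.attributes["userId"]
        payment_id = msg.message.attributes["paymentId"]
        # Run the existing pipeline for this batch (same commands as the startup script).
        subprocess.run(
            ["python", "program_name",
             "--gcs_input_path", f"gs://input/{user_id}/{payment_id}",
             "--gcs_output_path", f"gs://output/{user_id}/{payment_id}"],
            check=True,
        )
        subprocess.run(["python3", "gcs_to_mongo.py", user_id, payment_id], check=True)
        # Only acknowledge once the batch has finished, so a crash re-delivers it.
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": [msg.ack_id]}
        )


if __name__ == "__main__":
    run_batches()
    subprocess.run(["sudo", "shutdown", "-P", "now"])

Is this a reasonable direction, or is there a more idiomatic GCP pattern (Cloud Tasks, for example) for serializing this kind of work?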