Rsync is not an operation that can be performed via a single request in the Cloud Storage REST API, and gsutil
is not available on Cloud Functions; for this reason, rsyncing both buckets from a Python script is not possible.
You can create a function that starts a preemptible VM with a startup script that executes the rsync between buckets and shuts down the instance once the rsync operation finishes.
By using a VM instead of a serverless service you avoid any timeout that could be triggered by a long rsync process.
A preemptible VM can run for up to 24 hours before being stopped, and you are only charged for the time the instance is turned on (disk storage is charged regardless of the instance's status).
If the VM is powered off in under a minute you won't be charged for the usage.
For this approach, you first need to create a bash script in a bucket; it will be executed by the preemptible VM at startup time, for example:
#! /bin/bash
gsutil rsync -r gs://mybucket1 gs://mybucket2
sudo init 0 #this is similar to poweroff, halt or shutdown -h now
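The script has to be uploaded to a bucket that the VM's service account can read; as a minimal sketch, assuming you saved it locally as rsync.sh (a hypothetical name), you can copy it with gsutil:

gsutil cp rsync.sh gs://mybucket1/rsync.sh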
After that, you need to create a preemptible VM with a startup script. I recommend an f1-micro
instance, since the rsync command between buckets doesn't require many resources.
1.- Go to the VM Instances page.
2.- Click Create instance.
3.- On the Create a new instance page, fill in the properties for your instance.
4.- Click Management, security, disks, networking, sole tenancy.
5.- In the Identity and API access section, select a service account that has access to read your startup script file in Cloud Storage and the buckets to be synced.
6.- Select Allow full access to all Cloud APIs.
7.- Under Availability policy, set the Preemptibility option to On. This setting disables automatic restart for the instance, and sets the host maintenance action to Terminate.
8.- In the Metadata section, provide startup-script-url as the metadata key.
9.- In the Value box, provide a URL to the startup script file, either in the gs://BUCKET/FILE or https://storage.googleapis.com/BUCKET/FILE format.
10.- Click Create to create the instance.
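If you prefer the command line over the console, a roughly equivalent gcloud sketch would be the following (the instance name rsync-vm, the service account email, and the script path are placeholders, not values from a real setup):

gcloud compute instances create rsync-vm \
    --project=yourprojectID \
    --zone=us-central1-a \
    --machine-type=f1-micro \
    --preemptible \
    --service-account=your-sa@yourprojectID.iam.gserviceaccount.com \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --metadata=startup-script-url=gs://mybucket1/rsync.sh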
With this configuration, the script will be executed every time the instance is started.
This is the Python function to start a VM (regardless of whether it is preemptible):
def power(request):
    import logging
    # these libraries are required to reach the Compute Engine API
    from googleapiclient import discovery
    from oauth2client.client import GoogleCredentials
    # the function will use the service account of your Cloud Function
    credentials = GoogleCredentials.get_application_default()
    # build a client for the API we are going to use, in this case Compute Engine
    service = discovery.build('compute', 'v1', credentials=credentials, cache_discovery=False)
    # set the correct log level (to avoid noise in the logs)
    logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)
    # Project ID for this request.
    project = "yourprojectID"  # update this placeholder value
    zone = "us-central1-a"  # update this to the zone of your VM
    instance = "myvm"  # update this with the name of your VM
    response = service.instances().start(project=project, zone=zone, instance=instance).execute()
    print(response)
    return ("OK")
requirements.txt file:
google-api-python-client
oauth2client
flask
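As a sketch, assuming an HTTP-triggered deployment (the runtime version and the flags here are my assumptions, adjust them to your setup), you could deploy and test the function like this:

gcloud functions deploy power \
    --region=us-central1 \
    --runtime=python37 \
    --trigger-http \
    --no-allow-unauthenticated

# verify it works by calling it with your own identity token
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
    https://us-central1-yourprojectID.cloudfunctions.net/power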
And you can schedule your function with Cloud Scheduler:
- Create a service account with the functions.invoker permission on your function (a gcloud sketch for this is shown after the URLs below).
- Create a new Cloud Scheduler job.
- Specify the frequency in cron format.
- Specify HTTP as the target type.
- Add the URL of your Cloud Function and the method as usual.
- Select OIDC token from the Auth header dropdown.
- Add the service account email in the Service account text box.
- In the Audience field, you only need to write the URL of the function without any additional parameters.
In Cloud Scheduler, I hit my function by using this URL:
https://us-central1-yourprojectID.cloudfunctions.net/power
and I used this audience
https://us-central1-yourprojectID.cloudfunctions.net/power
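As a sketch of those steps in gcloud (the job name rsync-job, the schedule, and the service account email scheduler-sa@... are hypothetical placeholders):

# grant the invoker role on the function to the scheduler's service account
gcloud functions add-iam-policy-binding power \
    --region=us-central1 \
    --member=serviceAccount:scheduler-sa@yourprojectID.iam.gserviceaccount.com \
    --role=roles/cloudfunctions.invoker

# create a job that calls the function every day at 02:00
gcloud scheduler jobs create http rsync-job \
    --schedule="0 2 * * *" \
    --uri=https://us-central1-yourprojectID.cloudfunctions.net/power \
    --http-method=GET \
    --oidc-service-account-email=scheduler-sa@yourprojectID.iam.gserviceaccount.com \
    --oidc-token-audience=https://us-central1-yourprojectID.cloudfunctions.net/power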
Please replace yourprojectID in the code and in the URLs, and the zone us-central1, with your own values.