I am using the following Python Cloud Function (GCP) to load a CSV file from a GCS bucket into a BigQuery table.
from typing import List

from google.cloud import bigquery


def csv_in_gcs_to_table(bucket_name: str, object_name: str, dataset_id: str,
                        table_id: str,
                        schema: List[bigquery.SchemaField]) -> None:
    """Upload CSV to BigQuery table.

    If the table already exists, it overwrites the table data.

    Args:
        bucket_name: Bucket name holding the object.
        object_name: Name of the object to be uploaded.
        dataset_id: Dataset id where the table is located.
        table_id: String holding the id of the table.
        schema: Schema of the table.
    """
    client = bigquery.Client()
    dataset_ref = client.dataset(dataset_id)
    job_config = bigquery.LoadJobConfig()
    job_config.schema = schema
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    uri = "gs://{}/{}".format(bucket_name, object_name)
    load_job = client.load_table_from_uri(uri,
                                          dataset_ref.table(table_id),
                                          job_config=job_config)
    load_job.result()  # Block until the load job completes.
The function is triggered every time a new file lands in the bucket, and it picks the file that corresponds to the object_name argument.
I would like the load function to pick the file that was uploaded last to the bucket, in other words the file that triggered the event.
My question is how this can be achieved.
object_name is supposed to identify the file that triggered the Cloud Function already, which seems to be what you want. Are you saying that object_name refers to another file than the one that triggered the upload for you? – Frank van Puffelen

event["name"], as shown here: cloud.google.com/functions/docs/tutorials/… – Frank van Puffelen
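As the comment points out, a background Cloud Function triggered by a GCS bucket receives the triggering object's name in the event payload, so no guessing about "the latest file" is needed. A minimal sketch of such an entry point, wiring the payload into the loader above (the dataset and table names here are hypothetical placeholders):

```python
def gcs_trigger(event, context):
    """Entry point for a background Cloud Function deployed with a GCS trigger.

    `event` is the GCS object metadata dict; `event["name"]` is the object
    that fired this invocation and `event["bucket"]` is its bucket.
    """
    bucket_name = event["bucket"]
    object_name = event["name"]
    # Hypothetical dataset/table; pass your real ids and schema here:
    # csv_in_gcs_to_table(bucket_name, object_name, "my_dataset", "my_table", schema)
    return bucket_name, object_name
```

Deploy this function with the bucket as its trigger, and each new upload invokes it with exactly the file that landed.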