0 votes

I'm new to GCP and I'm trying to build an ETL pipeline that will load data from files into BigQuery. It seems to me that the best solution would be to use gsutil. The steps as I see them today are:

  1. (done) Downloading the .zip file from the SFTP server to the virtual machine
  2. (done) Unpacking the file
  3. Uploading files from VM to Cloud Storage
  4. (done) Automatically upload files from Cloud Storage to BigQuery

Steps 1 and 2 would run on a schedule, but I would like step 3 to be event-driven: when files are copied to a specific folder, gsutil should send them to the specified bucket in Cloud Storage. Any ideas on how this can be done?
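
Right now I can do the copy manually with something like this (the source folder and bucket name are just examples):

    # One-off upload of the unpacked files; -m parallelizes the transfer.
    gsutil -m cp /path/to/unzipped/* gs://my-etl-bucket/

What I'm missing is a way to trigger this automatically whenever new files appear.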

If steps 1 and 2 are on a schedule, why do you need step 3 to be event-driven? How are you performing step 4? – Ben P

I'm not sure how long it will take to copy the files (sometimes it can be 1 GB and sometimes 10 GB). Step 4 is executed by a script in Cloud Functions. – kovalski1601

1 Answer

1 vote

Assuming you're running on a Linux VM, you might want to check out inotifywait, as mentioned in this question. You can run it as a background process while you try it out, e.g. bash /path/to/my/inotify/script.sh &, and then set it up as a daemon once you've tested it and have something working to your liking.
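
A minimal sketch of such a script, assuming the unpacked files land in /data/unzipped and should go to a bucket gs://my-etl-bucket (both names are placeholders for illustration):

    #!/bin/bash
    # Watch a local folder and upload each fully written file to Cloud Storage.
    # WATCH_DIR and BUCKET are placeholders -- adjust them to your setup.
    WATCH_DIR="/data/unzipped"
    BUCKET="gs://my-etl-bucket"

    # -m keeps inotifywait running indefinitely; close_write fires only after
    # a file has been completely written, so partially unpacked files are
    # not uploaded early. --format '%w%f' prints the full path of each file.
    inotifywait -m -e close_write --format '%w%f' "$WATCH_DIR" |
    while read -r FILE; do
        gsutil cp "$FILE" "$BUCKET/"
    done

Watching for close_write rather than create matters here: since your files can be 10 GB, a create event would fire while the unzip is still writing, and you'd upload an incomplete file.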