Full workflow:
- An SFTP mirror uploads new files from the SFTP server to a GCS bucket
- New GCS objects trigger a Cloud Function
- The Cloud Function triggers a Composer/Airflow DAG and passes it the path of the new GCS object
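For context, the Cloud Function is roughly shaped like the sketch below. This is a minimal illustration for Composer 2 / Airflow 2 (auth details differ on Composer 1); the web server URL, DAG ID, and function name are placeholders, not my real values:

```python
import google.auth
from google.auth.transport.requests import AuthorizedSession

AIRFLOW_WEB_SERVER = "https://example-composer-webserver"  # placeholder
DAG_ID = "gcs_to_bq"  # placeholder

def gcs_event_handler(event, context):
    """Background Cloud Function; fires on GCS object finalize/create."""
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    session = AuthorizedSession(credentials)
    # Airflow 2 stable REST API: create a DAG run, passing the object path in conf.
    session.post(
        f"{AIRFLOW_WEB_SERVER}/api/v1/dags/{DAG_ID}/dagRuns",
        json={"conf": {"bucket": event["bucket"], "name": event["name"]}},
    )
```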
Looking at the DAG run history in the Composer/Airflow UI, I see a task failure immediately followed by a task success.
The purpose of the task is to upload the file to BigQuery; the path to the file is provided by the Cloud Function.
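The load task looks roughly like this (a sketch assuming the Airflow 2 Google provider's GCSToBigQueryOperator; the DAG ID and destination table are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="gcs_to_bq",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # runs only when the Cloud Function triggers it
) as dag:
    load_csv = GCSToBigQueryOperator(
        task_id="load_csv_to_bq",
        # bucket and source_objects are templated fields, so the values the
        # Cloud Function passes in dag_run.conf are filled in at run time.
        bucket="{{ dag_run.conf['bucket'] }}",
        source_objects=["{{ dag_run.conf['name'] }}"],
        destination_project_dataset_table="my_project.my_dataset.my_table",  # placeholder
        source_format="CSV",
        write_disposition="WRITE_APPEND",
    )
```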
There is a clear pattern: the logs of the failed task show that it tried to process a file with a name like `my_timestamped_file_name.csv.part`, while the logs of the succeeding task that follows show it processed the same file name without the `.part` suffix: `my_timestamped_file_name.csv`.
It seems to me that the Cloud Function (CF) is being triggered by the partially uploaded `.part` file created by the SFTP mirror, rather than waiting for the upload to finish. Of course, once the file is completely uploaded, the `.part` file disappears, and the task fails because it has nothing to process.
My Cloud Function's event type is set to Finalize/Create. Is there a way to avoid triggering on partially uploaded files, other than adding a hacky conditional inside the CF to skip files that end with `.part`? (See the sketch of that workaround below.)
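For clarity, the workaround I'd like to avoid is a guard like this at the top of the CF. It should be safe, since a GCS "rename" is a copy plus delete, so the final object fires its own finalize event, but it feels brittle:

```python
PART_SUFFIX = ".part"

def gcs_event_handler(event, context):
    name = event["name"]
    # The SFTP mirror uploads to <name>.csv.part, then renames on completion.
    # The renamed final object emits its own finalize event, so skipping the
    # temporary name loses nothing.
    if name.endswith(PART_SUFFIX):
        return
    # ... trigger the DAG as usual ...
```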
Comments:

- "[…] `*.part` files appearance, transferring the files from some other source location, disregarding SFTP connection?" - Nick_Kh
- "That `*.part` files exist, however temporary, seems to be true because of the logs. I don't see how something would cause a printout of a non-existent filename." - Korean_Of_the_Mountain