I'm using Google Cloud Platform to transfer data from an Azure server to a BigQuery table (working nice and smoothly, functionally speaking). The pipeline looks like this: Dataflow streaming pipeline
The 'FetchMetadata' part of the pipeline is a simple TextIO.Read implementation where I read a 66-line .csv file with metadata from a GCP Storage bucket:
PCollection<String> metaLine = p.apply(TextIO.Read.named("FetchMetadata")
.from("gs://my-bucket"));
When I use my pipeline in Batch mode this works like a charm: first the metadata file is loaded in the pipeline in less than a second of vCPU time and then the data itself is loaded in the pipeline. Now when running in Streaming mode I would love to replicate that behaviour to some extent but when I just use the same code there is a problem: when running the pipeline for just 15 minutes (actual time) the TextIO.Read block uses a whopping 4 hours of vCPU time. For a pipeline that will be permanently running for a low budget project this is unacceptable.
So my question: is it possible to change the code so the file is periodically read again (if the file changes I want the pipeline to be updated, so let's say hourly updates) and not continiously like it's doing right now.
I've found some documentation where there is mention of TextIO.Read.Bound which looks like a good place to start solving this issue, but it's no solution for my periodical update problem (as far as I know)