I'm looking into Google Cloud; it is very appealing, especially for data-intensive applications. I'm looking at Pub/Sub + Dataflow and I'm trying to figure out the best way to replay events that were sent via Pub/Sub in case the processing logic changes.
As far as I can tell, Pub/Sub retention has an upper bound of 7 days and it is per subscription; the topic itself does not retain data. In my mind, the ideal would be to disable log compaction, as you can in Kafka, so I could replay data from the very beginning.
Now, since Dataflow promises that you can run the same jobs in both batch and streaming mode, how effective would it be to simulate this desired behavior by dumping all events into Google Cloud Storage and replaying from there?
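To make the idea concrete, here is a rough Beam (Java) sketch of what I have in mind; the project, subscription, bucket paths, and the commented-out processing transform are just placeholders:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class EventArchiveAndReplay {

  // Streaming job: tee every event from a dedicated "archive" subscription
  // into GCS, so the full history survives beyond the 7-day retention.
  public static void runArchive(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).as(StreamingOptions.class);
    options.setStreaming(true);

    Pipeline p = Pipeline.create(options);
    p.apply("ReadFromPubSub", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/events-archive"))
        .apply("FixedWindows", Window.into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply("WriteToGcs", TextIO.write()
            .to("gs://my-bucket/events/part")   // placeholder bucket/prefix
            .withWindowedWrites()
            .withNumShards(1));                 // shard count must be set for windowed writes
    p.run();
  }

  // Batch job: re-read the archived files and run the (possibly changed)
  // processing logic over the full history instead of the live stream.
  public static void runReplay(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply("ReadArchivedEvents", TextIO.read().from("gs://my-bucket/events/part*"))
        // .apply(new MyProcessingLogic())      // hypothetical transform shared with the streaming job
        .apply("WriteResults", TextIO.write().to("gs://my-bucket/replay-output/result"));
    p.run();
  }
}
```

The intent is that the same processing transform could be applied after `PubsubIO` in the live job and after `TextIO.read()` in the replay job, which is where the batch/streaming promise would matter.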
I'm also open to any other ideas.
Thank you