1
votes

I'm looking into Google Cloud, which is very appealing, especially for data-intensive applications. Specifically, I'm evaluating Pub/Sub + Dataflow and trying to figure out the best way to replay events that were sent via Pub/Sub in case the processing logic changes.

As far as I can tell, Pub/Sub retention has an upper bound of 7 days and applies per subscription; the topic itself does not retain data. Ideally, it would let me disable log compaction, as in Kafka, so that I could replay data from the very beginning.

Now, since Dataflow promises that the same job can run in both batch and streaming mode, how effective would it be to simulate this behavior by dumping all events into Google Cloud Storage and replaying them from there?

I'm also open to any other ideas.

Thank you


2 Answers

4
votes

As you said, Cloud Pub/Sub does not currently support replays, so you need to save events somewhere to replay later, and Cloud Storage sounds like a good place to do that.
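To illustrate the archive-and-replay idea, here is a minimal sketch in plain Python. It is not Dataflow or Cloud Storage code: a local directory stands in for a bucket, and the names `archive_event`, `replay_events`, and `process`, as well as the event shape (`timestamp` and `value` fields), are assumptions made up for this example. The point is only that once raw events are archived in order, changed processing logic can be rerun over the full history.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator


def archive_event(archive_dir: Path, event: dict) -> None:
    """Append one event as a JSON line to a file named after the event's date."""
    # Assumes ISO-8601 timestamps, so the first 10 characters are YYYY-MM-DD.
    day = event["timestamp"][:10]
    archive_dir.mkdir(parents=True, exist_ok=True)
    with open(archive_dir / f"{day}.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def replay_events(archive_dir: Path) -> Iterator[dict]:
    """Yield archived events, oldest daily file first, in write order per file."""
    for path in sorted(archive_dir.glob("*.jsonl")):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                yield json.loads(line)


def process(events: Iterable[dict], logic: Callable[[dict], int]) -> int:
    """Apply the (possibly changed) per-event logic and sum the results."""
    return sum(logic(event) for event in events)


# Archive a few events, then replay them with two versions of the logic.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "events"
    archive_event(archive, {"timestamp": "2024-01-01T09:00:00Z", "value": 1})
    archive_event(archive, {"timestamp": "2024-01-02T09:00:00Z", "value": 2})
    old_total = process(replay_events(archive), lambda e: e["value"])
    new_total = process(replay_events(archive), lambda e: e["value"] * 10)
```

In a real pipeline, the archive step would be a streaming job writing to Cloud Storage and the replay step a batch job reading from it; the file naming and partitioning scheme here is just one possible choice.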

3
votes

Cloud Pub/Sub now has the ability to replay previously acknowledged messages. Please see the quickstart and related blog post for information on how to use the feature.
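As a rough sketch of what a replay could look like with the Python client (`google-cloud-pubsub`), the snippet below seeks a subscription back to a point in time. It assumes the subscription was configured to retain acknowledged messages and that the target time is still inside its retention window. The project and subscription names are placeholders, and `build_seek_request` and `seek_subscription` are helper names invented for this example.

```python
from datetime import datetime, timedelta, timezone


def build_seek_request(subscription_path: str, replay_from: datetime) -> dict:
    """Build the request dict for a seek-to-time call, normalized to UTC."""
    if replay_from.tzinfo is None:
        raise ValueError("replay_from must be timezone-aware")
    return {
        "subscription": subscription_path,
        "time": replay_from.astimezone(timezone.utc),
    }


def seek_subscription(project_id: str, subscription_id: str, replay_from: datetime) -> None:
    """Rewind a subscription so messages published after replay_from are redelivered."""
    # Imported lazily so build_seek_request works without the client library.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    with subscriber:
        path = subscriber.subscription_path(project_id, subscription_id)
        subscriber.seek(request=build_seek_request(path, replay_from))


# Placeholder resource name: replay everything from the last hour.
example_request = build_seek_request(
    "projects/my-project/subscriptions/my-subscription",
    datetime.now(timezone.utc) - timedelta(hours=1),
)
```

Calling `seek_subscription("my-project", "my-subscription", some_time)` would need valid credentials and an existing subscription. Because seeking is still bounded by the retention window, archiving to Cloud Storage remains relevant for replaying from the very beginning.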