I have an application writing data to Google Cloud Pub/Sub, and per the Pub/Sub documentation, duplicates caused by the retry mechanism can happen occasionally. There is also the issue of ordering: message order is not guaranteed in Pub/Sub either.
The documentation also says it is possible to use Google Cloud Dataflow to deduplicate these messages.
I want to make those messages available in a message queue (i.e. Cloud Pub/Sub) for services to consume. Cloud Dataflow does seem to have a PubsubIO writer, but wouldn't writing back to Pub/Sub reintroduce exactly the same duplication problem? Wouldn't ordering suffer in the same way? How can I stream messages in order using Pub/Sub (or any other system, for that matter)?
Is it possible to use Cloud Dataflow to read from one Pub/Sub topic and write to another with a guarantee of no duplicates? If not, how else would you do this in a way that supports streaming for a relatively small volume of data?
Also, I am very new to Apache Beam/Cloud Dataflow. What would such a simple use case look like? I suppose I can deduplicate using the ID generated by Pub/Sub itself: since I am letting the Pub/Sub client library do its internal retries rather than retrying myself, the ID should be the same across retries.
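To make concrete what I mean by deduplicating on the Pub/Sub message ID, here is a plain-Python sketch of the logic I would expect the pipeline stage to perform (this is not actual Beam/Dataflow code, and the function and field names are my own):

```python
def dedupe_by_message_id(messages):
    """Keep only the first occurrence of each Pub/Sub message ID.

    `messages` is a list of dicts shaped like {"message_id": str, "data": bytes},
    standing in for received Pub/Sub messages.
    """
    seen = set()
    deduped = []
    for msg in messages:
        if msg["message_id"] in seen:
            continue  # redelivered duplicate; same ID because the client retried
        seen.add(msg["message_id"])
        deduped.append(msg)
    return deduped


# Message ID "2" is delivered twice; only the first copy survives.
msgs = [
    {"message_id": "1", "data": b"a"},
    {"message_id": "2", "data": b"b"},
    {"message_id": "2", "data": b"b"},  # duplicate delivery
]
print([m["message_id"] for m in dedupe_by_message_id(msgs)])  # -> ['1', '2']
```

My assumption is that something equivalent to this (keyed on the Pub/Sub-generated ID, with the "seen" set windowed somehow rather than unbounded) is what a Dataflow deduplication step would do, but I would like confirmation on whether that holds end-to-end when writing back out to another topic.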