I'm trying to build a log data flow with Google Cloud Pub/Sub and Fluentd subscribers. The architecture is as follows: first, a group of web servers send their access logs to the same Pub/Sub endpoint; second, Fluentd servers pull the logs from that endpoint and send them to Google BigQuery and other subsystems.
My question is how to keep message handling idempotent in such an architecture. According to the Google Cloud Pub/Sub documentation, subscribers are responsible for processing messages idempotently, since the same message may be delivered more than once: https://cloud.google.com/pubsub/docs/subscriber#delivery-contract
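To make the problem concrete, here is a minimal sketch of deduplication inside a single subscriber process, assuming each message carries a unique ID assigned by Pub/Sub. The `DedupingHandler` class and the dictionary message shape are hypothetical and only for illustration; this is not the Pub/Sub client API or a Fluentd plugin.

```python
from typing import Callable, Dict, List, Set


class DedupingHandler:
    """Forward each message ID to the sink at most once (single process only)."""

    def __init__(self, sink: Callable[[Dict[str, str]], None]) -> None:
        self._seen: Set[str] = set()  # IDs already forwarded by this process
        self._sink = sink

    def handle(self, message: Dict[str, str]) -> bool:
        """Return True if the message was forwarded, False if it was a duplicate."""
        message_id = message["message_id"]  # hypothetical field name
        if message_id in self._seen:
            return False
        self._sink(message)
        self._seen.add(message_id)
        return True


delivered: List[Dict[str, str]] = []
handler = DedupingHandler(delivered.append)

batch = [
    {"message_id": "1", "data": "GET /index.html"},
    {"message_id": "2", "data": "GET /about.html"},
    {"message_id": "1", "data": "GET /index.html"},  # simulated redelivery
]
results = [handler.handle(m) for m in batch]
```

The set of seen IDs lives in one process's memory, so two Fluentd servers that each receive a copy of the same message would both forward it.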
But I'm not sure what a good (and, if possible, simple) way to do this would be with many Fluentd servers. Do you have any good ideas?