
I'm trying to build a log data flow with Google Cloud Pub/Sub and Fluentd subscribers. The architecture works in two steps: first, a group of web servers send their access logs to the same Pub/Sub endpoint; second, Fluentd servers pull the logs from that Pub/Sub endpoint and send them to Google BigQuery and other subsystems.

My question is how to keep message processing idempotent in such an architecture. According to the Google Cloud Pub/Sub documentation, subscribers are responsible for making their processing idempotent: https://cloud.google.com/pubsub/docs/subscriber#delivery-contract
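Because Pub/Sub delivers messages at least once, the same message (with the same message ID) can arrive more than once. A minimal sketch of idempotent handling is to skip any message whose ID has already been written. This is not Fluentd plugin code: `RecentMessageIds` and `handle` are hypothetical helpers, and the in-memory store only catches redeliveries that reach the same process. With many Fluentd servers the "already seen" check would have to live in a shared store or at the sink itself. It also cannot catch a log line that the publisher sent twice, since each publish gets its own message ID.

```python
from collections import OrderedDict


class RecentMessageIds:
    """Hypothetical bounded record of recently processed Pub/Sub message IDs."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self._ids = OrderedDict()

    def __contains__(self, message_id):
        return message_id in self._ids

    def add(self, message_id):
        self._ids[message_id] = None
        self._ids.move_to_end(message_id)
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the oldest recorded ID


def handle(message_id, payload, recent, sink):
    """Write payload to sink unless message_id was already processed.

    The ID is recorded only after the write succeeds, so a failed write
    leaves the message eligible for reprocessing on redelivery.
    Returns True if the payload was written, False if it was a duplicate.
    """
    if message_id in recent:
        return False  # duplicate delivery: skip the write, but still ack it
    sink.append(payload)
    recent.add(message_id)
    return True


recent = RecentMessageIds(capacity=2)
sink = []
handle("msg-1", "GET /index.html", recent, sink)
handle("msg-1", "GET /index.html", recent, sink)  # redelivery is skipped
```

The bounded `OrderedDict` keeps memory fixed; the trade-off is that a duplicate arriving after its ID has been evicted is written again.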

But I'm not sure what a good (and, if possible, simple) way to do this is with many Fluentd servers. Does anyone have a good idea?


1 Answer


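Setting ackDeadlineSeconds to a large value seems to be enough to avoid duplicated logs in the common case, because Pub/Sub redelivers a message when the subscriber does not acknowledge it before the deadline expires. https://cloud.google.com/pubsub/docs/reference/rest/v1/projects.subscriptions/create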
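As a sketch, the request body for `projects.subscriptions.create` sets `ackDeadlineSeconds` alongside the required `topic`. The API accepts values from 10 to 600 seconds, so 600 is the largest deadline available. The project and topic names below are hypothetical, and `subscription_body` is an illustrative helper, not part of any client library.

```python
import json

# Limits documented for the Pub/Sub v1 Subscription resource.
MIN_ACK_DEADLINE_SECONDS = 10
MAX_ACK_DEADLINE_SECONDS = 600


def subscription_body(topic, ack_deadline_seconds):
    """Build a JSON body for projects.subscriptions.create (hypothetical helper)."""
    if not MIN_ACK_DEADLINE_SECONDS <= ack_deadline_seconds <= MAX_ACK_DEADLINE_SECONDS:
        raise ValueError(
            f"ackDeadlineSeconds must be between {MIN_ACK_DEADLINE_SECONDS} "
            f"and {MAX_ACK_DEADLINE_SECONDS}, got {ack_deadline_seconds}"
        )
    return {"topic": topic, "ackDeadlineSeconds": ack_deadline_seconds}


# Hypothetical topic name; the body would be sent with a PUT to
# https://pubsub.googleapis.com/v1/projects/<project>/subscriptions/<name>
body = subscription_body("projects/my-project/topics/access-logs", 600)
print(json.dumps(body))
```

A longer deadline only reduces redeliveries caused by slow acknowledgement; it does not remove the at-least-once guarantee, so rare duplicates can still occur.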