I'm trying to build a streaming/batch pipeline that reads events from Pub/Sub and writes them into BigQuery using Python 3.6.
According to the documentation (https://cloud.google.com/pubsub/docs/faq), Cloud Pub/Sub assigns a unique message_id and a timestamp to each message, which can be used to detect duplicate messages received by the subscriber.
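As far as I understand, both fields should therefore be available on every pulled message. Here is a minimal sketch of how I'd read them, assuming the google-cloud-pubsub client library (the project and subscription names are placeholders):

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

def callback(message):
    # message_id and publish_time are assigned by Pub/Sub when the message
    # is published, so they should be available on every received message.
    print(message.message_id, message.publish_time)
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull_future.result()  # block the main thread so the callback keeps running
```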
The requirements are:

1) Messages can arrive in any order (asynchronous messaging).
2) The unique ID and the timestamp (when the record was published to the Pub/Sub topic or pulled from it) should be added to the existing record.
Input data:

Name Age
xyz  21
Expected output record in BigQuery:

Name Age Unique_Id  Record_Timestamp
xyz  21  hdshdfd_12 2019-10-16 12:06:54

where Record_Timestamp is the time the message was written to the topic or pulled from it.
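This is the kind of thing I have in mind (a rough, untested sketch; it assumes the google-cloud-pubsub and google-cloud-bigquery client libraries, a JSON payload such as {"Name": "xyz", "Age": 21}, and placeholder project/table names):

```python
import json

from google.cloud import bigquery, pubsub_v1

bq_client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # placeholder table

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

def callback(message):
    record = json.loads(message.data.decode("utf-8"))
    record["Unique_Id"] = message.message_id
    # publish_time is the moment Pub/Sub accepted the message from the publisher.
    record["Record_Timestamp"] = message.publish_time.isoformat()
    # Streaming insert; insert_rows_json returns a list of per-row errors.
    errors = bq_client.insert_rows_json(table_id, [record])
    if not errors:
        message.ack()  # only ack once the row is safely in BigQuery

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull_future.result()
```

Acking only after a successful insert would mean a failed insert gets redelivered, which seems to fit Pub/Sub's at-least-once delivery model, but I'm not sure this is the right approach.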
Can anyone point me to a link on how to handle this, or advise whether it can be done through Pub/Sub?