A simple Google Cloud Dataflow pipeline:
    PubsubIO.readStrings().fromSubscription(...) -> Window -> ParDo -> DatastoreIO.v1().write()
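For reference, a minimal sketch of that shape in the Beam Java SDK; the subscription path, project ID, window size, and the toEntity conversion below are placeholders, not the actual code:

    import com.google.datastore.v1.Entity;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.joda.time.Duration;

    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    p.apply("ReadFromPubsub", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/my-subscription"))
     // Fixed one-minute windows as a stand-in for the real windowing strategy.
     .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
     .apply(ParDo.of(new DoFn<String, Entity>() {
         @ProcessElement
         public void processElement(ProcessContext c) {
             // toEntity is a placeholder for the message-to-Entity mapping.
             c.output(toEntity(c.element()));
         }
     }))
     .apply(DatastoreIO.v1().write().withProjectId("my-project"));
    p.run();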
When load is applied to the Pub/Sub topic, the messages are read but not acked:
    Jul 25, 2017 4:20:38 PM org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSource$PubsubReader stats
    INFO: Pubsub projects/my-project/subscriptions/my-subscription has 1000 received messages,
    950 current unread messages, 843346 current unread bytes, 970 current in-flight msgs,
    28367ms oldest in-flight, 1 current in-flight checkpoints, 2 max in-flight checkpoints,
    770B/s recent read, 1000 recent received, 0 recent extended, 0 recent late extended,
    50 recent ACKed, 990 recent NACKed, 0 recent expired, 898ms recent message timestamp skew,
    9224873061464212ms recent watermark skew, 0 recent late messages,
    2017-07-25T23:16:49.437Z last reported watermark
What pipeline step should ack the messages?
Additional observations:

- The Stackdriver dashboard shows some acks, but the number of unacked messages stays stable.
- There are no error messages in the trace indicating that message processing failed.
- Entries do show up in Datastore.