I have a Beam pipeline that reads from a Kafka source (LogAppendTime is not available); the timestamp is embedded in the data objects in the Kafka stream.

I want to use event time for my pipeline. After googling a bit, I see that some solutions use a CustomFieldTimePolicy (which extracts the timestamp from each data object and uses it to set the watermark) when reading data from KafkaIO.

But then I saw another solution, which uses WithTimestamps.of() to assign timestamps to elements.

My question is: what is the difference between these two methods? To me they seem to do a very similar job.

Thank you.

1 Answer

You want to use the CustomFieldTimePolicy. A timestamp policy given to KafkaIO sets the timestamp of each record as it is read and also determines the watermark that the Kafka source reports, so the two stay consistent.

WithTimestamps.of() changes the timestamps of elements that are already in the pipeline, but it does not affect the watermark. It is a very simple transform based on ParDo. It is limited in what it can do, because the new timestamps are not allowed to contradict the watermark: by default, an element's timestamp cannot be moved earlier than the timestamp it already has.
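To make the difference concrete, here is a toy model in plain Java. It does not use Beam or Kafka classes at all: `EventTimeSketch`, `FieldTimePolicy` and `restamp` are hypothetical names invented for illustration, and the rule "watermark = newest event time minus a tolerated delay" is an assumed simplification of what a source-side policy might do. The point is only that the policy owns both the record timestamp and the watermark, while re-stamping touches a single element and is rejected if it moves the timestamp backwards beyond an allowed skew.

```java
// Toy model only: plain Java, no Beam or Kafka classes involved.
final class EventTimeSketch {

    /** Source-side policy: stamps each record and derives the partition watermark. */
    static final class FieldTimePolicy {
        private final long maxDelayMillis;
        private long maxEventMillis = Long.MIN_VALUE; // nothing seen yet

        FieldTimePolicy(long maxDelayMillis) {
            this.maxDelayMillis = maxDelayMillis;
        }

        /** Returns the event time carried in the payload and remembers the newest one seen. */
        long timestampForRecord(long embeddedEventMillis) {
            maxEventMillis = Math.max(maxEventMillis, embeddedEventMillis);
            return embeddedEventMillis;
        }

        /** Assumed rule: the watermark trails the newest event time by the tolerated delay. */
        long watermark() {
            return maxEventMillis == Long.MIN_VALUE
                    ? Long.MIN_VALUE
                    : maxEventMillis - maxDelayMillis;
        }
    }

    /** Downstream re-stamping: changes one element's timestamp and never the watermark. */
    static long restamp(long currentMillis, long newMillis, long allowedSkewMillis) {
        if (newMillis < currentMillis - allowedSkewMillis) {
            throw new IllegalArgumentException(
                    "cannot move a timestamp backwards beyond the allowed skew");
        }
        return newMillis;
    }

    public static void main(String[] args) {
        FieldTimePolicy policy = new FieldTimePolicy(5_000L);
        policy.timestampForRecord(100_000L);
        policy.timestampForRecord(98_000L); // out-of-order record, watermark is unaffected
        System.out.println("watermark = " + policy.watermark());
        System.out.println("restamped = " + restamp(100_000L, 101_000L, 0L));
    }
}
```

In this sketch, moving a timestamp forward always succeeds, while `restamp(100_000L, 99_000L, 0L)` throws, which mirrors why re-stamping after the source is the more restricted option.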