We are using the following code to write the records to BigQuery:
BigQueryIO.writeTableRows()
    .to("table")
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    .withSchema(schema);
With this code, when we run a backfill, some records are sent through this Dataflow pipeline again, which results in duplicate rows in the BigQuery table. Is there any way to configure an upsert keyed on a field name in the Dataflow pipeline?
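To make the behaviour we are after concrete, here is a minimal sketch in plain Java of the upsert semantics: a row whose key already exists replaces the stored row, and a row with a new key is appended. This is not Beam or BigQuery API. `UpsertSketch`, `upsert`, `row`, and the `id` key field are hypothetical names used only for illustration, and the in-memory lists stand in for the table and the backfilled records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UpsertSketch {

    // Hypothetical helper: builds one row with an "id" key field and a "value" field.
    public static Map<String, Object> row(int id, String value) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("value", value);
        return r;
    }

    // Merges incoming rows into existing rows, keyed on keyField.
    // A matching key overwrites the stored row; a new key is appended.
    public static List<Map<String, Object>> upsert(
            List<Map<String, Object>> existing,
            List<Map<String, Object>> incoming,
            String keyField) {
        Map<Object, Map<String, Object>> byKey = new LinkedHashMap<>();
        for (Map<String, Object> r : existing) {
            byKey.put(r.get(keyField), r);
        }
        for (Map<String, Object> r : incoming) {
            byKey.put(r.get(keyField), r);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        // Stand-in for rows already in the table.
        List<Map<String, Object>> table = Arrays.asList(row(1, "a"), row(2, "b"));
        // Stand-in for a backfill that resends id 2 and adds id 3.
        List<Map<String, Object>> backfill = Arrays.asList(row(2, "b2"), row(3, "c"));

        List<Map<String, Object>> merged = upsert(table, backfill, "id");
        System.out.println(merged.size()); // prints 3, not 4 as with WRITE_APPEND
    }
}
```

With WRITE_APPEND the same input would leave two rows for id 2, which is the duplication we see after a backfill.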
WRITE_TRUNCATE would not help. - Darshan Mehta