2
votes

Is there a way to update rows in Google BigQuery when publishing from Cloud Dataprep? I can't find anything in the documentation.

I have a dataset I'm preprocessing with Dataprep that contains new rows and updated rows on every (daily) run. I would like to incrementally write it to BigQuery adding new rows and updating existing changed rows.


1 Answer

2
votes

So Trifacta (Cloud Dataprep) can either append, replace, or create a new file each time the job is run. What you're referring to is an upsert/merge action, which can be done in two steps:

  1. For new rows, use the "append" run option.
  2. For updating existing rows, there are two options:

A. Do a union transformation with the output table (already in BigQuery), then deduplicate/aggregate, choosing how to merge between the duplicated rows. This can be repeated each time a new file is created or updated. B. Create a new file on each run, then import all files in the folder via parameterization, union them, and apply the grouping described in A.
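The union-and-deduplicate logic in option A can be sketched in plain Python. The `id` key and the "keep the row with the latest `updated_at`" rule are illustrative assumptions, not something stated in the original; in Dataprep you would express the same idea with a union followed by a group/aggregate step:

```python
# Sketch of option A: union existing and new rows, then deduplicate,
# keeping the most recently updated row per key.
# "id" and "updated_at" are assumed column names for illustration.

def upsert(existing_rows, new_rows):
    """Union both row sets, then keep the latest row per id."""
    merged = {}
    for row in existing_rows + new_rows:  # union of both sources
        key = row["id"]
        # Keep the row with the most recent timestamp for each key.
        if key not in merged or row["updated_at"] > merged[key]["updated_at"]:
            merged[key] = row
    return sorted(merged.values(), key=lambda r: r["id"])

existing = [
    {"id": 1, "value": "a", "updated_at": "2023-01-01"},
    {"id": 2, "value": "b", "updated_at": "2023-01-01"},
]
daily = [
    {"id": 2, "value": "b2", "updated_at": "2023-01-02"},  # updated row
    {"id": 3, "value": "c", "updated_at": "2023-01-02"},   # new row
]

result = upsert(existing, daily)
# id 1 is unchanged, id 2 is updated, id 3 is appended
```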

Obviously, option A is the simpler of the two.

Does that make sense?