
I have a DML statement that runs in BigQuery every hour, similar to this one:

MERGE dataset.DetailedInventory T
    USING dataset.Inventory S
    ON T.product = S.product
    WHEN NOT MATCHED AND quantity < 20 THEN
      INSERT(product, quantity, supply_constrained, comments)
      VALUES(product, quantity, true, ARRAY<STRUCT<created DATE, comment STRING>>[(DATE('2016-01-01'), 'comment1')])
    WHEN NOT MATCHED THEN
      INSERT(product, quantity, supply_constrained)
      VALUES(product, quantity, false)

The statement we are trying to execute does not need any transformations, but others will in the future.

Because of that, we are thinking about using Dataflow with Apache Beam.

I am trying to execute this MERGE statement inside an Apache Beam pipeline, with something like this:

pipeline.apply(
                "DEPARTMENT - RUN MERGE WITH INSERT AND UPDATE STATEMENTS",
                BigQueryIO.readTableRows().fromQuery(MERGE_DEPARTMENT_I_U).usingStandardSql());

The problem is that when we run this code on Dataflow, it doesn't work.

I was wondering if this would be the best approach for executing BigQuery MERGE statements using Java.

I would also like to know whether it is possible to use statements like that inside an Apache Beam pipeline.

Thank you guys!


1 Answer


Looks like you are trying to do CDC. You can take a look at one of the new Dataflow templates that performs MERGE for CDC operations here.
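
If all you really need is to run the MERGE on a schedule, another option is to execute it directly with the BigQuery client library for Java instead of going through BigQueryIO, which is meant for reading query results into a PCollection. Below is a minimal sketch, assuming application default credentials; the class name and the mergeSql variable are just illustrative, and the statement is the one from your question:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.QueryJobConfiguration;

    public class RunMerge {
      public static void main(String[] args) throws Exception {
        // The MERGE statement from the question; in practice it could come
        // from configuration or be built per table.
        String mergeSql =
            "MERGE dataset.DetailedInventory T "
                + "USING dataset.Inventory S "
                + "ON T.product = S.product "
                + "WHEN NOT MATCHED AND quantity < 20 THEN "
                + "  INSERT(product, quantity, supply_constrained, comments) "
                + "  VALUES(product, quantity, true, "
                + "    ARRAY<STRUCT<created DATE, comment STRING>>[(DATE('2016-01-01'), 'comment1')]) "
                + "WHEN NOT MATCHED THEN "
                + "  INSERT(product, quantity, supply_constrained) "
                + "  VALUES(product, quantity, false)";

        // Client built from application default credentials.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // DML must run as standard SQL.
        QueryJobConfiguration config =
            QueryJobConfiguration.newBuilder(mergeSql).setUseLegacySql(false).build();

        // Runs the MERGE and waits for completion; a DML statement returns no rows.
        bigquery.query(config);
      }
    }

If the statement really has to run inside the Beam pipeline, the same client call can be made from a DoFn, but for a MERGE that needs no transformations a plain client call (or a BigQuery scheduled query) keeps things simpler.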