0
votes

I'm trying to merge two databases with the same schema on Google BigQuery.

I'm following the merge samples here: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement

However, my tables have nested columns, ie "service.id" or "service.description"

My code so far is:

MERGE combined_table
USING table1
ON table1.id = combined_table.id
WHEN NOT MATCHED THEN
  INSERT(id, service.id, service.description)
  VALUES(id, service.id, service.description)

However, I get the error message: Syntax error: Expected ")" or "," but got ".", and a red squiggly underline under .id on the INSERT(...) line.

Here is a view of part of my table's schema:

[
    {
        "name": "id",
        "type": "STRING"
    },
    {
        "name": "service",
        "type": "RECORD",
        "fields": [
            {
                "name": "id",
                "type": "STRING"
            },
            {
                "name": "description",
                "type": "STRING"
            }
        ]
    },
    {
        "name": "cost",
        "type": "FLOAT"
    }
...
]

How do I properly structure this INSERT(...) statement so that I can include the nested columns?

2

2 Answers

2
votes

Syntax error: Expected ")" or "," but got "."

Looks like you are on the right direction, Note in the documentation how you need to insert value to a REPEATED column,

You need to define the structure to guide BigQuery what to expect, For example:

STRUCT<created DATE, comment STRING>

This is the full example from the documentation

MERGE dataset.DetailedInventory T
USING dataset.Inventory S
ON T.product = S.product
WHEN NOT MATCHED AND quantity < 20 THEN
  INSERT(product, quantity, supply_constrained, comments)
   -- insert values like this
   VALUES(product, quantity, true, ARRAY<STRUCT<created DATE, comment STRING>>[(DATE('2016-01-01'), 'comment1')])

WHEN NOT MATCHED THEN
  INSERT(product, quantity, supply_constrained)
  VALUES(product, quantity, false)
1
votes

I've found the answer.

It turns out when referencing the top level of a STRUCT, BigQuery references all of the nested columns as well. So if I wanted to INSERT service and all of it's sub-columns (service.id and service.description), I only have to include service in the INSERT(...) statement.

The following code worked:

...
WHEN NOT MATCHED THEN
  INSERT(id, service)
  VALUES(id, service)

This would merge all sub columns, including service.id and service.description.