4
votes

I have problems creating a simple table using Avro as import format. I would like to create a typical table with N columns of primitive data types, for example a table with columns c1 STRING and c2 INTEGER.

I use schema like this

{
    "type": "record",
    "name": "x",
    "fields": [
        {
            "name": "c1",
            "type": "string"
        },
        {
            "name": "c2",
            "type": "int"
        }
    ]
}

This results into one column with name root of type RECORD having nested c1 and c2.

How can import to a table that has two columns c1 and c2 with Avro? Or is it currently possible to only create tables with one column with fixed name root?

I tried other approaches:

  • Avro type array (of string) creates one columns with name root of type STRING REPEATED
  • Simplest schema like {"type":"string"} creates one column with name root of type STRING. This approach doesn't let create more than one column nor change the name of the column.
1

1 Answers

0
votes

Thanks for the feedback. Avro schema always starts with one unnamed column, which can be a primitive type like string, or a complex type like record. We now map it to a column in BQ and fix the name to be "root". We'll consider striping the enclosing record at root level if this becomes a common request. But then we're processing the record type differently than other types.

The solution now is to strip the root record with a query and save to another table. Or, you can combine the load and strip in one step through federated tables:

bq query --external_table_definition=foo::AVRO=gs://your_bucket/path/file.avro* --destination_table your_dataset.your_table "SELECT root.* FROM foo"