0
votes

While loading ORC Files to an existing BQ Table using "bq Load" command, im observing that once the data is loaded, the table schema(the column names of the table) also changes.

The Requirement is to Load ORC Files from GCS to a Staging table in BigQuery. This step im trying to achive using "bq load" command, but it alters the existing schema of the table.

Sample Code :

bq load --replace --source_format=ORC some_dateset.some_table_staging gs://some_bucket/some_table/*

I expect to retain the actual column names in the BigQuery Table even after i load the ORC Files into the table. But as of now the column names of the Table gets changed based on the ORC File schema

1

1 Answers

-1
votes

I did test bq load command that you provided on my data with AVRO and ORC files.

  1. --replace or replace=true
    flag is expected to erase all data, and is doing so.
    So what literally is happening, all table and schema is being erased and replaced with new one from your file.

  2. If you want to append data to the table you need to use
    --noreplace or --replace=false
    This flag works fine only when your ORC file schema is that same that already present in the table.

  3. If your data schema differs from that defined in the table, you need to use schema_update_option:
    --schema_update_option=ALLOW_FIELD_ADDITION

so finally in your user case you just need to use:

bq load --schema_update_option=ALLOW_FIELD_ADDITION --source_format=ORC <project_id>:<schema>.<table> gs://<bucket_name>/<file_name>