3 votes

I am trying to load a list of Parquet files into a BigQuery table, but I am getting an error:

bq --location=EU load --source_format=PARQUET project:Input.k_2017_11_new "gs://my_bucket/2017_11/11/*.parquet"

Waiting on bqjob_r557b5eb5986df8a0_0000016855915d09_1 ... (34s) Current status: DONE

BigQuery error in load operation: Error processing job 'project:bqjob_r557b5eb5986df8a0_0000016855915d09_1': Error while reading data, error message: incompatible types for field 'data.list.element.p': INT32 in Parquet vs. double in schema

I actually do not need the field that is causing the error, but I cannot find a way to skip this column.

Is there a solution to this problem?

I have tried specifying the schema with a JSON file and forcing this field to FLOAT, INT64, or STRING, but nothing has worked so far.


2 Answers

1 vote

I see you're using Cloud Shell to load Parquet files into BigQuery. Try writing a schema file in JSON, copying or uploading it into your Cloud Shell instance, and passing the file after the source path argument:

bq --location=EU load --source_format=PARQUET project:Input.k_2017_11_new "gs://my_bucket/2017_11/11/*.parquet" ./mySchema.json
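
For reference, the schema file is a JSON array of field definitions. A minimal sketch of what mySchema.json could look like; every field name other than data.p is hypothetical, and note that the list.element wrapper in the error message is Parquet's list encoding, so on the BigQuery side data becomes a REPEATED RECORD containing p:

[
  {"name": "id", "type": "STRING", "mode": "NULLABLE"},
  {
    "name": "data",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {"name": "p", "type": "FLOAT", "mode": "NULLABLE"}
    ]
  }
]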
0 votes

I had a similar issue when using Python, where an additional column was created when trying to write to BigQuery.

The LoadJobConfig ignore_unknown_values parameter fixed my issue; it can also be passed as --ignore_unknown_values on the command line.
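
As a sketch of the Python version, assuming the bucket path and table from the question and the google-cloud-bigquery client:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    ignore_unknown_values=True,  # drop values not represented in the table schema
)

load_job = client.load_table_from_uri(
    "gs://my_bucket/2017_11/11/*.parquet",
    "project.Input.k_2017_11_new",  # table id taken from the question
    job_config=job_config,
)
load_job.result()  # block until the load job completes

Note that ignore_unknown_values only drops extra columns, as in the extra-column case described here; it does not resolve type mismatches, so the INT32-vs-double conflict in the question still needs a matching schema.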