I want to upload CSV data into BigQuery. When the data has mixed types (like string and int), BigQuery can infer the column names from the header row, because the header is all strings while the other rows contain integers.

BigQuery infers headers by comparing the first row of the file with other rows in the data set. If the first line contains only strings, and the other lines do not, BigQuery assumes that the first row is a header row.

https://cloud.google.com/bigquery/docs/schema-detect
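For example, a load along these lines (table and file names are placeholders) picks up the header row automatically on mixed-type data:

# Schema autodetection: with mixed-type rows, the all-string first row
# is recognized as a header and used for the column names.
bq load --autodetect --source_format=CSV mydataset.myTable ./myData.csv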

The problem is when your data is all strings: every row then looks the same to the detector, so the first row is not recognized as a header.

You can specify --skip_leading_rows, but even then BigQuery does not use the skipped first row as the column names.
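For instance, a command along these lines (names are placeholders) still produces generic autodetected column names such as string_field_0 when every value is a string:

# Skipping the header row does not make BigQuery use it for column names;
# with all-string data the columns get generic autodetected names instead.
bq load --autodetect --skip_leading_rows=1 --source_format=CSV mydataset.myTable ./myData.csv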

I know I can specify the column names manually, but I would prefer not to, as I have a lot of tables. Is there another solution?

2 Answers

4 votes

If your data is all "string" type and the first row of your CSV file contains the column names, then I guess it is easy to write a quick script that parses the first line of your CSV and generates a matching "create table" command:

bq mk --schema name:STRING,street:STRING,city:STRING... -t mydataset.myNewTable
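A minimal sketch of such a script, assuming plain, unquoted, comma-separated headers (file and table names are placeholders):

# Turn a header row like "name,street,city" into
# "name:STRING,street:STRING,city:STRING", then create the (empty) table.
SCHEMA=$(head -1 myData.csv | sed 's/[^,]*/&:STRING/g')
bq mk --schema "$SCHEMA" -t mydataset.myNewTable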

Use that command to create a new (empty) table, and then load your CSV file into that new table (using --skip_leading_rows as you mentioned).

Update (14/02/2018), thanks to Felipe's comment:

The above command can be simplified this way:

bq mk --schema `head -1 myData.csv` -t mydataset.myNewTable
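This works because bq treats schema fields given without an explicit type as STRING. With the table created, the load step would then look something like:

# Load the CSV into the pre-created table, skipping the header row;
# the schema comes from the table rather than from autodetection.
bq load --source_format=CSV --skip_leading_rows=1 mydataset.myNewTable ./myData.csv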

1 vote

It's not possible with the current API. You can file a feature request in the public BigQuery tracker: https://issuetracker.google.com/issues/new?component=187149&template=0.

As a workaround, you can append a single non-string value to the end of the second line in your file, and then set the allowJaggedRows option in the load configuration. The downside is that you'll get an extra column in your table. If an extra column is not acceptable, you can use a query instead of a load and SELECT * EXCEPT the added extra column, but queries are not free.
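A sketch of that workaround (all names are placeholders, the autodetected name of the extra column depends on your data, and GNU sed is assumed):

# Append a non-string value to line 2 so autodetect sees an all-string
# first row followed by a non-string row, and treats row 1 as a header.
sed -i '2s/$/,1/' myData.csv

# Load with jagged rows allowed, so the remaining (shorter) lines are
# accepted and the missing trailing column is filled with NULLs.
bq load --autodetect --allow_jagged_rows --source_format=CSV mydataset.myTable ./myData.csv

# To drop the extra column, query instead (not free); extra_col is a
# placeholder for whatever name autodetect assigned to the added column.
bq query --use_legacy_sql=false 'SELECT * EXCEPT(extra_col) FROM mydataset.myTable'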