I want to upload CSV data into BigQuery. When the data has mixed types (like string and int), BigQuery can infer the column names from the header row, because the header is all strings while the other rows contain integers.

BigQuery infers headers by comparing the first row of the file with other rows in the data set. If the first line contains only strings, and the other lines do not, BigQuery assumes that the first row is a header row.

https://cloud.google.com/bigquery/docs/schema-detect
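For example, a load along these lines (table and file names are placeholders) picks up the header row automatically on mixed-type data:

# Schema autodetection: with mixed-type rows, the all-string first row
# is recognized as a header and used for the column names.
bq load --autodetect --source_format=CSV mydataset.myTable ./myData.csv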

The problem is when your data is all strings: every row then looks the same to the detector, so the first row is not recognized as a header.

You can specify --skip_leading_rows, but even then BigQuery does not use the skipped first row as the column names.
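For instance, a command along these lines (names are placeholders) still produces generic autodetected column names such as string_field_0 when every value is a string:

# Skipping the header row does not make BigQuery use it for column names;
# with all-string data the columns get generic autodetected names instead.
bq load --autodetect --skip_leading_rows=1 --source_format=CSV mydataset.myTable ./myData.csv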

I know I can specify the column names manually, but I would prefer not to, as I have a lot of tables. Is there another solution?

2 Answers

4 votes

If your data is all "string" type and the first row of your CSV file contains the column names, then I guess it is easy to write a quick script that parses the first line of your CSV and generates a matching "create table" command:

bq mk --schema name:STRING,street:STRING,city:STRING... -t mydataset.myNewTable
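A minimal sketch of such a script, assuming plain, unquoted, comma-separated headers (file and table names are placeholders):

# Turn a header row like "name,street,city" into
# "name:STRING,street:STRING,city:STRING", then create the (empty) table.
SCHEMA=$(head -1 myData.csv | sed 's/[^,]*/&:STRING/g')
bq mk --schema "$SCHEMA" -t mydataset.myNewTable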

Use that command to create a new (empty) table, and then load your CSV file into that new table (using --skip_leading_rows as you mentioned).

Update (14/02/2018), thanks to Felipe's comment:

The above command can be simplified this way:

bq mk --schema `head -1 myData.csv` -t mydataset.myNewTable
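This works because bq treats schema fields given without an explicit type as STRING. With the table created, the load step would then look something like:

# Load the CSV into the pre-created table, skipping the header row;
# the schema comes from the table rather than from autodetection.
bq load --source_format=CSV --skip_leading_rows=1 mydataset.myNewTable ./myData.csv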

1 vote

It's not possible with the current API. You can file a feature request in the public BigQuery tracker: https://issuetracker.google.com/issues/new?component=187149&template=0.

As a workaround, you can append a single non-string value to the end of the second line in your file, and then set the allowJaggedRows option in the load configuration. The downside is that you'll get an extra column in your table. If an extra column is not acceptable, you can use a query instead of a load and SELECT * EXCEPT the added extra column, but queries are not free.
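A sketch of that workaround (all names are placeholders, the autodetected name of the extra column depends on your data, and GNU sed is assumed):

# Append a non-string value to line 2 so autodetect sees an all-string
# first row followed by a non-string row, and treats row 1 as a header.
sed -i '2s/$/,1/' myData.csv

# Load with jagged rows allowed, so the remaining (shorter) lines are
# accepted and the missing trailing column is filled with NULLs.
bq load --autodetect --allow_jagged_rows --source_format=CSV mydataset.myTable ./myData.csv

# To drop the extra column, query instead (not free); extra_col is a
# placeholder for whatever name autodetect assigned to the added column.
bq query --use_legacy_sql=false 'SELECT * EXCEPT(extra_col) FROM mydataset.myTable'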