0
votes

I want to upload a pandas DataFrame to BigQuery using the DataFrame.to_gbq() function.

I pass a table_schema argument to force a specific column order in BigQuery (one that may differ from the DataFrame's).

For example, I use:

table_schema = [{'name': 'col1', 'type': 'INT64'},
                {'name': 'col2', 'type': 'STRING'},
                {'name': 'col3', 'type': 'STRING'},
                {'name': 'col4', 'type': 'STRING'},
                {'name': 'col5', 'type': 'STRING'},
                {'name': 'col6', 'type': 'FLOAT64'},
                {'name': 'col7', 'type': 'INT64'},
                {'name': 'col8', 'type': 'FLOAT64'}]

Dataframe.to_gbq(destination_table, if_exists='replace', table_schema=table_schema)

The column order in the DataFrame is: col1, col3, col4, col5, col2, col6, col7, col8

The job completes without errors.

But when I check the schema of the created (or replaced) destination_table in BigQuery, the column order is: col1, col3, col4, col5, col2, col6, col7, col8

(that is, the order of the DataFrame, not that of table_schema)

Shouldn't the order specified in the table schema be respected?

If not, is there a way to force it?

1
Someone answered your question here - Ka Boom
@KaBoom Column ordering is different from row ordering. - Oluwafemi Sule

1 Answer

1
votes

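Reorder the columns of the DataFrame by indexing it in the order you want before uploading. As the question shows, the created table follows the DataFrame's column order rather than the order of table_schema, so the DataFrame itself has to be put in the schema's order: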

ordered_columns = [c['name'] for c in table_schema]

Dataframe[ordered_columns].to_gbq(destination_table, if_exists='replace', table_schema=table_schema)
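
Indexing with ordered_columns raises a KeyError if a schema name is missing from the DataFrame, and silently drops any DataFrame column that the schema does not list. A minimal sketch of a check to run first, assuming you want an exact match between the two sets of names; the helper columns_in_schema_order is hypothetical (not part of pandas or pandas-gbq), and df_columns stands in for list(Dataframe.columns):

```python
def columns_in_schema_order(table_schema, df_columns):
    """Return the schema's column names in schema order.

    Hypothetical helper: raises ValueError if the schema and the
    DataFrame do not contain exactly the same column names.
    """
    ordered = [field["name"] for field in table_schema]
    missing = [name for name in ordered if name not in df_columns]
    extra = [name for name in df_columns if name not in ordered]
    if missing or extra:
        raise ValueError(
            f"schema/DataFrame mismatch: missing={missing}, extra={extra}"
        )
    return ordered


# Shortened version of the schema from the question.
table_schema = [
    {"name": "col1", "type": "INT64"},
    {"name": "col2", "type": "STRING"},
    {"name": "col3", "type": "STRING"},
]
# Stand-in for list(Dataframe.columns); note the different order.
df_columns = ["col1", "col3", "col2"]

ordered_columns = columns_in_schema_order(table_schema, df_columns)
print(ordered_columns)  # ['col1', 'col2', 'col3']
```

The returned list can then be used exactly as above: Dataframe[ordered_columns].to_gbq(destination_table, if_exists='replace', table_schema=table_schema).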