I am using pandas_gbq.to_gbq() to export a DataFrame to Google BigQuery, where col1 contains NULL values.
>>> df
     col1         day
    apple  2019-03-01
     None  2019-03-02
   banana  2019-03-02
     None  2019-03-03
>>> df.dtypes
col1            object
day     datetime64[ns]
dtype: object
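For reference, the sample frame above can be reproduced like this (values taken from the printout; the construction itself is my own sketch):

```python
import pandas as pd

# Rebuild the sample DataFrame: col1 is an object column with missing
# values, day is a datetime64[ns] column.
df = pd.DataFrame({
    'col1': ['apple', None, 'banana', None],
    'day': pd.to_datetime(['2019-03-01', '2019-03-02',
                           '2019-03-02', '2019-03-03']),
})

print(df.dtypes)  # col1 is object, day is datetime64[ns]
```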
Without defining a table schema, I am able to export the table to BigQuery successfully, with the NULL values preserved in col1.
from google.cloud import bigquery
import pandas as pd
import pandas_gbq

pandas_gbq.to_gbq(
    df,
    table_name,
    project_id='project-dev',
    chunksize=None,
    if_exists='replace',
)
The default table schema created in BigQuery:
col1 STRING NULLABLE
day TIMESTAMP NULLABLE
However, when I define day as DATE type in BigQuery (since I don't want TIMESTAMP), I encounter the error below. I've tried both NaN and None in col1; both fail.
table_schema = [{'name': 'day', 'type': 'DATE'}]

pandas_gbq.to_gbq(
    df,
    table_name,
    project_id='project-dev',
    chunksize=None,
    if_exists='replace',
    table_schema=table_schema,
)
Error message:
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 1224, in to_gbq
    progress_bar=progress_bar,
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 606, in load_data
    self.process_http_error(ex)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 425, in process_http_error
    raise GenericGBQException("Reason: {0}".format(ex))
pandas_gbq.gbq.GenericGBQException: Reason: 400 Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
I've read the pandas_gbq documentation but am still not able to figure it out:
https://pandas-gbq.readthedocs.io/en/latest/api.html#pandas_gbq.to_gbq
Would someone be able to point me in the right direction? Thanks.
I convert the column with df['day'].dt.strftime('%Y-%m-%d'), then define the table schema as above, and it works! – WTK
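Putting the commenter's fix together (sample data rebuilt from the printout; table_name and project-dev are placeholders from the question, and the upload call is left commented out because it needs BigQuery credentials):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['apple', None, 'banana', None],
    'day': pd.to_datetime(['2019-03-01', '2019-03-02',
                           '2019-03-02', '2019-03-03']),
})

# Render the datetime64 column as 'YYYY-MM-DD' strings, so the values
# pandas_gbq serializes match what BigQuery's DATE type accepts.
df['day'] = df['day'].dt.strftime('%Y-%m-%d')
print(df['day'].tolist())

# With day now plain date strings, the original call succeeds:
# pandas_gbq.to_gbq(df, table_name, project_id='project-dev',
#                   chunksize=None, if_exists='replace',
#                   table_schema=[{'name': 'day', 'type': 'DATE'}])
```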