1
vote

I have a pandas DataFrame with a date column formatted as below:

PublishDate = 2018-08-31

I used the pandas to_gbq() function to dump the data into a BigQuery table. Before dumping the data, I make sure that the column formats match the table schema. PublishDate is of type DATE in the BigQuery table. How can I achieve something similar to:

     df['PublishDate'] = df['PublishDate'].astype('?????')

I tried astype('datetime64[D]'), and also:

     df['PublishDate'] = pd.to_datetime(df['PublishDate'], format='%Y-%m-%d', errors='coerce').dt.date
     df['PublishDate'] = [time.to_date() for time in df['PublishDate']]

but neither worked.


4 Answers

6
votes

I was facing the same issue.

According to the documentation, you can provide:

table_schema : list of dicts, optional

So in my case, adding

table_schema=[{'name': 'execution_date', 'type': 'DATE'}]

worked.

Full line:

    pdg.to_gbq(
        table_for_uploading,
        upload_table,
        project_id=project_id,
        if_exists='replace',
        credentials=gbq_credentials,
        table_schema=[{'name': 'execution_date', 'type': 'DATE'}],
    )
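For context, here is a minimal, runnable sketch of the pandas-side preparation, using the question's PublishDate column; the dataset, table, and project names in the commented-out upload call are placeholders:

```python
import pandas as pd

# A frame mirroring the question's column, stored as strings.
df = pd.DataFrame({"PublishDate": ["2018-08-31", "2018-09-01"]})

# Reduce the column to plain datetime.date objects so the values
# line up with a DATE column in BigQuery.
df["PublishDate"] = pd.to_datetime(df["PublishDate"], format="%Y-%m-%d").dt.date

print(df["PublishDate"].tolist())

# With the column prepared, the upload would look like this
# (requires pandas-gbq and valid credentials; names are placeholders):
# import pandas_gbq
# pandas_gbq.to_gbq(
#     df,
#     "my_dataset.my_table",
#     project_id="my-project",
#     if_exists="replace",
#     table_schema=[{"name": "PublishDate", "type": "DATE"}],
# )
```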
2
votes

As far as I know, pandas-gbq doesn't support the DATE type, so your best option is probably to export the column as a TIMESTAMP and then convert it to a DATE with a SQL query.

df['PublishTimestamp'] = pd.to_datetime(
    df['PublishDate'],
    format='%Y-%m-%d',
    errors='coerce'
)
# Drop the original column so the query below can re-add it as a DATE.
df = df.drop(columns=['PublishDate'])
df.to_gbq("YOUR-DATASET.YOUR-TABLE", project_id="YOUR-PROJECT")

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig()
table_ref = client.dataset("YOUR-DATASET").table("YOUR-TABLE")
job_config.destination = table_ref
job_config.write_disposition = "WRITE_TRUNCATE"

sql = """
    SELECT
      *,
      DATE(PublishTimestamp) as PublishDate
    FROM
      `YOUR-PROJECT.YOUR-DATASET.YOUR-TABLE`
"""

query_job = client.query(
    sql,
    job_config=job_config
)
query_job.result()
0
votes

I couldn't find support for the DATE type in pandas-gbq.

Another option is to insert the rows with the BigQuery client:

from google.cloud import bigquery


def chunks(l, chunk_size):
    for i in range(0, len(l), chunk_size):
        yield l[i:i + chunk_size]


CLIENT_ROW_LIMIT = 10000
SCHEMA = [
    bigquery.SchemaField('...'),
]

def push_with_date(df):
    client = bigquery.Client(project='...')
    dataset = client.dataset('...')
    table_ref = dataset.table('...')
    rows = [row.tolist() for index, row in df.iterrows()]
    for i, chunk in enumerate(chunks(rows, CLIENT_ROW_LIMIT)):
        print('pushing', i)
        errors = client.insert_rows(table_ref, chunk, SCHEMA)
        if errors:
            # Surface the per-row insert errors instead of swallowing them.
            raise Exception(errors)
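The chunks helper above can be exercised on its own; a small, self-contained demonstration:

```python
def chunks(items, chunk_size):
    """Yield successive chunk_size-sized slices of items."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]

# Seven rows split into batches of three: the last batch is shorter.
batches = list(chunks(list(range(7)), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```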
0
votes

Try this one. It works as a workaround, without to_gbq:

from google.cloud import bigquery

bigquery_client = bigquery.Client()

# table_schema declares the DATE column explicitly,
# e.g. [bigquery.SchemaField('PublishDate', 'DATE'), ...]
job_config = bigquery.LoadJobConfig(
    schema=table_schema, source_format=bigquery.SourceFormat.CSV
)
load_job = bigquery_client.load_table_from_dataframe(
    dataframe, table_id, job_config=job_config
)
load_job.result()  # wait for the load job to complete