
In my Python project, I need to fill a BigQuery table with a relational DataFrame. I'm having a lot of trouble creating a new table from scratch and making sure that the first rows I upload to it actually end up in the table.

I've read https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency and seen that adding an insertId to the insert request would solve the problem, but since I work with pandas DataFrames, the to_gbq function from the pandas-gbq package seems perfect for this task. Yet when to_gbq creates or replaces a table, sometimes (apparently at random) the first chunk of data is not written to the table.

Does anybody know how to ensure that a whole DataFrame is inserted into a newly created BigQuery table? Thanks

Answer

I believe you are encountering https://github.com/pydata/pandas-gbq/issues/75. Basically, pandas-gbq uses the BigQuery streaming API to write data into tables, but there is a delay between creating a table and the streaming API starting to accept rows for it, so the first rows streamed into a new table can be lost.

Edit: Version 0.3.0 of pandas-gbq fixes this issue by using a load job to upload data frames to BigQuery instead of streaming.

In the meantime (or if you can't upgrade to 0.3.0 or later), I'd recommend using a "load job" to create the tables, for example with the load_table_from_file method of the client in the google-cloud-bigquery package:

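```python
import io

from google.cloud import bigquery

# dataset_id, table_id and dataframe (a pandas DataFrame) are placeholders
# that you need to define yourself.
client = bigquery.Client()  # uses the default project and credentials
destination_table = '{}.{}.{}'.format(client.project, dataset_id, table_id)

job_config = bigquery.LoadJobConfig()
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
# A brand-new table needs a schema: let BigQuery infer it from the data,
# or set job_config.schema explicitly instead.
job_config.autodetect = True

# One JSON object per DataFrame row, one row per line.
# (Iterating a DataFrame directly yields column names, not rows.)
rows = dataframe.to_json(orient='records', lines=True, force_ascii=False,
                         date_format='iso', date_unit='s')

# load_table_from_file expects a binary file object, so encode the text.
body = io.BytesIO(rows.encode('utf-8'))

client.load_table_from_file(
    body,
    destination_table,
    job_config=job_config).result()  # wait until the load job finishes
```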
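Since the question is really about being sure the first rows land in the table, one extra safeguard is to compare the row count the finished load job reports (LoadJob.output_rows in google-cloud-bigquery) with the number of rows you sent. This is a minimal sketch, not part of the original answer: load_and_verify and IncompleteLoadError are hypothetical names, client is assumed to be a google.cloud.bigquery.Client, and job_config a LoadJobConfig set up for newline-delimited JSON as described above.

```python
import io


class IncompleteLoadError(RuntimeError):
    """Hypothetical error raised when a load job wrote fewer rows than expected."""


def load_and_verify(client, ndjson_bytes, expected_rows, destination, job_config):
    """Run a BigQuery load job and check that every row was written.

    Assumes ``client`` is a google.cloud.bigquery.Client and ``job_config``
    a LoadJobConfig configured for NEWLINE_DELIMITED_JSON.
    """
    job = client.load_table_from_file(
        io.BytesIO(ndjson_bytes), destination, job_config=job_config)
    job.result()  # blocks until the job finishes; raises if the job failed
    # output_rows is the number of rows this load job wrote to the table.
    if job.output_rows != expected_rows:
        raise IncompleteLoadError(
            'expected {} rows, load job wrote {}'.format(
                expected_rows, job.output_rows))
    return job
```

In practice expected_rows would be len(dataframe). Because a load job is atomic (it either commits all rows or fails), this check mostly guards against serialization mistakes such as rows silently dropped before upload.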

Edit: This code sample fails for columns containing non-ASCII characters. See https://github.com/pydata/pandas-gbq/pull/108
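One way to sidestep encoding problems with non-ASCII columns (an assumption on my part, not necessarily the approach taken in the linked pull request) is to build the whole newline-delimited JSON payload as text with force_ascii=False and then encode it to UTF-8 bytes before handing it to the load job. A minimal sketch; dataframe_to_ndjson_bytes is a hypothetical helper name:

```python
import pandas as pd


def dataframe_to_ndjson_bytes(dataframe: pd.DataFrame) -> bytes:
    """Serialize a DataFrame as UTF-8 encoded newline-delimited JSON."""
    # force_ascii=False keeps characters such as 'é' as-is instead of
    # \u escapes; orient='records' with lines=True gives one object per line.
    text = dataframe.to_json(orient='records', lines=True, force_ascii=False,
                             date_format='iso', date_unit='s')
    # Encode explicitly so the payload is binary UTF-8, which is what a
    # binary file object such as io.BytesIO should contain.
    return text.encode('utf-8')
```

The result can be wrapped in io.BytesIO and passed to client.load_table_from_file.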