I am trying to tidy up a large (8 GB) .csv file in Python and then stream it into BigQuery. My code below starts off okay: the table is created and the first 1,000 rows go in, but then I get this error:
InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
Is this perhaps related to the streaming buffer? A further issue is that I have to delete the table before I run the code again, otherwise the first 1,000 rows get duplicated because of the 'append' method.
import pandas as pd
destination_table = 'product_data.FS_orders'
project_id = '##'
pkey = '##'
chunks = []
for chunk in pd.read_csv('Historic_orders.csv', chunksize=1000, encoding='windows-1252',
                         names=['Orderdate', 'Weborderno', 'Productcode', 'Quantitysold',
                                'Paymentmethod', 'ProductGender', 'DeviceType', 'Brand',
                                'ProductDescription', 'OrderType', 'ProductCategory',
                                'UnitpriceGBP', 'Webtype1', 'CostPrice', 'Webtype2',
                                'Webtype3', 'Variant', 'Orderlinetax']):
    # replace() is not in-place: assign the result back, or the cleaning is lost
    chunk = chunk.replace(r' *!', 'Null', regex=True)
    chunk.to_gbq(destination_table, project_id, if_exists='append', private_key=pkey)
    chunks.append(chunk)

df = pd.concat(chunks, axis=0)
print(df.head(5))
df.to_csv('Historic_orders_cleaned.csv')
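
What I suspect is happening: each chunk that comes out of read_csv has its dtypes inferred independently, so a column that is all integers in the first 1,000 rows but contains blanks or text further down can come back as int64 in one chunk and float64/object in the next, and to_gbq then rejects the DataFrame because it no longer matches the table created from the first chunk. If that's right, the streaming buffer isn't the cause. Below is a minimal sketch of the workaround I'm considering: pinning every column to str so all chunks present the same schema, and using if_exists='replace' on the first chunk so a rerun doesn't duplicate rows. Forcing everything to str is only for illustration; a proper per-column dtype dict would be the real fix.

import pandas as pd

destination_table = 'product_data.FS_orders'
project_id = '##'
pkey = '##'

cols = ['Orderdate', 'Weborderno', 'Productcode', 'Quantitysold',
        'Paymentmethod', 'ProductGender', 'DeviceType', 'Brand',
        'ProductDescription', 'OrderType', 'ProductCategory',
        'UnitpriceGBP', 'Webtype1', 'CostPrice', 'Webtype2',
        'Webtype3', 'Variant', 'Orderlinetax']

# dtype=str pins every column, so each chunk presents the same
# schema to BigQuery instead of being inferred per chunk
reader = pd.read_csv('Historic_orders.csv', chunksize=1000,
                     encoding='windows-1252', names=cols, dtype=str)

for i, chunk in enumerate(reader):
    chunk = chunk.replace(r' *!', 'Null', regex=True)
    # 'replace' drops and recreates the table on the first chunk, so
    # rerunning the script doesn't duplicate rows; later chunks append
    chunk.to_gbq(destination_table, project_id,
                 if_exists='replace' if i == 0 else 'append',
                 private_key=pkey)

(private_key is kept from my code above; I understand newer pandas-gbq versions use credentials= instead.) Would pinning the dtypes like this be the right way to keep the chunks consistent, or is there a better approach?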