I would like to upload some data that is currently stored in PostgreSQL to Google BigQuery to see how the two tools compare.
There are many options for moving data around, but the most user-friendly one (for me) that I have found so far leverages Python pandas:
import pandas as pd  # to_gbq requires the pandas-gbq package

sql = "SELECT * FROM {}".format(input_table_name)
i = 0
for chunk in pd.read_sql_query(sql, engine, chunksize=10000):
    print("Chunk number: ", i)
    i += 1
    chunk.to_gbq(destination_table="my_new_dataset.test_pandas",
                 project_id="aqueduct30",
                 if_exists="append")
However, this approach is rather slow, and I was wondering what options I have to speed things up. My table has 11 million rows and 100 columns.
The PostgreSQL database is on AWS RDS, and I run Python from an Amazon EC2 instance. Both are large and fast. I am currently not using multiple processors, although 16 are available.
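One direction I am considering is splitting the table into key ranges and letting each worker process read and upload its own slice. This is only a rough sketch: it assumes the table has an integer primary key column called "id", and the connection string, table names, and id range below are placeholders.

import pandas as pd
from multiprocessing import Pool
from sqlalchemy import create_engine

CONN_STR = "postgresql+psycopg2://user:password@host:5432/dbname"  # placeholder
INPUT_TABLE = "my_schema.my_table"                                  # placeholder
DEST_TABLE = "my_new_dataset.test_pandas"
PROJECT_ID = "aqueduct30"
N_WORKERS = 16

def upload_range(bounds):
    """Read one id range of the source table and append it to BigQuery."""
    lo, hi = bounds
    engine = create_engine(CONN_STR)  # each worker gets its own connection
    sql = "SELECT * FROM {} WHERE id >= {} AND id < {}".format(INPUT_TABLE, lo, hi)
    for chunk in pd.read_sql_query(sql, engine, chunksize=10000):
        chunk.to_gbq(destination_table=DEST_TABLE,
                     project_id=PROJECT_ID,
                     if_exists="append")

if __name__ == "__main__":
    # Split ~11M ids into one range per worker; adjust to the real id range.
    max_id = 11000000
    step = max_id // N_WORKERS + 1
    ranges = [(lo, lo + step) for lo in range(0, max_id, step)]
    with Pool(N_WORKERS) as pool:
        pool.map(upload_range, ranges)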
Comment: bq command-line tools, because with a golang program it took several hours. The script also covers conversion of the table structure into the data types used on BigQuery: postgresql.freeideas.cz/… – JosMac
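To make the comment's suggestion concrete, here is a rough sketch of the bulk-CSV route it hints at: dump the table with a Postgres COPY and load the file into BigQuery in a single load job, bypassing pandas entirely. The connection details, file path, and autodetected schema are my own placeholders, not taken from the linked script (which does an explicit type conversion).

import psycopg2
from google.cloud import bigquery

# 1. Export the table to a local CSV using Postgres COPY (server-side, fast).
conn = psycopg2.connect("dbname=mydb user=me host=my-rds-host")  # placeholder
with conn.cursor() as cur, open("/tmp/my_table.csv", "w") as f:
    cur.copy_expert("COPY my_schema.my_table TO STDOUT WITH CSV HEADER", f)
conn.close()

# 2. Load the CSV into BigQuery as one load job.
client = bigquery.Client(project="aqueduct30")
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,                   # or supply an explicit schema instead
    write_disposition="WRITE_APPEND",
)
with open("/tmp/my_table.csv", "rb") as f:
    job = client.load_table_from_file(
        f, "aqueduct30.my_new_dataset.test_pandas", job_config=job_config
    )
job.result()  # wait for the load job to finish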