I am running a Python script that unloads a table called newdataset.newtable2 from BigQuery to the Google Cloud Storage bucket of my app.
Here is my code:
import json
import uuid

from googleapiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

scope = ["https://www.googleapis.com/auth/bigquery"]
project_id = 'txxxxxxx9'
dataset_id = 'newdataset'
table_id = 'newtable2'

# Load the service account key file and build credentials from it.
with open('/home/xxxxxxx/Dropbox/access_keys/google_storage/xxxxxxxx.json') as auth_file:
    key = json.load(auth_file)

client_email = key['client_email']
pv_key = key['private_key']
credentials = SignedJwtAssertionCredentials(client_email, pv_key, scope=scope)

# Build the BigQuery API client.
bigquery_service = build('bigquery', 'v2', credentials=credentials)

# Extract job: export the table to Cloud Storage as a single CSV file.
job_data = {
    'jobReference': {
        'projectId': project_id,
        'jobId': str(uuid.uuid4())
    },
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': project_id,
                'datasetId': dataset_id,
                'tableId': table_id,
            },
            'destinationUris': ['gs://xxxxxxx/test.csv'],
            'destinationFormat': 'CSV'
        }
    }
}

# Insert the extract job.
query_job = bigquery_service.jobs().insert(projectId=project_id, body=job_data).execute()
I am astonished by how slow this request is. My table is about 300 MB and the export takes around 5 minutes. Note that the request does not appear in the Jobs section of the BigQuery UI, but after 5 minutes the .csv shows up in my bucket and looks fine.
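If it helps, this is the kind of polling loop I would use to watch the job state. It is only a minimal sketch, not something in my script yet; it reuses the bigquery_service client and the jobId from job_data above.

import time

job_id = job_data['jobReference']['jobId']

# Poll the job until BigQuery reports it as DONE.
while True:
    job = bigquery_service.jobs().get(projectId=project_id, jobId=job_id).execute()
    state = job['status']['state']  # PENDING, RUNNING or DONE
    if state == 'DONE':
        if 'errorResult' in job['status']:
            raise RuntimeError(job['status']['errorResult'])
        break
    time.sleep(5)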
With Redshift and S3, a similar unload takes no more than 5 seconds. My question: am I doing this the right way, or am I missing something?
If my code is correct, can someone tell me why this basic task takes so long?
Note: I am using a free account for now (not upgraded).