I deployed a Flask endpoint to Google App Engine. It receives a payload, parses it, and sends the data to two different BigQuery tables.
My setup:
- Python 3.7
- Flask
- Google App Engine
- the google.cloud.bigquery Client() class
In a nutshell, this is what happens (a rough sketch of the endpoint follows the list):
- endpoint is called
- endpoint gets payload and parses it
- sends data (1 row) to the first BigQuery table using client.insert_rows_json(table, [data])
- sends data (1 row) to the second BigQuery table using client.insert_rows(table, [data])
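For context, here is roughly what the endpoint looks like. The route name, parse_payload and the table IDs are placeholders, not my real code, and the get_table() lookups are the ones shown in the "old" snippet at the end of the post:

from flask import Flask, request
from google.cloud import bigquery

app = Flask(__name__)
client = bigquery.Client()

# hypothetical placeholders for the real identifiers
TABLE_1_ID = 'PROJECT_NAME.DATASET_NAME.TABLE_1'
TABLE_2_ID = 'PROJECT_NAME.DATASET_NAME.TABLE_2'

def parse_payload(payload):
    # real parsing omitted; assume it returns a dict matching both table schemas
    return payload

@app.route('/ingest', methods=['POST'])
def ingest():
    data = parse_payload(request.get_json())
    # first table: look the table up, then stream the row as JSON
    table_1 = client.get_table(TABLE_1_ID)      # GET tables API call
    client.insert_rows_json(table_1, [data])    # insertAll streaming call
    # second table: same pattern, via insert_rows()
    table_2 = client.get_table(TABLE_2_ID)      # GET tables API call
    client.insert_rows(table_2, [data])         # insertAll streaming call
    return 'ok', 200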
The error(s):
Reading the App Engine log of my service, I found this error:
Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/env/lib/python3.7/site-packages/google/cloud/_http.py", line 423, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_NAME/datasets/DATASET_NAME/tables/TABLE_NAME: Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
My hypothesis is that the two calls to the BigQuery API exceed the per-user, per-method limit. Is that plausible?
On the other hand, both Client() methods go through the same API method, insertAll (https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll), which is a streaming method, so I doubt it's a matter of quotas.
Another issue I found is that the table filled by the first call gets ~2% fewer entries than the table filled by the second call (which is the call for which I noticed the error). What could be the reason?
I already checked some of the payloads that are missing from the first table and they are legit, so I don't think it's a matter of their format not matching the table schema.
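One thing I still want to rule out: both insert_rows_json() and insert_rows() report row-level failures through their return value (a list of per-row errors) rather than by raising, so ignoring that return value would hide rejected rows. A minimal sketch of the check, with client, table and data as in the snippets in this post:

import logging

errors = client.insert_rows_json(table, [data])
if errors:
    # non-empty result: each entry holds a rejected row's index and its errors
    logging.error('BigQuery rejected row(s): %s', errors)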
EDIT: solved
I ended up following shollyman's advice: drop get_table() and instead pass insert_rows() or insert_rows_json() a Table object initialized with a TableReference and a schema given as a list of SchemaField instances.
Code:
from google.cloud import bigquery
client = bigquery.Client()
Old:
# get table info -- API call
table_id = '{}.{}.{}'.format(PROJECT_ID, DATASET_ID, TABLE_ID)
table = client.get_table(table_id)
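# note: this issues a GET .../tables/TABLE_NAME request on every endpoint call,
# which is the request that hit the per-user, per-method rate limit in the traceback above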
# insert row -- API call
client.insert_rows(table, [data])
New:
# create table ref
dataset_ref = bigquery.dataset.DatasetReference(project=PROJECT_ID, dataset_id=DATASET_ID)
table_ref = bigquery.table.TableReference(dataset_ref=dataset_ref, table_id=TABLE_ID)
# specify table schema
schema = [
bigquery.schema.SchemaField(name='field1', field_type='TIMESTAMP', mode='NULLABLE'),
bigquery.schema.SchemaField(name='field2', field_type='STRING', mode='NULLABLE')
]
# create table instance
table = bigquery.table.Table(table_ref=table_ref, schema=schema)
# insert row -- API call
client.insert_rows(table, [data])
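With the Table built entirely client-side from the TableReference and the explicit schema, insert_rows() no longer needs a per-request get_table() lookup, so the only API call left per row is the insertAll streaming insert and the rate-limit errors are gone.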