
I deployed a Flask endpoint to Google App Engine. It receives a payload, parses it, and sends the data to two different BigQuery tables.

My setup: Python 3.7, Flask, Google App Engine, and the google.cloud.bigquery Client() class.

In a nutshell, this is what happens (a minimal sketch of the flow follows the list):

- the endpoint is called
- the endpoint gets the payload and parses it
- it sends the data (1 row) to the first BigQuery table using client.insert_rows_json(table, [data])
- it sends the data (1 row) to the second BigQuery table using client.insert_rows(table, [data])
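
For context, the handler looks roughly like the sketch below. This is a minimal reconstruction, assuming a JSON POST payload; the route name and the PROJECT_ID / DATASET_ID / TABLE_1 / TABLE_2 constants are placeholders, not the actual code.

from flask import Flask, request
from google.cloud import bigquery

app = Flask(__name__)
client = bigquery.Client()

# placeholder identifiers -- not the real project/dataset/table names
PROJECT_ID, DATASET_ID = 'PROJECT_NAME', 'DATASET_NAME'
TABLE_1, TABLE_2 = 'TABLE_NAME_1', 'TABLE_NAME_2'

@app.route('/ingest', methods=['POST'])
def ingest():
    # parse the incoming payload into a dict keyed by column name
    data = request.get_json()
    # look up both tables (one tables.get API call each) ...
    table_1 = client.get_table('{}.{}.{}'.format(PROJECT_ID, DATASET_ID, TABLE_1))
    table_2 = client.get_table('{}.{}.{}'.format(PROJECT_ID, DATASET_ID, TABLE_2))
    # ... then stream one row into each (one tabledata.insertAll API call each)
    client.insert_rows_json(table_1, [data])
    client.insert_rows(table_2, [data])
    return '', 204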

The error(s):

Reading the App Engine log of my service, I found this error:

https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/env/lib/python3.7/site-packages/google/cloud/_http.py", line 423, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_NAME/datasets/DATASET_NAME/tables/TABLE_NAME: Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

My hypothesis is that the two calls to the BigQuery API exceed the per-user, per-method limit. Is that plausible? Since both Client() methods use the same API method, insertAll, which is a streaming method, I doubt it's a matter of quotas.

Another issue I found is that the table filled by the first call gets ~2% fewer entries than the table filled by the second call (the one for which I noticed the error). What could be the reason?

I already checked some of the payloads missing from the first table and they are legit, so I don't think it's a matter of their format not matching the table schema.

EDIT: solved

I ended up following shollyman's advice: drop get_table and instead pass insert_rows() or insert_rows_json() a Table object initialized from a TableReference and a schema given as a list of SchemaField instances.

Code:

from google.cloud import bigquery
client = bigquery.Client()

old

# get table info -- API call
table_id = '{}.{}.{}'.format(PROJECT_ID, DATASET_ID, TABLE_ID)
table = client.get_table(table_id)
# insert row -- API call
client.insert_rows(table, [data])

new

# create table ref
dataset_ref = bigquery.dataset.DatasetReference(project=PROJECT_ID, dataset_id=DATASET_ID)
table_ref = bigquery.table.TableReference(dataset_ref=dataset_ref, table_id=TABLE_ID)
# specify table schema
schema = [
    bigquery.schema.SchemaField(name='field1', field_type='TIMESTAMP', mode='NULLABLE'),
    bigquery.schema.SchemaField(name='field2', field_type='STRING', mode='NULLABLE'),
]
# create table instance 
table = bigquery.table.Table(table_ref=table_ref, schema=schema)
# insert row -- API call
client.insert_rows(table, [data])
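
For completeness, data is just a dict keyed by the field names in the schema above, and insert_rows() returns a list of per-row errors, which is empty on success. A quick example with made-up values:

# example row matching the schema above (values are illustrative)
data = {'field1': '2021-01-01 00:00:00 UTC', 'field2': 'some string'}
errors = client.insert_rows(table, [data])
if errors:
    print(errors)  # each entry describes a row that failed to insert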

1 Answer


This indicates that your code is fetching table metadata too aggressively via the tables.get API. Do you do something like reload the table stats on every insert, or fetch the schema, etc.? Streaming inserts allow many orders of magnitude more QPS than retrieving table metadata.
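
If the metadata really is needed, one option is to fetch the Table object once at startup and reuse it, so only the streaming insert runs per request. A rough sketch under that assumption (PROJECT_ID, DATASET_ID and TABLE_ID are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# placeholder identifiers
PROJECT_ID, DATASET_ID, TABLE_ID = 'PROJECT_NAME', 'DATASET_NAME', 'TABLE_NAME'

# one tables.get call at startup, reused for every insert
TABLE = client.get_table('{}.{}.{}'.format(PROJECT_ID, DATASET_ID, TABLE_ID))

def insert_row(data):
    # only the tabledata.insertAll call happens per request
    return client.insert_rows(TABLE, [data])

Building the Table from a static schema, as shown in the question's edit, avoids the tables.get call entirely.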