0
votes

I have the following function to run a BigQuery data extraction (see below). When I send too many requests, I receive the error:

google.api_core.exceptions.Forbidden: 403 Exceeded rate limits: too many concurrent queries for this project_and_region. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

I am wondering why my code is unable to catch the Forbidden error, since I explicitly wrote the function to catch the 403.


import time

from google.cloud import bigquery
from google.api_core.exceptions import Forbidden, InternalServerError, ServiceUnavailable

client = bigquery.Client()

def run_job(query, query_params, attempt_nb=1):

    # Configure
    job_config = bigquery.QueryJobConfig()
    job_config.query_parameters = query_params
    query_job = client.query(
        query,
        # Location must match that of the dataset(s) referenced in the query.
        location='US',
        job_config=job_config)  # API request - starts the query

    # Try to run and transform to DataFrame()
    try:
        df = query_job.to_dataframe()
        assert query_job.state == 'DONE'
        return df

    except Forbidden:
        # Exception mapping a ``403 Forbidden`` response.
        return retry_job(query, query_params, attempt_nb)

    except InternalServerError:
        # Exception mapping a ``500 Internal Server Error`` response or a :attr:`grpc.StatusCode.INTERNAL` error.
        return retry_job(query, query_params, attempt_nb)

    except ServiceUnavailable:
        # Exception mapping a ``503 Service Unavailable`` response or a :attr:`grpc.StatusCode.UNAVAILABLE` error.
        return retry_job(query, query_params, attempt_nb)


def retry_job(query, query_params, attempt_nb):
    # If the error is a rate limit or connection error, wait and
    # try again.
    # 403: Forbidden: Both access denied and rate limits.
    # 408: Timeout
    # 500: Internal Service Error
    # 503: Service Unavailable
    # Old way: if err.resp.status in [403, 408, 500, 503]:
    if attempt_nb < 3:
        print(' ! New BigQuery error. Retrying in 10s')
        time.sleep(10)
        return run_job(query, query_params, attempt_nb + 1)
    else:
        raise Exception('BigQuery error. Failed 3 times', query)
1
Does the traceback tell you the line number? I'd be curious if it's coming from inside your try block or at the point when you call client.query – Neil C. Obremski
It's coming from the line df = query_job.to_dataframe() within the try – JohnAndrews

1 Answer

2
votes

The exception is most likely being raised by the following lines rather than inside the try block. to_dataframe() may appear to be the culprit if a retry is occurring and the exception is thrown a second time during the recursion.

query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location='US',
    job_config=job_config)  # API request - starts the query

Looking at the source for this library, the query() method issues the POST request that creates the job, which is where rateLimitExceeded is returned according to Google's page on Troubleshooting errors:

This error returns if your project exceeds the concurrent rate limit or the API requests limit by sending too many requests too quickly.
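
Since a 403 Forbidden covers both access denied and rate limiting (as the comment in your retry_job notes), it can also help to inspect the reason in the error payload before deciding whether a retry makes sense. A minimal sketch, assuming the exception exposes BigQuery's error details through its errors attribute, as google-api-core exceptions generally do:

from google.api_core.exceptions import Forbidden

def is_retryable_forbidden(exc):
    # Each entry in exc.errors is a dict from the BigQuery error response;
    # 'rateLimitExceeded' and 'quotaExceeded' are transient, 'accessDenied' is not.
    reasons = {err.get('reason') for err in (exc.errors or [])}
    return bool(reasons & {'rateLimitExceeded', 'quotaExceeded'})

run_job could then call retry_job only when is_retryable_forbidden(err) returns True, and re-raise otherwise.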

You could test this further by adding logging around the calls and/or putting the query() call inside the try block to see if that solves the issue.
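
As a rough sketch of that second suggestion (keeping retry_job exactly as in your question, and assuming client is a module-level bigquery.Client()), the job creation simply moves inside the try block so a 403 raised by the POST is caught as well:

from google.cloud import bigquery
from google.api_core.exceptions import Forbidden, InternalServerError, ServiceUnavailable

client = bigquery.Client()

def run_job(query, query_params, attempt_nb=1):
    job_config = bigquery.QueryJobConfig()
    job_config.query_parameters = query_params

    try:
        # The POST that creates the job (and can hit the concurrency limit)
        # now raises inside the try block, so it is caught like the others.
        query_job = client.query(query, location='US', job_config=job_config)
        df = query_job.to_dataframe()
        assert query_job.state == 'DONE'
        return df
    except (Forbidden, InternalServerError, ServiceUnavailable):
        return retry_job(query, query_params, attempt_nb)

If you'd rather not hand-roll the retry loop, google-api-core also provides google.api_core.retry.Retry, which can wrap the whole call with exponential backoff instead of a fixed 10-second sleep.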