0 votes

This is the SQL query I am running against a public dataset:

SELECT
  package,
  COUNT(*) count
FROM (
  SELECT
    REGEXP_EXTRACT(line, '(.*)') package,
    id
  FROM (
    SELECT
      SPLIT(content, '\n') line,
      id
    FROM
      [bigquery-public-data:github_repos.contents]
    WHERE
      sample_path LIKE '%.bashrc' OR sample_path LIKE '%.bash_profile')
  GROUP BY
    package,
    id )
GROUP BY
  1
ORDER BY
  count DESC
LIMIT
  400;

and this is the error message:

Error: Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

bigquery-public-data:github_repos.contents is too large for my quota.

bigquery-public-data:github_repos.sample_contents is too small for what I'm analyzing.

Is there any way to specify how much quota a query can utilize? For example, if I have a 1TB quota, is there a way to run this query against github_repos.contents (which would consume 2.15TB), but stop processing after consuming 1TB?

1
This is not possible AFAIK. The table is simply too big for the free tier. – Graham Polley
@GrahamPolley Would matching fewer files in the WHERE clause reduce the quota consumed? Would any query against this dataset consume 2.15TB? – Pat Myron
No, because it's neither partitioned nor clustered. – Graham Polley
@GrahamPolley Okay, so any possible query against this dataset would scan 2.15TB? – Pat Myron
No, it depends on the columns you access in your query. But because this table is made up primarily of the "content" column, any query that accesses that column will be big. – Graham Polley

1 Answer

1 vote

You can use Custom Cost Controls. These can be set at the project level or per user, and the user can be a service account. By running each query under a different service account, you can effectively "specify how much quota a query can utilize".
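Custom cost controls themselves are configured in the Cloud Console (under Quotas) rather than in code, but a related per-query cap also exists: maximum bytes billed, available as the bq CLI flag --maximum_bytes_billed or the equivalent client-library setting. Note that it rejects a query up front rather than stopping it partway through. A minimal sketch with the Python client, reusing the question's query as a hypothetical query variable:

from google.cloud import bigquery

client = bigquery.Client()

# Cap this single query at 1 TB. If it would bill more than that,
# BigQuery fails the job rather than running part of it.
config = bigquery.QueryJobConfig(
    maximum_bytes_billed=10**12,
    use_legacy_sql=True)
job = client.query(query, job_config=config)
rows = job.result()  # raises if the query would exceed the cap

So with a 1TB cap, a query against github_repos.contents (2.15TB) simply fails; there is no mode that processes the first 1TB and then stops.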