3
votes

I pretty new to Google BigQuery and only mildly comfortable with SQL and I was wonder if you guys could help me reformat my SQL statement maybe to reduce my usage? Because with my current set-up I encounter this error:

Error: Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

My query is as follows:

SELECT
  LANGUAGE AS name,
  year,
  quarter,
  count
FROM (
  SELECT
    *
  FROM (
    SELECT
      lang AS language,
      y AS year,
      q AS quarter,
      type,
      COUNT(*) AS count
    FROM (
      SELECT
        a.type type,
        b.lang lang,
        a.y y,
        a.q q
      FROM (
        SELECT
          type,
          YEAR(created_at) AS y,
          QUARTER(created_at) AS q,
          STRING(REGEXP_REPLACE(repo.url, r'(https:\/\/api\.github\.com\/repos\/)', '')) AS name
        FROM
          [githubarchive:year.2016] ) a
      JOIN (
        SELECT
          repo_name AS name,
          lang
        FROM (
          SELECT
            *
          FROM (
            SELECT
              *,
              ROW_NUMBER() OVER (PARTITION BY repo_name ORDER BY lang) AS num
            FROM (
              SELECT
                repo_name,
                FIRST_VALUE(language.name) OVER (PARTITION BY repo_name ORDER BY language.bytes DESC) AS lang
              FROM
                [bigquery-public-data:github_repos.languages]))
          WHERE
            num = 1
          ORDER BY
            repo_name)
        WHERE
          lang != 'null') b
      ON
        a.name = b.name)
    GROUP BY
      type,
      language,
      year,
      quarter
    ORDER BY
      year,
      quarter,
      count DESC)
  WHERE
    count >= 1000)
WHERE
  type = 'PushEvent'
LIMIT
  100

Basically I am attempting to build data-set of all the top 100 languages in terms of "pushes" on Git-Hub and the use D3 to visualize said data. I have used very little Data so far, but this one Query is currently 20gb, which should be below the limit.

As a Student I doubt I can afford to pay for the service.

1

1 Answers

3
votes

The query in question scans just 22.5GB which is about $0.11
The error is saying that you exceeded your free tier allowed bytes - which is 1TB So you can run your query about 45 times within the month after which you need to wait next month

My recommendation to you is not to run this query each and every time - but rather save result and use it in your experimentation / attempts, so yo are not wasting your 1TB that quickly!