0
votes

I got this error message when trying execute this SQL query into Google Big Query.

Error: 37.1 - 38.24: The JOIN operator's right-side table must be a small table. Switch the tables if the left-side table is smaller, or use JOIN EACH if both tables are larger than the maximum described at https://cloud.google.com/bigquery/docs/reference/legacy-sql#joins

SQL query

SELECT 
  a.login,
  e.CommitCommentEvent,
  e.CreateEvent,
  e.DeleteEvent,
  e.DeploymentEvent,
  e.DeploymentStatusEvent,
  e.DownloadEvent,
  e.FollowEvent,
  e.ForkEvent,
  e.ForkApplyEvent,
  e.GistEvent,
  e.GollumEvent,
  e.IssueCommentEvent,
  e.IssuesEvent,
  e.MemberEvent,
  e.MembershipEvent,
  e.PageBuildEvent,
  e.PublicEvent,
  e.PullRequestEvent,
  e.PullRequestReviewCommentEvent,
  e.PushEvent,
  e.ReleaseEvent,
  e.RepositoryEvent,
  e.StatusEvent,
  e.TeamAddEvent,
  e.WatchEvent
FROM (
  SELECT actor.login AS login,
    IFNULL(SUM(IF(type='CommitCommentEvent',1,NULL)),0) AS CommitCommentEvent,
    IFNULL(SUM(IF(type='CreateEvent',1,NULL)),0) AS CreateEvent,
    IFNULL(SUM(IF(type='DeleteEvent',1,NULL)),0) AS  DeleteEvent,
    IFNULL(SUM(IF(type='DeploymentEvent',1,NULL)),0) AS  DeploymentEvent,
    IFNULL(SUM(IF(type='DeploymentStatusEvent',1,NULL)),0) AS  DeploymentStatusEvent,
    IFNULL(SUM(IF(type='DownloadEvent',1,NULL)),0) AS  DownloadEvent,
    IFNULL(SUM(IF(type='FollowEvent',1,NULL)),0) AS  FollowEvent,
    IFNULL(SUM(IF(type='ForkEvent',1,NULL)),0) AS  ForkEvent,
    IFNULL(SUM(IF(type='ForkApplyEvent',1,NULL)),0) AS  ForkApplyEvent,
    IFNULL(SUM(IF(type='GistEvent',1,NULL)),0) AS  GistEvent,
    IFNULL(SUM(IF(type='GollumEvent',1,NULL)),0) AS  GollumEvent,
    IFNULL(SUM(IF(type='IssueCommentEvent',1,NULL)),0) AS  IssueCommentEvent,
    IFNULL(SUM(IF(type='IssuesEvent',1,NULL)),0) AS  IssuesEvent,
    IFNULL(SUM(IF(type='MemberEvent',1,NULL)),0) AS  MemberEvent,
    IFNULL(SUM(IF(type='MembershipEvent',1,NULL)),0) AS  MembershipEvent,
    IFNULL(SUM(IF(type='PageBuildEvent',1,NULL)),0) AS  PageBuildEvent,
    IFNULL(SUM(IF(type='PublicEvent',1,NULL)),0) AS  PublicEvent,
    IFNULL(SUM(IF(type='PullRequestEvent',1,NULL)),0) AS  PullRequestEvent,
    IFNULL(SUM(IF(type='PullRequestReviewCommentEvent',1,NULL)),0) AS  PullRequestReviewCommentEvent,
    IFNULL(SUM(IF(type='PushEvent',1,NULL)),0) AS  PushEvent,
    IFNULL(SUM(IF(type='ReleaseEvent',1,NULL)),0) AS  ReleaseEvent,
    IFNULL(SUM(IF(type='RepositoryEvent',1,NULL)),0) AS  RepositoryEvent,
    IFNULL(SUM(IF(type='StatusEvent',1,NULL)),0) AS  StatusEvent,
    IFNULL(SUM(IF(type='TeamAddEvent',1,NULL)),0) AS  TeamAddEvent,
    IFNULL(SUM(IF(type='WatchEvent',1,NULL)),0) AS  WatchEvent
  FROM (
    TABLE_DATE_RANGE([githubarchive:day.events_], 
      DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR"),
      CURRENT_TIMESTAMP()
    )) AS events
  WHERE type IN ("CommitCommentEvent","CreateEvent","DeleteEvent","DeploymentEvent","DeploymentStatusEvent","DownloadEvent","FollowEvent",
  "ForkEvent","ForkApplyEvent","GistEvent","GollumEvent","IssueCommentEvent","IssuesEvent","MemberEvent","MembershipEvent","PageBuildEvent",
  "PublicEvent","PullRequestEvent","PullRequestReviewCommentEvent","PushEvent","ReleaseEvent","RepositoryEvent","StatusEvent","TeamAddEvent",
  "WatchEvent") 
  GROUP BY 1
) AS e
JOIN [githubuser.malaysia] AS a
ON e.login = a.login
1

1 Answers

2
votes

Change your JOIN line to use JOIN EACH.

Explicitly:

JOIN EACH [githubuser.malaysia] AS a

BigQuery handles joins for small tables differently than for large tables, and the query language currently forces you to choose between them.

Note to the BQ team: It would be nice to have the query planner automatically switch sides (when possible), or "upgrade" to JOIN EACH when necessary, and include a query option to do so, e.g. "Automatically optimize joins" that is on by default.