Correlated sub-query causing this error - LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join

Question

When writing correlated sub-queries in BigQuery using a WHERE clause, it is resulting in an Error. LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

However it works in other SQL engines, I am not understanding the error or how to rewrite the queries.

Different queries of the same type causing the same error.

SELECT
  out.*
FROM
  `august-monolith-205810.Assignment.students` AS out
WHERE
  2 >= (
  SELECT
    COUNT(*)
  FROM
    `august-monolith-205810.Assignment.students` AS t1
  WHERE
    out.age < t1.age)

reporting students who are older than at most two other students.

Don't know bigquery (so not sure if it deviates from standard SQL). Just to say logically I think you need to change "out.age < t1".age to "out.age > t1.age" if you want students older than at most 2 others. Doesn't explain your error though. — Arthur Nicoll
a general rewrite SELECT * FROM (SELECT ..., ROW_NUMBER() OVER(ORDER BY age) AS row_number) AS alias WHERE alias.row_number >= 2 which is believe should give the same results as your co-related subquery because you simulate or emulate a rank or row_number with you co-related subquery. — Raymond Nijland
it is important to carefully pick tags for your question to avoid massive not relevant comments and answers!!! next time if your question is very specific to BigQuery - I would recommend using ONLY bigquery tag and not include sql tag. Obviously you own the question and it is up to you :o) — Mikhail Berlyant

Mikhail Berlyant Mikhail Berlyant · Accepted Answer · 2019-05-03T15:44:22

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(cnt)
FROM (
  SELECT *, COUNT(1) OVER(ORDER BY age RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) cnt
  FROM `august-monolith-205810.Assignment.students` 
)
WHERE cnt BETWEEN 1 AND 2

Another, less effective and thus not recommended but still an option is to use CROSS JOIN to mimic non-equality LEFT JOIN

#standardSQL
SELECT ANY_VALUE(out).*
FROM `august-monolith-205810.Assignment.students` AS out
CROSS JOIN `august-monolith-205810.Assignment.students` AS t1
WHERE out.age > t1.age 
GROUP BY out.id
HAVING COUNT(t1.age) <= 2

Correlated sub-query causing this error - LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join

2 Answers