5
votes

When writing correlated sub-queries in BigQuery using a WHERE clause, it is resulting in an Error. LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

However it works in other SQL engines, I am not understanding the error or how to rewrite the queries.

Different queries of the same type causing the same error.

SELECT
  out.*
FROM
  `august-monolith-205810.Assignment.students` AS out
WHERE
  2 >= (
  SELECT
    COUNT(*)
  FROM
    `august-monolith-205810.Assignment.students` AS t1
  WHERE
    out.age < t1.age)

reporting students who are older than at most two other students.

2
Don't know bigquery (so not sure if it deviates from standard SQL). Just to say logically I think you need to change "out.age < t1".age to "out.age > t1.age" if you want students older than at most 2 others. Doesn't explain your error though.Arthur Nicoll
Is the age field an integer, decimal?DRapp
i advice you look into window functionsRaymond Nijland
a general rewrite SELECT * FROM (SELECT ..., ROW_NUMBER() OVER(ORDER BY age) AS row_number) AS alias WHERE alias.row_number >= 2 which is believe should give the same results as your co-related subquery because you simulate or emulate a rank or row_number with you co-related subquery.Raymond Nijland
it is important to carefully pick tags for your question to avoid massive not relevant comments and answers!!! next time if your question is very specific to BigQuery - I would recommend using ONLY bigquery tag and not include sql tag. Obviously you own the question and it is up to you :o)Mikhail Berlyant

2 Answers

4
votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(cnt)
FROM (
  SELECT *, COUNT(1) OVER(ORDER BY age RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) cnt
  FROM `august-monolith-205810.Assignment.students` 
)
WHERE cnt BETWEEN 1 AND 2

Another, less effective and thus not recommended but still an option is to use CROSS JOIN to mimic non-equality LEFT JOIN

#standardSQL
SELECT ANY_VALUE(out).*
FROM `august-monolith-205810.Assignment.students` AS out
CROSS JOIN `august-monolith-205810.Assignment.students` AS t1
WHERE out.age > t1.age 
GROUP BY out.id
HAVING COUNT(t1.age) <= 2
0
votes

Horrible thing you are trying to do, but lets think about what you DO have. For every "OUT" record, you are requerying every assignment again to get the count. What I would suggest is a 3-level query. The inner-most is to get all distinct "age" values (hoping integer based). Then get the counts LESS than to create a pre-aggregate result. Then you can join to the original out based on the outer-most OUT value vs repeated querying. Something like

select
      out
   from
      `august-monolith-205810.Assignment.students` AS out
         JOIN ( select
                      UniqAge.Age,
                      ( SELECT COUNT(*) 
                           FROM `august-monolith-205810.Assignment.students`
                           WHERE age < UniqAge.age) CountUnderAge
                   from
                      ( select distinct age
                           from `august-monolith-205810.Assignment.students` ) UniqAge
              ) PQByAge
        on out.age = PQByAge
        AND PQByAge.CountUnderAge <= 2

This query is based on the Age column being INTEGER based (non decimal).