0
votes

I'm looking for some help understanding this error I'm getting in BigQuery:

LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

I'm trying to use a case statement to alter the rows selected for joining depending on a value in the left table's row. I am doing something similar in some other places and it works, so part of me thinks I may be making a mistake regarding table alias and column names, but I can't figure it out. Here is a minimal example of what I'm trying to do:

WITH t1 AS (
  SELECT "milk" AS dairy,
   1 AS id,
   2 AS other_id

   UNION ALL

   SELECT "yogurt" AS dairy,
   3 AS id,
   4 AS other_id

   UNION ALL

   SELECT "cheese" AS dairy,
   5 AS id,
   6 AS other_id
),

t2 AS (
  SELECT "blue" AS color,
  1 AS id

  UNION ALL

  SELECT "red" AS color,
  4 AS id
)

SELECT
  t1.*, t2
FROM t1
LEFT JOIN t2 ON
  CASE
    WHEN t1.dairy = 'milk' THEN t1.id = t2.id
    WHEN t1.dairy = 'yogurt' THEN t1.other_id = t2.id
  END

The result I would like to see is:

enter image description here

As you can see in the desired result, when the value for dairy is milk, I want the id from t2 to equal the id column in t1, but when the value for dairy is yogurt, I want the id from t2 to equal the other_id column in t1.

I've been searching around for an explanation but can't figure it out. I also tried the solution offered here, but got the same error, which is why I think I am simply messing something up with tables names or aliases.

Please help!

UPDATE

I was able to get rid of the error by rewriting the case statement this way:

SELECT
  t1.*, t2
FROM t1
LEFT JOIN t2 ON
  CASE
    WHEN t1.dairy = 'milk' THEN t1.id
    WHEN t1.dairy = 'yogurt' THEN t1.other_id
  END = t2.id

However, in my real problem I need to join a third table in a similar fashion. If t2.color is blue, I want to join based on t2.id = t3.id, but if t2.color is red I want to join based on t2.id = t3.other_id. As soon as I do that, the same error occurs. Here is the full example of my attempt:

WITH t1 AS (
  SELECT "milk" AS dairy,
   1 AS id,
   2 AS other_id

   UNION ALL

   SELECT "yogurt" AS dairy,
   3 AS id,
   4 AS other_id

   UNION ALL

   SELECT "cheese" AS dairy,
   5 AS id,
   6 AS other_id
),

t2 AS (
  SELECT "blue" AS color,
  1 AS id

  UNION ALL

  SELECT "red" AS color,
  4 AS id
),

t3 AS (
  SELECT "sunny" AS weather,
  1 AS id,
  10 AS other_id

  UNION ALL

  SELECT "cloudy" AS weather,
  11 AS id,
  4 AS other_id
)

SELECT
  t1.*, t2, t3
FROM t1
LEFT JOIN t2 ON
  CASE
    WHEN t1.dairy = 'milk' THEN t1.id
    WHEN t1.dairy = 'yogurt' THEN t1.other_id
  END = t2.id
LEFT JOIN t3 ON
  CASE
   WHEN t2.color = 'blue' THEN t3.id
   WHEN t2.color = 'red' THEN t3.other_id
  END = t2.id

But now the same error occurs:

LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

If I remove the joining of t3, it works correctly. Here are some more images of the tables and desired result in case it helps:

enter image description here

2
Only a guess, but I would assume BQ doesn't like conditional joins because it can't accurately estimate the cost of your query (especially if your right-side table has partitioning/clustering). The solution provided below works, but another solution using basic principles would be to do 2 separate joins on each ID and then use a case statement in your select statement.rtenha
@rtenha but why does it work with one conditional join? Something isn't making sense.flyingL123
Your scenario with 1 conditional join (the updated version) actually satisfies my theory above. The join is saying it will definitely need to scan t1 and t2. Your original join with t2 inside the case statement doesn't define how much of t2 will be scanned.rtenha
@rtenha but what about the one with two joins? Shouldn’t that work too if your theory was correct?flyingL123
it is the same issue. t3 is inside the case statement, making it conditional. It is unable to estimate how much of t3 it will need to scan.rtenha

2 Answers

1
votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT *,
  ARRAY(
    SELECT AS STRUCT *  
    FROM t2 b
    WHERE b.id IN (a.id, a.other_id) 
    ORDER BY (
      CASE
        WHEN dairy IN ('milk', 'yogurt') THEN 1
        ELSE 2
      END    
    )
    LIMIT 1
  )[SAFE_OFFSET(0)] AS t2  
FROM t1 a  

If to apply to sample/dummy data from your question - result is

Row dairy   id  other_id    t2.color    t2.id    
1   milk    1   2           blue        1    
2   yogurt  3   4           red         4    
3   cheese  5   6           
1
votes

I was able to answer your updated question with 3 tables by breaking the joins and associated logic into separate CTEs.

WITH t1 AS (
  SELECT "milk" AS dairy, 1 AS id, 2 AS other_id UNION ALL
  SELECT "yogurt", 3, 4 UNION ALL
  SELECT "cheese", 5, 6
),
t2 AS (
  SELECT "blue" AS color, 1 AS id UNION ALL
  SELECT "red", 4
),
t3 AS (
  SELECT "sunny" AS weather, 1 as id, 10 as other_id UNION ALL
  SELECT "cloudy", 11, 4
),
join_t1_t2 as (
  select
    t1.*,
    case 
      when t1.dairy = 'milk' then milk.color
      when t1.dairy = 'yogurt' then yogurt.color
      else null
    end as t2_color,
    case 
      when t1.dairy = 'milk' then milk.id
      when t1.dairy = 'yogurt' then yogurt.id
      else null
    end as t2_id
  from t1
  left join t2 milk on t1.id = milk.id
  left join t2 yogurt on t1.other_id = yogurt.id
),
join_t1_t2_t3 as (
  select
    join_t1_t2.*,
    case 
      when t2_color = 'blue' then blue.id
      when t2_color = 'red' then red.id
      else null
    end as t3_id,
    case 
      when t2_color = 'blue' then blue.other_id
      when t2_color = 'red' then red.other_id
      else null
    end as t3_other_id,
    case 
      when t2_color = 'blue' then blue.weather
      when t2_color = 'red' then red.weather
      else null
    end as t3_weather,
  from join_t1_t2
  left join t3 blue on t2_id = blue.id
  left join t3 red on t2_id = red.other_id
)
select * from join_t1_t2_t3