1
votes

I have two related tables (unnecessary columns not listed):

LOCATION

VENUE_ID - NUMBER(38,0)

VISIT

ID - NUMBER(38,0)
VENUE_ID - NUMBER(38,0)
DEVICE_ID - VARCHAR(16777216)

The tables are related such that visits are associated with a location via VENUE_ID.

I'm attempting to get the count of unique device ids by location, so I wrote the following query:

SELECT "d"."VENUE_ID"
    , (
      SELECT COUNT(*)
      FROM (
          SELECT DISTINCT "f0"."DEVICE_ID"
          FROM "MAIN"."VISIT" AS "f0"
          WHERE "d"."VENUE_ID" = "f0"."VENUE_ID"
      ) AS "t")
FROM "MAIN"."LOCATION" AS "d"

Unfortunately, this query resulted in the cryptic error SQL compilation error: Unsupported subquery type cannot be evaluated.

Through a bit of experimentation, I've found that I can get the query to return without error, but only if I add an additional (useless) subquery prior to the existing one in the SELECT:

SELECT "d"."VENUE_ID"

    -- New Useless Subquery
    , (
      SELECT COUNT(*)
      FROM "MAIN"."VISIT" AS "f"
      WHERE "d"."VENUE_ID" = "f"."VENUE_ID")
    --

    , (
      SELECT COUNT(*)
      FROM (
          SELECT DISTINCT "f0"."DEVICE_ID"
          FROM "MAIN"."VISIT" AS "f0"
          WHERE "d"."VENUE_ID" = "f0"."VENUE_ID"
      ) AS "t")
FROM "MAIN"."LOCATION" AS "d"

If I move the new subquery to anywhere in the select after the distinct subquery, the error returns. I've reviewed the documentation on subqueries in Snowflake and either I am not understanding how that applies to my query here or I'm facing undocumented behavior. Anyone have any idea what's going on here?

2
You could avoid correlated subqeury by using LEFT JOIN SELECT "d"."VENUE_ID", COUNT(DISTINCT "f0"."DEVICE_ID") FROM "MAIN"."LOCATION" AS "d" LEFT JOIN "MAIN"."VISIT" AS "f0" ON "d"."VENUE_ID" = "f0"."VENUE_ID" GROUP BY "d"."VENUE_ID", though interesting caseLukasz Szozda
It resembles scenario I've encountered. Rewriting query is easy(and I guess most answer will focus on that). I am also curious what are EXACT rules that optimizer does that allow one query and forbid another.Lukasz Szozda
This is a good solution to the problem I'm facing, but like I stated in the OP, I already have a way to retrieve the data I need. I'm looking for an explanation as to what's going on more than a solution to my specific requirements. Is this a bug in Snowflake I've found?GravlLift

2 Answers

1
votes

I think you're making this more complex than this needs to be. Below should be all you need:

SELECT l.venue_id
  , count(distinct v.device_id)
FROM location l
LEFT JOIN visit v
 on l.venue_id = v.venue_id
GROUP BY l.venue_id
0
votes

The answer is a little cryptic, but what happens is this:

You are asking for ONE value and you need to guarantee that only ONE value is returned by your subquery. A distinct clause cannot guarantee that. In some databases that will work as long as the data returns one row, but the moment you get two rows then the database will throw an error.

Snowflake is strict on its subquery analysis. So you need to use a subquery that is guarantee to return always one value, for example select sum(..), select count(..)