0
votes

I am frequently running into problems where Snowflake query optimization seems to break UDFs as soon as I try to invoke them inside more complex queries. To try and find commonality in my missteps, I've distilled to a simple test javascript UDTF (called generate_series) that takes 3 doubles as arguments, and returns a table with a single column also of type double. This is able to run okay:

WITH report_params AS (
  SELECT
    1::double as first_value,
    3::double as last_value,
    1::double AS step_value
)      
SELECT
  *
FROM
  report_params,
  LATERAL generate_series(
    first_value,
    last_value,
    step_value
  )

(Side discovery if I use the LATERAL keyword on the cross join I don't need to encapsulate the UDTF in a table() cast. Not sure if there is rhyme or reason to this?)

Now, if I try to invoke the UDTF using sub-queries as parameters, which seems to be in compliance with the described snowflake subquery limitations, like so:

WITH report_params AS (
  SELECT
    1::double as first_value,
    3::double as last_value,
    1::double AS step_value
)      
SELECT
  *
FROM
  table(
    generate_series(
      (SELECT MAX(first_value) FROM report_params),
      (SELECT MAX(last_value) FROM report_params),
      (SELECT MAX(step_value) FROM report_params)
    )
 )

I get a compilation error:

SQL compilation error: syntax error line 12 at position 49 unexpected ','. syntax error line 15 at position 6 unexpected ')'.

That is problem #1 I have, that fails regardless, though it should be valid SQL. My blind guess is the compiler misinterprets this as a correlated subquery.

Problem #2 is I then have a tiny wrapper UDTF that overloads the arguments and return values as integers (since JS-UDFs can only accept doubles)

CREATE OR REPLACE FUNCTION generate_series(FIRST_VALUE INTEGER, LAST_VALUE INTEGER, STEP_VALUE INTEGER)
    RETURNS TABLE (GS_VALUE INTEGER)
    AS
$$
    SELECT GS_VALUE::INTEGER AS GS_VALUE FROM table(generate_series(FIRST_VALUE::DOUBLE,LAST_VALUE::DOUBLE,STEP_VALUE::DOUBLE))
$$;

Calling this overloaded function directly with constants e.g.

SELECT * FROM table(generate_series(1::integer,3::integer,1::integer)) 

this wrapper UDTF still works okay.

However, when I modify the previous working query (first variation) by just changing the param value types to integer to invoke the wrapper UDTF, e.g.

WITH report_params AS (
  SELECT
    1::integer as first_value,
    3::integer as last_value,
    1::integer AS step_value
)      
SELECT
  *
FROM
    report_params, table(
  generate_series(
    first_value,
    last_value,
    step_value
  )
)

I get the error

SQL compilation error: Unsupported subquery type cannot be evaluated

This error is the same in either the form directly above, or removing the table CAST and using the LATERAL keyword as presented in the first working form.

I have read in past threads that people have removed the SELECT...FROM portion of SQL UDTFs with mixed success to solve the problem, but none of the iterations I tried seemed to work.

It seems a gentle breeze can send the Snowflake SQL compiler on tilt, not sure if there is a debugging mode where I can observe the "deconstructed" form of the query that it's actually trying to run, and see where the problem is.

EDIT: this is the underlying Javascript UDTF referenced in these examples:

CREATE OR REPLACE FUNCTION generate_series(FIRST_VALUE DOUBLE, LAST_VALUE DOUBLE, STEP_VALUE DOUBLE)
    RETURNS TABLE (GS_VALUE DOUBLE)
    LANGUAGE JAVASCRIPT
    AS '{
        processRow: function get_params(row, rowWriter, context){
           if(row.STEP_VALUE === 0) {
            throw new Error("Step cannot be 0");
           }
           else if(row.STEP_VALUE > 0 && row.FIRST_VALUE > row.LAST_VALUE) {
            throw new Error("Positive series is open-ended.");
           }
           else if(row.STEP_VALUE < 0 && row.FIRST_VALUE < row.LAST_VALUE) {
            throw new Error("Negative series is open-ended.");
           }
           else {
            let i = row.FIRST_VALUE;

            while(row.STEP_VALUE > 0 ? i <= row.LAST_VALUE : i >= row.LAST_VALUE) {
                rowWriter.writeRow({GS_VALUE: i});
                i+=row.STEP_VALUE;
            }
           }}
        }';
1
Can you provide the original function definition of generate_series()? Also, a quick sample of what report_params might look like? This helps others try and replicate what you are doing.Mike Walton
@MikeWalton sure, I just added the generate_series() definition: its a pretty basic series iterator with some small checks to make sure the iterator doesn't end up in an infinite loop. The report_params is pretty much as you would see in the CTEs provided in the examples. We have scripts that do a lot of statistic calculations, and at the top have a CTE table of constant numeric coefficients so that when we want to tweak the formulas we don't have to hunt for all existences in the 1000-line computation queryDanny C
In the item you list as the first problem, the function parameters are result sets instead of columns. UDTF parameters need to be columns or scalars. Even though the result set of the subselects is a single column in each case, that won't work. Since they're all coming from report_params, you can just add that to the query and pass the columns to the UDTF in a lateral join.Greg Pavlik
@GregPavlik Thank you thats good to know thats a straight up limitation in any form, and i'll just have to refactor to use joins in those cases.Danny C

1 Answers

0
votes

For the first query, you can't use a subquery as a parameter for a UDTF. You can, however, use another CTE. Something along these lines:

WITH report_params AS (
  SELECT
    1::double as first_value,
    3::double as last_value,
    1::double AS step_value
), report_max AS (
  SELECT
  MAX(first_value) as first_value_max,
  MAX(last_value) as last_value_max,
  MAX(step_value) as step_value_max
  FROM report_params
)      
SELECT *
FROM report_max,
  table(generate_series(first_value_max,last_value_max,step_value_max)
 );

So, looking at the 2nd issue, there appears to be issues in the syntax against the table function. If you modify the 2nd function to something like this:

CREATE OR REPLACE FUNCTION generate_series_int(FIRST_VALUE INTEGER, LAST_VALUE INTEGER, STEP_VALUE INTEGER)
    RETURNS TABLE (GS_VALUE INTEGER)
    AS
$$
    SELECT GS_VALUE::INTEGER AS GS_VALUE 
    FROM (VALUES (first_value, last_value, step_value)), 
         table(generate_series(first_value::double,last_value::double,step_value::double))
$$;

This does a cross join with the values passed in and allows you to cast the variables to double on the way into the double-version of the function.

This then executes without issues:

WITH report_params AS (
  SELECT
    1::INTEGER as first_value,
    3::INTEGER as last_value,
    1::INTEGER AS step_value
)
SELECT *
FROM report_params, table(generate_series_int(first_value,last_value,step_value));