0
votes

I'm converting a data warehouse front-end from a Hadoop back-end to a Snowflake back-end and I need to convert each of its SQL functions to the Snowflake equivalent. One thing I'm having trouble with is handling formulas that produce Infinity/-Infinity or NaN (Not A Number).

In Hadoop, this was a simple matter of checking the formula using IS_INF and IS_NAN. In SQL Server (another back-end we tested), this can be done using SET ARITHABORT OFF/SET ANSI_WARNINGS OFF. The Hadoop method adds additional logic checks to the SQL and the SQL Server method just straight up ignores the errors and returns NULLs instead of throwing an error. I prefer the latter method because it doesn't slow down the queries, but I'm open to any method that will actually work in Snowflake because up to this point I can't find one.

I should note that there are Snowflake Functions that help avoid very specific errors, such as divide by zero (i.e. DIV0 function or NULLIF), but that doesn't help me for two reasons. First, that only takes care of certain issues (div by zero). I need something that will handle any formula that produces Inf/NaN. Second, the formulas in question are often written by end users in the front-end, which means I don't know which part of the formula is the dividend and which is the divisor. The formula could be very complex (including CASE WHEN or other Functions or any number of things), so parsing out the divisor (so I can use NULLIF), or parsing out both the dividend and the divisor (so I can use DIV0) isn't really an option.

I noticed documentation for Snowflake which said that the FLOAT Data Type could handle special values like Infinity and NaN, but I don't see any way to actually produce those values in a formula without throwing an error.

Any help would be appreciated.

1

1 Answers

1
votes

There is (to my knowledge) no builtin convenience functions for Infinite NaN values. Especially not to the degree that you seem to be seeking. I haven't had to handle situations in my own work, where numeric infinites became a problem, however we could achieve at least some of the functionality by creating our own functions. In this case we could likely get away with just using a SQL-UDF while more flexibility is available for Javascript.
Below I've defined 4 functions (IS_INF, IS_NA, IF_NA and IF_INF) for the purpose. Note that these functions take variant input and returns variant output. This is just such that I dont have to overload the function for all the different data types. In practice this is only problematic for varchar and derived types (date, timestamp), and possibly geospatial data, which will have to be cast to VARIANT.

CREATE FUNCTION IS_INF(x VARIANT)
RETURNS boolean
AS 
$$
    iff(try_cast(x::varchar as float) IN ('Inf', -'Inf'), TRUE, FALSE)
$$;
CREATE FUNCTION IS_NA(x VARIANT)
RETURNS boolean
AS 
$$
    iff(try_cast(x::varchar as float) = 'NaN', TRUE, FALSE)
$$;
CREATE FUNCTION IF_NA(x VARIANT, y VARIANT, z VARIANT)
RETURNS VARIANT 
AS 
$$
    iff(try_cast(x::varchar as float) = 'NaN', y, z)
$$;
CREATE FUNCTION IF_INF(x VARIANT, y VARIANT, z VARIANT)
RETURNS VARIANT 
AS 
$$
    iff(try_cast(x::varchar as float) IN ('Inf', -'Inf'), y, z)
$$;

Note that this requires them to be stored in a specific schema, so if one wished to use this in another schema or database selecting the specific schema is necessary. Having a database/schema for this purpose is usually helpful. In addition snowflake did allow for IS_NAN but I could not use this function afterwards.

Example usage

select is_inf(3245); -- false
SELECT is_na('NaN'); -- true
SELECT IF_inf(3245, 'NaN'::float, 4); -- 4
SELECT if_na(3245, TRUE, FALSE); -- false