0 votes

I have a table in Snowflake where one of the fields, called 'value', is sometimes plain text and sometimes JSON. This field is stored as a string in Snowflake.
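
To show the shape of the data, here are a few made-up rows (the real table is much larger):

    id | session_event_type_id | value
    ---+-----------------------+------------------------------------
     1 |                     7 | {"meta": {"selected": "option_a"}}
     2 |                     7 | {"meta": {"selected": "option_b"}}
     3 |                     3 | some plain text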

I created this view to get only the rows where the value is in JSON format:

CREATE OR REPLACE VIEW tmp_events AS
    SELECT PARSE_JSON(value) as json_data,
           id
    FROM SessionEvent 
    WHERE session_event_type_id=7;

Then I flatten the rows to create a new field

CREATE OR REPLACE VIEW tmp_events_step2 AS
     SELECT id,
            json_data:meta:selected::string AS choice
     FROM tmp_events,
          LATERAL FLATTEN(input => tmp_events.json_data)
     WHERE choice IS NOT NULL;

Everything runs fine up to this point: I can preview data from both views with no errors, and I get the results I was expecting.

The error comes when I try to get the distinct values of choice:

 SELECT DISTINCT choice from tmp_events_step2;

Error parsing JSON: unknown keyword "brain", pos 6

This value "brain" seems to come from rows in my initial table that the WHERE statement should have excluded.

If I run the query without DISTINCT there is no error.

Weird thing I noticed while trying to debug: when I put a LIMIT in tmp_events_step2, the code works fine again, even though the limit is bigger than the number of rows in the table:

CREATE OR REPLACE VIEW tmp_events_step2 AS
     SELECT id,
            json_data:meta:selected::string AS choice
     FROM tmp_events,
          LATERAL FLATTEN(input => tmp_events.json_data)
     WHERE choice IS NOT NULL
     LIMIT 10000000;
SELECT DISTINCT choice from tmp_events_step2;

[screenshot of the result table]

What's the catch? Why does it work only with the limit?

3 Answers

1 vote

The very simple answer to this is the built-in function TRY_PARSE_JSON().

Er, no. You seem to be hitting a problem with the query optimizer, which may be doing incorrect predicate pushdown. One way to prevent the optimizer from doing this is to use the secure view option:

CREATE SECURE VIEW tmp_events_step2 ...

and file a support ticket...
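
For reference, a sketch of what the secure version could look like here, reusing the step-2 view definition from the question with only the SECURE keyword added:

     CREATE OR REPLACE SECURE VIEW tmp_events_step2 AS
          SELECT id,
                 json_data:meta:selected::string AS choice
          FROM tmp_events,
               LATERAL FLATTEN(input => tmp_events.json_data)
          WHERE choice IS NOT NULL;

A secure view also hides the view definition from other roles, but the query results are the same as before.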

1 vote

We reported this error two years ago, and they said they were not going to fix it, because running the JSON access only after the WHERE-clause filters that make the cast valid/safe impacted performance.

create table variant_cast_bug(num number, var variant);

insert into variant_cast_bug
    select column1 as num, parse_json(column2) as var
    from values (1, '{"id": 1}'), 
            (1, '{"id": 2}'),
            (2, '{"id": "text"}')
            v;

select * from variant_cast_bug;
select var:id from variant_cast_bug;
select var:id from variant_cast_bug where num = 1;
select var:id::number from variant_cast_bug where num = 1; -- <- broken
select TRY_TO_NUMBER(var:id) from variant_cast_bug where num = 1; -- <- works

Sometimes you can nest the SELECT and it will work, and then you add another SELECT layer around it or do some aggregation, and it blows up again.

The only two safe solutions are a SECURE VIEW, as Hans mentions (but that is a performance nightmare), or to understand this problem and use TRY_TO_NUMBER or its friends.
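
Applied to the question's case, a sketch along those lines: replace PARSE_JSON with TRY_PARSE_JSON in the first view, so that non-JSON values become NULL instead of raising an error even if the optimizer evaluates the parse before the session_event_type_id filter:

     CREATE OR REPLACE VIEW tmp_events AS
         SELECT TRY_PARSE_JSON(value) AS json_data,  -- NULL instead of an error for plain-text rows
                id
         FROM SessionEvent
         WHERE session_event_type_id = 7;

The WHERE clause still removes the other event types; TRY_PARSE_JSON just stops the hoisted parse from blowing up on them, and any NULL json_data falls out at the choice IS NOT NULL filter in the second view.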

At the time, this was made worse because JSON boolean values were not valid values to pass to TRY_TO_BOOLEAN.

One of the times we got burnt by this was after a Snowflake release, when code that had been running for a year started getting this error: it was complex enough that the hoisting had not affected it before, but after the release it did. This is where Snowflake are rather responsive; they rolled the release back, and we put TRY_TO_ functions on a chunk of already-working SQL just to play it safe.

0 votes

Please submit a support case for this issue.