0
votes

I am looking to leverage the variant capabilities in snowflake. I am new to snowflake and can't see from the docs if it supports what I want to do. I want to perform queries on a table where I do aggregations (groupings) across columns that have standard types, (varchar, int ... ...) and values extracted from a variant column. My variant column will have store a collection of objects that look like:

{ "container": [
  {
    "param1": "value1a",
    "param2": "value2a",
    "param3": "value3a"
  },
  {
    "param1": "value1b",
    "param2": "value2b",
    "param3": "value3b"
  },
  {
    "param1": "value1c",
    "param2": "value2c",
    "param3": "value3c"
  }
]
}

In traditional sql schema to show what I am trying to do let's say i had a 1:n relationship where the container was a child on the many with a fk (parent_id) to specify the rel (say to parent). Here is a type of query i am trying to write on the variant column expressed in traditional sql/relational model:

SELECT P.column1, P.column2, C.param1, C.param2
FROM parent P
INNER JOIN Child c ON c.parent_id = p.id
GROUP BY 1, 2, 3
2
Could you add in an expected query result/output, please?David Garrison
What have you tried, so far. If nothing, I'd start by looking into the FLATTEN function of Snowflake. It bursts out your arrays into records, which would mimic your 1:n relationship: docs.snowflake.net/manuals/sql-reference/functions/flatten.htmlMike Walton
@DavidGarrison . yep sorry about that. My example variant data has a problem too. Let's say that param1 has a value that is the same across all three objects "value1a" and that param2 has a value, (value1b) that is the same across the first 2 objects. As someone else pointed out, my aggregate function is a simple count(1) as the_count. ``` column1, column2, param1, param2, the_count ---------------------------------------------------------------- siteX, pageY, value1a, value1b,2 siteA,pageB,value1a,value1b,1 siteC,pageC,value1a,value1c,1 ```dubbled
uh. the formatting of the reply is bad news. new poster on this site. sorry David G.dubbled

2 Answers

1
votes

You will want to use the FLATTEN command, get access to the parts of your object.

Thus to access the three parameters of the container you would use

WITH input_tab AS (
  select PARSE_JSON('{ "container": [ {    "param1": "value1a",    "param2": "value2a",   "param3": "value3a"  },  {    "param1": "value1b",    "param2": "value2b",    "param3": "value3b"  },  {    "param1": "value1c",    "param2": "value2c",    "param3": "value3c"  }]}') as json
)
SELECT f.value:param1, 
    f.value:param2, 
    f.value:param3
    --,f.* -- here so you can see the other parts.
FROM input_tab AS i,
TABLE(flatten(input => i.json, path => 'container')) f;

Giving:

F.VALUE:PARAM1  F.VALUE:PARAM2  F.VALUE:PARAM3
"value1a"   "value2a"   "value3a"
"value1b"   "value2b"   "value3b"
"value1c"   "value2c"   "value3c"

which you will need to parse into there type data with the TRY_TO_x or AS_x casting functions.

0
votes

Great question, not sure which aggregate function you intend on using as you don't have it in your example and your values are strings, so your aggregate functions are limited.

Temp Table (for ease of use)

CREATE OR REPLACE TEMP TABLE X
AS
SELECT PARSE_JSON(COLUMN1) AS C1
FROM VALUES 
($$
{ "container": [
  {
    "param1": "value1a",
    "param2": "value2a",
    "param3": "value3a"
  },
  {
    "param1": "value1b",
    "param2": "value2b",
    "param3": "value3b"
  },
  {
    "param1": "value1c",
    "param2": "value2c",
    "param3": "value3c"
  }
]
}$$);

Option 1

SELECT C1
    ,F1.*
    ,F1.VALUE['param2']
FROM X
    ,LATERAL FLATTEN (INPUT => C1['container']) F1
;

Option 2

SELECT C1
    ,F1.*
    ,F2.*
FROM X
    ,LATERAL FLATTEN (INPUT => C1['container']) F1
    ,LATERAL FLATTEN (INPUT => F1.VALUE) F2