Doing aggregate functions on queries across standard and variant typed columns

Question

I am looking to leverage the variant capabilities in snowflake. I am new to snowflake and can't see from the docs if it supports what I want to do. I want to perform queries on a table where I do aggregations (groupings) across columns that have standard types, (varchar, int ... ...) and values extracted from a variant column. My variant column will have store a collection of objects that look like:

{ "container": [
  {
    "param1": "value1a",
    "param2": "value2a",
    "param3": "value3a"
  },
  {
    "param1": "value1b",
    "param2": "value2b",
    "param3": "value3b"
  },
  {
    "param1": "value1c",
    "param2": "value2c",
    "param3": "value3c"
  }
]
}

In traditional sql schema to show what I am trying to do let's say i had a 1:n relationship where the container was a child on the many with a fk (parent_id) to specify the rel (say to parent). Here is a type of query i am trying to write on the variant column expressed in traditional sql/relational model:

SELECT P.column1, P.column2, C.param1, C.param2
FROM parent P
INNER JOIN Child c ON c.parent_id = p.id
GROUP BY 1, 2, 3

What have you tried, so far. If nothing, I'd start by looking into the FLATTEN function of Snowflake. It bursts out your arrays into records, which would mimic your 1:n relationship: docs.snowflake.net/manuals/sql-reference/functions/flatten.html — Mike Walton
@DavidGarrison . yep sorry about that. My example variant data has a problem too. Let's say that param1 has a value that is the same across all three objects "value1a" and that param2 has a value, (value1b) that is the same across the first 2 objects. As someone else pointed out, my aggregate function is a simple count(1) as the_count. ``` column1, column2, param1, param2, the_count ---------------------------------------------------------------- siteX, pageY, value1a, value1b,2 siteA,pageB,value1a,value1b,1 siteC,pageC,value1a,value1c,1 ``` — dubbled
uh. the formatting of the reply is bad news. new poster on this site. sorry David G. — dubbled

Simeon Pilgrim Simeon Pilgrim · Accepted Answer · 2019-12-19T19:25:59

You will want to use the FLATTEN command, get access to the parts of your object.

Thus to access the three parameters of the container you would use

WITH input_tab AS (
  select PARSE_JSON('{ "container": [ {    "param1": "value1a",    "param2": "value2a",   "param3": "value3a"  },  {    "param1": "value1b",    "param2": "value2b",    "param3": "value3b"  },  {    "param1": "value1c",    "param2": "value2c",    "param3": "value3c"  }]}') as json
)
SELECT f.value:param1, 
    f.value:param2, 
    f.value:param3
    --,f.* -- here so you can see the other parts.
FROM input_tab AS i,
TABLE(flatten(input => i.json, path => 'container')) f;

Giving:

F.VALUE:PARAM1  F.VALUE:PARAM2  F.VALUE:PARAM3
"value1a"   "value2a"   "value3a"
"value1b"   "value2b"   "value3b"
"value1c"   "value2c"   "value3c"

which you will need to parse into there type data with the TRY_TO_x or AS_x casting functions.

Doing aggregate functions on queries across standard and variant typed columns

2 Answers

Temp Table (for ease of use)

Option 1

Option 2