4
votes

I'm trying to insert data from a Pandas dataframe into a table in Snowflake, and I'm having trouble figuring out how to do it properly. To start with, I have created a table in Snowflake that has some columns of type VARIANT. For example:

CREATE OR REPLACE TABLE
    mydatabase.myschema.results(
        DATE date, 
        PRODUCT string, 
        PRODUCT_DETAILS variant, 
        ANALYSIS_META variant,
        PRICE float
)

Then in Pandas, I have a dataframe like this:

import pandas as pd
record = {'DATE': '2020-11-05',
 'PRODUCT': 'blue_banana',
 'PRODUCT_DETAILS': "{'is_blue': True, 'is_kiwi': nan}",
 'ANALYSIS_META': "None",
 'PRICE': 13.02}
df = pd.DataFrame(record, index=[0])

As you see, I've encoded the VARIANT columns as strings, since that's what I understood from the snowflake-connector documentation: a Snowflake VARIANT type maps to the str dtype in Pandas and vice versa.

So, what I've tried so far is the following:

from snowflake.connector import pandas_tools

pandas_tools.write_pandas(
                conn=conn,
                df=df,
                table_name="results",
                schema="myschema",
                database="mydatabase")

And this does work, returning

(True,
 1,
 1,
 [('czeau/file0.txt', 'LOADED', 1, 1, 1, 0, None, None, None, None)])

However, the results I get in Snowflake are not of the proper VARIANT type. The field ANALYSIS_META is correctly NULL, but the field PRODUCT_DETAILS is stored as a plain string. (screenshot of the table preview omitted)

Also, for example, this query throws an error:

SELECT * FROM
MYDATABASE.MYSCHEMA.RESULTS
WHERE PRODUCT_DETAILS:is_blue -- should work for json/variant fields

So with all that, my question is: how should I properly format my Pandas dataframe in order to insert the VARIANT fields correctly as nested fields into a Snowflake table? I thought that casting a dictionary into a string would do the trick, but apparently it doesn't work as I expected. What am I missing here?

What is the error message you are getting from SELECT * FROM MYDATABASE.MYSCHEMA.RESULTS? – demircioglu

@demircioglu It says SQL compilation error: Invalid data type [VARIANT] for predicate [GET(TESTING.PRODUCT_DETAILS, 'is_blue')] – tania

Try this, it should work: SELECT * FROM MYDATABASE.MYSCHEMA.RESULTS WHERE parse_json(PRODUCT_DETAILS):is_blue – demircioglu

@demircioglu I'm still getting the same error as above (SQL compilation error: Invalid data type [VARIANT] for predicate [GET(PARSE_JSON(TO_CHAR(TESTING.PRODUCT_DETAILS)), 'is_blue')]), but even if it worked, I'm interested in making sure the type is properly inserted as VARIANT. Some other tables in Snowflake that I can see are correctly inserted as VARIANT type are highlighted blue and you can click on the values when you preview, as well as extract the nested subfields with the : notation. I would like to achieve that without having to cast into json again when selecting. – tania

2 Answers

1
votes

After some investigation, I found the following solution to work:

1. Ensure that the columns are json-compliant

The key here is that json.dumps will transform your data into valid JSON: double quotes around keys and string values, null instead of None, and true/false instead of True/False.

import pandas as pd
import json
record = {'DATE': '2020-11-05',
 'PRODUCT': 'blue_banana',
 'PRODUCT_DETAILS': json.dumps({'is_blue': True, 'is_kiwi': None}),
 'ANALYSIS_META': json.dumps(None),
 'PRICE': 13.02}
df = pd.DataFrame(record, index=[0])
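To make the difference concrete, here is a minimal check showing why the Python repr of a dict (as used in the original question) is not valid JSON, while the json.dumps output is:

```python
import json

# Python's repr of a dict is NOT valid JSON: single quotes, True/None
python_repr = str({'is_blue': True, 'is_kiwi': None})
print(python_repr)   # {'is_blue': True, 'is_kiwi': None}

# json.dumps produces valid JSON: double quotes, true/null
as_json = json.dumps({'is_blue': True, 'is_kiwi': None})
print(as_json)       # {"is_blue": true, "is_kiwi": null}

json.loads(as_json)  # round-trips fine
try:
    json.loads(python_repr)  # this is also what Snowflake's parse_json chokes on
except json.JSONDecodeError as e:
    print("not valid JSON:", e)
```

Snowflake's parse_json applies the same strictness as json.loads here, which is why the original string-cast dict failed.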

2. Ensure you use parse_json and INSERT iteratively

Instead of using write_pandas as tried originally, we can INSERT into the table row by row, wrapping the columns of desired VARIANT type in parse_json and quoting each value as a string literal (by putting ' marks around it). The caveat is that this solution will be very slow if you have large amounts of data.

sql = """INSERT INTO MYDATABASE.MYSCHEMA.RESULTS
SELECT
 to_date('{DATE}'),
 '{PRODUCT}',
 parse_json('{PRODUCT_DETAILS}'),
 parse_json('{ANALYSIS_META}'),
 {PRICE}
"""
### CREATE A SNOWFLAKE CONN...

for i, r in df.iterrows():
    conn.cursor().execute(sql.format(**dict(r)))
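Note that str.format interpolation is fragile if a value contains a quote character (and it is open to SQL injection). As a sketch of an alternative, the Snowflake Python connector's default pyformat parameter binding can pass the values separately; the commented-out execution at the end assumes a live connection named conn:

```python
import json
import pandas as pd

record = {'DATE': '2020-11-05',
          'PRODUCT': 'blue_banana',
          'PRODUCT_DETAILS': json.dumps({'is_blue': True, 'is_kiwi': None}),
          'ANALYSIS_META': json.dumps(None),
          'PRICE': 13.02}
df = pd.DataFrame(record, index=[0])

# %(NAME)s placeholders are filled by the connector, not by Python
sql = ("INSERT INTO MYDATABASE.MYSCHEMA.RESULTS "
       "SELECT to_date(%(DATE)s), %(PRODUCT)s, "
       "parse_json(%(PRODUCT_DETAILS)s), parse_json(%(ANALYSIS_META)s), %(PRICE)s")

params = [dict(r) for _, r in df.iterrows()]
# with a live connection:
# for p in params:
#     conn.cursor().execute(sql, p)
```

This keeps quoting concerns on the connector's side instead of hand-escaping strings.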
0
votes

I think there are two problems coming together here. Firstly, the syntax for inserting variants is a bit odd. Secondly, it's important to realise that single quotes ' and double quotes " are not the same thing in Snowflake (or in SQL in general).

Problem statement

Instead of a pandas dataframe, we can solve this for a dictionary first; it is then straightforward to extend to a row of a pandas dataframe.

The goal is to write this query:

"""
INSERT INTO mydatabase.myschema.results (
     date, product, product_details, analysis_meta, price
)
SELECT
    '2018-01-01',
    'my_new_products',
    parse_json('{"is_blue": true, "is_kiwi": null}'),
    parse_json('{}'),
    13.02
"""

To achieve this, you have to make sure that the " and ' don't get mixed up, and that the dictionary values are wrapped in parse_json().


Proposed solution

A general solution could be based on (un)zipping the items in record and manipulating them a bit along the way. First, we need a function that will process our (to be inserted) values, based on their data type:

import json

def process_val(v) -> str:
    """Wrap dicts and string values for an SQL INSERT statement."""
    if isinstance(v, dict):
        # single quotes so Snowflake receives the JSON as a string literal
        return f"parse_json('{json.dumps(v)}')"
    elif isinstance(v, str):
        return f"'{v}'"
    else:
        return str(v)  # str() so that ", ".join() works for numbers too

If you stumble upon new types of values, you can add them to process_val(). We then use process_val() while joining the processed values together:

from typing import Any, Dict

def create_insert_statement(table: str, record: Dict[str, Any]) -> str:
    columns, values = zip(*record.items())

    col_string = ", ".join(columns)
    val_string = ", ".join(process_val(v) for v in values)

    return f"INSERT INTO {table} ({col_string}) SELECT {val_string}"

You can now iteratively insert your records:

table = "mydatabase.myschema.results"

for i, r in df.iterrows():
    query = create_insert_statement(table=table, record=dict(r))
    conn.cursor().execute(query)
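As a sanity check, the helpers applied to the example record reproduce the target statement from the problem statement. This standalone sketch redefines the helpers so it runs on its own (with the JSON string wrapped in single quotes, and non-string values passed through str() so the join works):

```python
import json
from typing import Any, Dict

def process_val(v) -> str:
    """Wrap dicts and string values for an SQL INSERT statement."""
    if isinstance(v, dict):
        return f"parse_json('{json.dumps(v)}')"
    elif isinstance(v, str):
        return f"'{v}'"
    else:
        return str(v)

def create_insert_statement(table: str, record: Dict[str, Any]) -> str:
    columns, values = zip(*record.items())
    col_string = ", ".join(columns)
    val_string = ", ".join(process_val(v) for v in values)
    return f"INSERT INTO {table} ({col_string}) SELECT {val_string}"

query = create_insert_statement(
    "mydatabase.myschema.results",
    {"date": "2018-01-01",
     "product": "my_new_products",
     "product_details": {"is_blue": True, "is_kiwi": None},
     "price": 13.02})
print(query)
# INSERT INTO mydatabase.myschema.results (date, product, product_details, price)
# SELECT '2018-01-01', 'my_new_products',
#        parse_json('{"is_blue": true, "is_kiwi": null}'), 13.02
```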