4 votes

I have a pandas DataFrame with several columns whose values are Python lists. I'd like to insert them into my Snowflake table as ARRAY data types.

Traditionally I've worked with PostgreSQL, where I simply insert with df_to_insert.to_sql(TABLE_NAME, sqlalchemy_connection, **other_parameters).
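For reference, a minimal version of the pattern that works for me against PostgreSQL (the engine URL, table, and columns here are placeholders):

import pandas as pd
import sqlalchemy

# Placeholder connection; psycopg2 adapts Python lists to PostgreSQL
# ARRAY values, so no explicit dtype hint is needed.
engine = sqlalchemy.create_engine("postgresql://user:pass@localhost/mydb")

df_to_insert = pd.DataFrame({"ID": [1, 2], "TAGS": [["a", "b"], ["c"]]})
df_to_insert.to_sql("my_table", engine, if_exists="append", index=False)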

However, I'm struggling to insert into Snowflake. When a column contains lists, the SQLAlchemy Snowflake dialect treats it as a string:

Expression type does not match column data type, expecting ARRAY but got VARCHAR(2) for column NAME_OF_COLUMN_WITH_LISTS

I've tried giving the SQLAlchemy engine a hint that the column is not a string by passing explicit dtypes:

df_to_insert.to_sql("MY_SNOWFLAKE_TABLE_NAME",
                     snowflake_connection,
                     if_exists="append",
                     index=False,
                     dtype={'NAME_OF_COLUMN_WITH_LISTS': sqlalchemy.types.JSON})

This only gets me:

sqlalchemy.exc.StatementError: (builtins.AttributeError) 'SnowflakeDialect' object has no attribute '_json_serializer'

What's the best way to insert data into Snowflake tables with VARIANT / ARRAY columns while still using pandas / SQLAlchemy? Or is there no good way to do this yet?

1
Have you tried sqlalchemy.types.ARRAY? – ResidentSleeper

1 Answer

1 vote

A straightforward way to work around this would be to:

  1. Write the dataframe to a local text file.
  2. Use the PUT command to upload the file to a Snowflake stage.
  3. Run a CREATE OR REPLACE TABLE ... AS SELECT ... from the stage, casting the fields to their proper types in the SELECT statement (sketch below).
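A minimal sketch of those three steps with snowflake-connector-python (the stage path, file format name, table, and columns here are hypothetical; the dataframe is written as newline-delimited JSON so the list column survives the round trip):

import pandas as pd
import snowflake.connector

# Hypothetical dataframe: one scalar column, one list column.
df = pd.DataFrame({"ID": [1, 2], "TAGS": [["a", "b"], ["c"]]})

# 1. Write the dataframe to a local text file. Newline-delimited JSON
#    keeps each list as a JSON array on its own line.
df.to_json("/tmp/data.json", orient="records", lines=True)

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# 2. PUT uploads the local file to a stage (the user stage @~ here).
cur.execute("PUT file:///tmp/data.json @~/list_load OVERWRITE = TRUE")

# 3. Recreate the table from the staged file, casting each field to its
#    proper type in the SELECT. $1 is the parsed record (a VARIANT), and
#    ::ARRAY casts the list field to a real ARRAY column.
cur.execute("CREATE OR REPLACE FILE FORMAT my_json_format TYPE = 'JSON'")
cur.execute("""
    CREATE OR REPLACE TABLE MY_SNOWFLAKE_TABLE_NAME AS
    SELECT
        $1:ID::NUMBER  AS ID,
        $1:TAGS::ARRAY AS TAGS
    FROM @~/list_load (FILE_FORMAT => 'my_json_format')
""")

If the table already exists and you'd rather keep its definition, a COPY INTO with the same transforming SELECT should work in place of the CREATE OR REPLACE TABLE.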