When reading a table from Snowflake with the Snowflake Spark connector, the default behavior is to map semi-structured data to Spark strings: https://docs.snowflake.net/manuals/user-guide/spark-connector-use.html#from-snowflake-to-spark-sql

For example, given a table in Snowflake:

create table schema.table as 
select 
    array_construct('1','a') as array_col,
    object_construct('1','a') as obj_col

And reading this table in PySpark

df = snowflake.sql_context.read \
    .format("net.snowflake.spark.snowflake") \
    .options(**snowflake_options) \
    .option("dbtable", "schema.table") \
    .load()

results in the dataframe:

> df: pyspark.sql.dataframe.DataFrame
>   ARRAY_COL:string
>   OBJ_COL:string

Is there currently a way to override this default behavior within the connector to map Snowflake OBJECT to Spark StructType and/or Snowflake ARRAY to Spark MapType?

2 Answers

You may be looking for the column mapping parameter (columnmap). I am not an expert on this, though, so I am just sharing the research I found.

It applies specifically when writing a Spark table to a Snowflake table: https://docs.snowflake.net/manuals/user-guide/spark-connector-overview.html#column-mapping
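As a minimal sketch (assuming snowflake_options holds your connection settings, and with placeholder column and table names), the mapping is applied on write:

    # columnmap renames columns when writing a Spark DataFrame to Snowflake;
    # it does not change column types, and per the docs it is only honored
    # in append mode.
    df.write.format("net.snowflake.spark.snowflake") \
        .options(**snowflake_options) \
        .option("dbtable", "schema.table") \
        .option("columnmap", "Map(spark_col -> SNOWFLAKE_COL)") \
        .mode("append") \
        .save()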

For Snowflake ARRAY to Spark MapType: is the object stored as an array in the Snowflake table? If so, Spark's map functions may be what you are looking for: https://sparkbyexamples.com/spark/spark-sql-map-functions/


The "column-mapping" is used to map column name not column type. For snowflake VARIANT/ARRAY/OBJECT column, JSON string is returned. It should be possible to convert the JSON string to a MapType by an UDF.