0
votes

I am trying to load data from parquet file in AWS S3 into snowflake table. But getting the below error. Could you please help.

SQL compilation error: PARQUET file format can produce one and only one column of type variant or object or array. Use CSV file format if you want to load more than one column.

Parquet file schema

 |-- uuid: string (nullable = true)
 |-- event_timestamp: timestamp (nullable = true)
 |-- params: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- value: string (nullable = true)

Here is the sample data. uuid,event_timestamp,params 3f230ea5-dd52-4cf9-bdde-b79201eb1001,2020-05-10 17:06:21.524,[{id=501, type=custom, name=filtering, value=true}, {id=502, type=custom, name=select, value=false}]

snowflake table

create or replace table temp_log (
      uuid string, 
      event_timestamp timestamp, 
      params array);

I am using the below copy command to load data

 copy into temp_log
 from '<<s3 path>>'
 pattern = '*.parquet'
 storage_integration = <<integration object>
 file_format = (
     type = parquet
     compression = snappy
 )
 ;
2

2 Answers

0
votes

This documentation explains how to load parquet data into multiple columns: Loading Parquet

UPDATE

I'm not sure if the comment below is a response to my answer and, if it is, what the relevance of it is? Did you read the document and, if you did, what part of it do you still have questions about?

You need to have your data in a stage (presumably an external stage in your case), or possibly in an external table, and then load from that into your table using "COPY INTO table FROM (SELECT..." with the $1:.. notation allowing you to select the appropriate elements from you parquet structure.

From the documentation:

/* Load the Parquet data into the relational table.                                                           */
/*                                                                                                            */
/* A SELECT query in the COPY statement identifies a numbered set of columns in the data files you are        */
/* loading from. Note that all Parquet data is stored in a single column ($1).                                */
/*                                                                                                            */
/* Cast element values to the target column data type.                                                        */

copy into cities
  from (select
  $1:continent::varchar,
  $1:country:name::varchar,
  $1:country:city.bag::variant
  from @sf_tut_stage/cities.parquet);
0
votes

This issue was resolved after creating table as below

create or replace table temp_log (

     logcontent VARIANT);