I am new to Snowflake, but my company has been using it successfully.
Parquet files are currently being written with an existing Avro schema, using Java parquet-avro v1.10.1.
I have been updating our dependencies to use the latest Avro, and as part of that, Parquet was bumped to 1.11.0.
The Avro schema is unchanged. However, when I run Snowflake's COPY INTO command, the load fails (LOAD FAILED) with the error "Error parsing the parquet file: Logical type Null can not be applied to group node", and there are no other error details :(
The strange part is that there are no null columns in the files.
By cutting the Avro schema down, I found that the presence of a map type in the Avro schema is what causes the issue.
The field is:
{
    "name": "FeatureAmounts",
    "type": {
        "type": "map",
        "values": "records.MoneyDecimal"
    }
}
Here is an example of the resulting Parquet schema, as printed by parquet-tools:
message record.ResponseRecord {
  required binary GroupId (STRING);
  required int64 EntryTime (TIMESTAMP(MILLIS,true));
  required int64 HandlingDuration;
  required binary Id (STRING);
  optional binary ResponseId (STRING);
  required binary RequestId (STRING);
  optional fixed_len_byte_array(12) CostInUSD (DECIMAL(28,15));
  required group FeatureAmounts (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key (STRING);
      required fixed_len_byte_array(12) value (DECIMAL(28,15));
    }
  }
}
The two files I have, one written with Parquet 1.10.1 and the other with 1.11.0, both output this identical schema.
I have also tried a bigger schema, and everything appears to work fine as long as no Avro "map" type is present. I have other very large files with huge schemas, including many union types that convert to groups in Parquet, and all of them are written and read successfully as long as they don't contain any "map" types.
But as soon as I add the "map" type back, Snowflake returns that odd error when I try to ingest the 1.11.0 version (the 1.10.1 version still loads successfully). Meanwhile, parquet-tools (versions 1.11.0, 1.10.1, etc.) can still read the files.
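Since only schemas containing a map seem to fail, one way to triage the other large files is to scan their parquet-tools schema output for map-annotated groups. Below is a minimal, stdlib-only Java sketch of that idea, assuming the schema text has already been captured (for example from `parquet-tools schema file.parquet`) into a string. The class and method names are hypothetical, and the regex assumes the group-line format shown in the dump above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists map-related group nodes in a parquet-tools schema dump. */
public class MapGroupScanner {

    // Matches lines such as "required group FeatureAmounts (MAP) {" or
    // "repeated group map (MAP_KEY_VALUE) {", capturing the field name and annotation.
    // Assumes the parquet-tools text format shown above; other formats may not match.
    private static final Pattern MAP_GROUP = Pattern.compile(
            "^\\s*(?:required|optional|repeated)\\s+group\\s+(\\S+)\\s+\\((MAP|MAP_KEY_VALUE)\\)");

    /** Returns "name (ANNOTATION)" for every MAP / MAP_KEY_VALUE group, in schema order. */
    public static List<String> findMapGroups(String schemaDump) {
        List<String> found = new ArrayList<>();
        // \R matches any line break (Java 8+), so both \n and \r\n dumps work.
        for (String line : schemaDump.split("\\R")) {
            Matcher m = MAP_GROUP.matcher(line);
            if (m.find()) {
                found.add(m.group(1) + " (" + m.group(2) + ")");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Trimmed copy of the schema from the question.
        String dump = String.join("\n",
                "message record.ResponseRecord {",
                "  required binary GroupId (STRING);",
                "  required group FeatureAmounts (MAP) {",
                "    repeated group map (MAP_KEY_VALUE) {",
                "      required binary key (STRING);",
                "      required fixed_len_byte_array(12) value (DECIMAL(28,15));",
                "    }",
                "  }",
                "}");
        // Prints: [FeatureAmounts (MAP), map (MAP_KEY_VALUE)]
        System.out.println(findMapGroups(dump));
    }
}
```

An empty result suggests a file has no map fields and, going by the observations above, would be expected to load; a non-empty result points at the fields worth isolating when testing against Snowflake.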
From this comment, I understand that the logical types changed in Parquet 1.11.0, but that files written with it are still supposed to be readable by older versions.
Does anyone know which version of Parquet Snowflake uses to parse these files? Or is something else going on here?
Appreciate any assistance