Using a configuration identical to the one in the Terraform example at https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/glue_catalog_table:
resource "aws_glue_catalog_table" "aws_glue_catalog_table" {
name = "MyCatalogTable"
database_name = "MyCatalogDatabase"
table_type = "EXTERNAL_TABLE"
parameters = {
EXTERNAL = "TRUE"
"parquet.compression" = "SNAPPY"
}
storage_descriptor {
location = "s3://my-bucket/event-streams/my-stream"
input_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
ser_de_info {
name = "my-stream"
serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
parameters = {
"serialization.format" = 1
}
}
}
}
and then running a simple Athena query against the created table fails with the error:

Not valid Parquet file
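
The exact query doesn't seem to matter; for example, even something as basic as the following sketch (using the database and table names from the config above) produces the same error:

-- any SELECT against the table fails with "Not valid Parquet file"
SELECT * FROM MyCatalogDatabase.MyCatalogTable LIMIT 10;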
I've tried every SerDe definition listed in the Athena documentation (https://docs.aws.amazon.com/athena/latest/ug/supported-serdes.html) and every input_format I could find, and nothing works.
Trying it with a Parquet file instead of a Snappy file does seem to work, but that doesn't fit my needs. Has anyone ever gotten this working with Snappy files?