Kinesis Firehose to S3 in Parquet format (and Snappy compression)

Question

I have set up AWS DMS to take data from my SQL Server RDS instance into Kinesis Streams and then into Kinesis Firehose, and I now want to write the data to S3 in Parquet format with Snappy compression. The Firehose console says I need to specify a schema for the source records. Is this necessary? If so, how do I take the source data definition and create a Glue Catalog table to use at this point? Also, if I add optional columns that Kinesis Streams can add to the table (e.g. the time of the transaction), will these need defining in the catalog too?

Ritesh Grandhi · Accepted Answer · 2020-11-04T10:57:46

On the Console for Firehose it says I need to specify a schema for the source records. Is this necessary?

Yes. Parquet is a columnar format, so Kinesis Firehose needs to know the schema of the source records in order to convert them to Parquet.
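To make the schema requirement concrete, here is a minimal sketch of the record format conversion settings that point Firehose at a Glue table and ask for Snappy-compressed Parquet. The key names follow the `DataFormatConversionConfiguration` structure of the Firehose API; the role ARN, database, table and region are placeholders I made up, and it assumes the incoming records are JSON.

```python
def parquet_conversion_config(role_arn, database, table, region):
    """Build the DataFormatConversionConfiguration part of a Firehose
    ExtendedS3DestinationConfiguration (JSON in, Snappy Parquet out)."""
    return {
        "Enabled": True,
        # Firehose reads the column definitions from this Glue table.
        "SchemaConfiguration": {
            "RoleARN": role_arn,
            "DatabaseName": database,
            "TableName": table,
            "Region": region,
            "VersionId": "LATEST",
        },
        # Assumption: source records arrive as JSON.
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        "OutputFormatConfiguration": {
            "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
        },
    }


# Placeholder values, for illustration only.
config = parquet_conversion_config(
    role_arn="arn:aws:iam::123456789012:role/example-firehose-role",
    database="example_db",
    table="example_table",
    region="eu-west-1",
)
```

The resulting dictionary would go under `ExtendedS3DestinationConfiguration["DataFormatConversionConfiguration"]` when creating or updating the delivery stream, for example through boto3's Firehose client.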

If I add optional columns that Kinesis Streams can add to the table (e.g. the time of the transaction), will these need defining in the catalog too?

Yes. Any columns you send that are not defined in the schema will be ignored and lost, so they need to be defined in the schema as well.
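As a rough illustration of that point, the sketch below builds a Glue table definition that includes the extra column, and a small helper that lists which fields of a record are not declared and so would be dropped. The table name, column names and types are hypothetical, not taken from the question, and the helper only mimics the behaviour described above; it is not Firehose's own logic. The definition is trimmed to the column list; a real table would usually carry more storage details.

```python
def glue_table_input(table_name, columns):
    """Build a minimal Glue TableInput from (name, type) pairs."""
    return {
        "Name": table_name,
        "StorageDescriptor": {
            "Columns": [{"Name": name, "Type": col_type} for name, col_type in columns],
        },
    }


def undeclared_fields(record, table_input):
    """Return record fields missing from the table schema, i.e. the ones
    that would be ignored and lost during conversion."""
    declared = {col["Name"] for col in table_input["StorageDescriptor"]["Columns"]}
    return sorted(set(record) - declared)


# Hypothetical source columns plus one extra column added in the stream.
source_columns = [("order_id", "int"), ("customer_name", "string")]
extra_columns = [("transaction_time", "timestamp")]

table_without_extra = glue_table_input("example_table", source_columns)
table_with_extra = glue_table_input("example_table", source_columns + extra_columns)

record = {
    "order_id": 1,
    "customer_name": "Ada",
    "transaction_time": "2020-11-04 10:57:46",
}

missing_before = undeclared_fields(record, table_without_extra)
missing_after = undeclared_fields(record, table_with_extra)
```

Here `missing_before` contains `transaction_time`, while `missing_after` is empty once the column is declared. A dictionary shaped like `table_with_extra` could be passed as `TableInput` to the Glue `create_table` call.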
