5 votes

What is the best practice for moving data from a Kafka cluster to a Redshift table? We have continuous data arriving on Kafka and I want to write it to tables in Redshift (it doesn't have to be in real time).

  • Should I use Lambda function?
  • Should I write a Redshift connector (consumer) that runs on a dedicated EC2 instance? (The downside is that I would need to handle redundancy myself.)
  • Is there some AWS pipeline service for that?

1 Answer

9 votes

Kafka Connect is commonly used for streaming data from Kafka to (and from) data stores. It does useful things like automagically managing scale-out, failover, schemas, serialisation, and so on.

This blog shows how to use the open-source JDBC Kafka Connect connector to stream to Redshift. There is also a community Redshift connector, but I've not tried it.
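For illustration, here is a minimal sketch of registering the JDBC sink connector via the Kafka Connect REST API. The Connect host, topic name, Redshift endpoint, and credentials below are placeholders, and the exact property set can vary by connector version; treat it as a starting point, not a drop-in config:

```python
import json
import requests  # pip install requests

# Hypothetical values -- substitute your own Connect worker, topic,
# and Redshift endpoint/credentials.
CONNECT_URL = "http://localhost:8083/connectors"

connector = {
    "name": "redshift-sink",
    "config": {
        # Confluent's open-source JDBC sink connector
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "orders",
        # Redshift's JDBC driver (or the Postgres one) must be on the
        # Connect worker's classpath for this URL to work.
        "connection.url": "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
        "connection.user": "etl_user",
        "connection.password": "********",
        "insert.mode": "insert",
        "auto.create": "true",  # create the target table from the record schema
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```

If you run Connect in standalone mode rather than distributed, the same key/value pairs can be supplied as a `.properties` file instead of posting them to the REST API.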

This blog shows another approach, not using Kafka Connect.

Disclaimer: I work for Confluent, who created the JDBC connector.