
I am using Kinesis Firehose to consume DynamoDB Streams through Lambda and push those records to an S3 bucket. A Glue job runs every hour to pick up the records from S3, perform deduplication, and finally insert the records into Redshift.
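For context, the hourly Glue deduplication step conceptually keeps one record per key. A minimal, plain-Java sketch of that idea (the `Item` record and its fields are hypothetical stand-ins for the DynamoDB stream records):

```java
import java.util.*;

public class Dedup {
    // Hypothetical record: a DynamoDB item key plus an approximate stream timestamp.
    record Item(String key, long timestamp, String payload) {}

    // Keep only the latest item per key -- the same effect the hourly Glue job achieves.
    static Collection<Item> deduplicate(List<Item> items) {
        Map<String, Item> latest = new HashMap<>();
        for (Item it : items) {
            // merge() passes (existing, incoming); keep whichever has the newer timestamp.
            latest.merge(it.key(), it, (a, b) -> a.timestamp() >= b.timestamp() ? a : b);
        }
        return latest.values();
    }

    public static void main(String[] args) {
        List<Item> batch = List.of(
                new Item("user#1", 100, "v1"),
                new Item("user#1", 200, "v2"),
                new Item("user#2", 150, "v1"));
        System.out.println(deduplicate(batch).size()); // 2 distinct keys remain
    }
}
```

In Kinesis Data Analytics (Flink) the same logic would typically live in keyed state on a `keyBy`-ed stream rather than an in-memory map.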


Is there any way I can consume the records from DynamoDB Streams in Kinesis Data Analytics, perform the deduplication there, and insert the records into Redshift?

I have gone through some links: https://issues.apache.org/jira/browse/FLINK-4582, "Consume DynamoDB streams in Apache Flink".

  • Here it is mentioned that we can use FlinkKinesisConsumer to consume DynamoDB streams.

So can we use this FlinkKinesisConsumer in Kinesis Data Analytics and consume the DynamoDB stream directly?

1
I am not sure if I understand correctly: you want to de-duplicate in Kinesis Data Analytics, not in AWS Glue? I think this is answered here: stackoverflow.com/questions/35599069/… – B. Pesevski
@B.Pesevski, I am looking into whether both Firehose and Glue can be replaced by Kinesis Data Analytics. – Vicky

1 Answer


Yes, when using Flink as the runtime for Kinesis Data Analytics:

Sources: https://docs.aws.amazon.com/kinesisanalytics/latest/java/how-sources.html

FlinkKinesisConsumer can be adapted to consume DynamoDB Streams; FLINK-4582 (https://issues.apache.org/jira/browse/FLINK-4582) added a FlinkDynamoDBStreamsConsumer subclass for exactly this.
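A sketch of wiring that source up, assuming the `flink-connector-kinesis` dependency (Flink 1.8+, which includes `FlinkDynamoDBStreamsConsumer`); the region and stream ARN are placeholders:

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkDynamoDBStreamsConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class DdbStreamSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1"); // placeholder region
        props.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");

        // The DynamoDB stream ARN is a placeholder; SimpleStringSchema yields the raw
        // stream-record JSON as strings.
        DataStream<String> records = env.addSource(new FlinkDynamoDBStreamsConsumer<>(
                "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/LABEL",
                new SimpleStringSchema(),
                props));

        records.print();
        env.execute("ddb-streams-to-kda");
    }
}
```

This requires Flink cluster dependencies, so it is a sketch rather than a standalone runnable program.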

Destinations: https://docs.aws.amazon.com/kinesisanalytics/latest/java/how-sinks.html

FlinkKinesisFirehoseProducer can be used to write into Kinesis Data Firehose. There is no direct Flink integration with Redshift, but the Firehose delivery stream itself can be configured with Redshift as its destination.
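A sketch of the Firehose sink side, assuming the AWS Kinesis Analytics Flink connectors dependency (which provides `FlinkKinesisFirehoseProducer`, per the AWS docs linked above); the region and delivery stream name are placeholders:

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import software.amazon.kinesisanalytics.flink.connectors.producer.FlinkKinesisFirehoseProducer;

public class FirehoseSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1"); // placeholder region

        // "my-delivery-stream" is a placeholder Firehose delivery stream; Redshift
        // would be configured as that delivery stream's destination, outside Flink.
        FlinkKinesisFirehoseProducer<String> sink = new FlinkKinesisFirehoseProducer<>(
                "my-delivery-stream", new SimpleStringSchema(), props);

        // Stand-in for the deduplicated stream produced upstream in the application.
        DataStream<String> deduped = env.fromElements("{\"pk\":\"user#1\"}");
        deduped.addSink(sink);
        env.execute("kda-to-firehose");
    }
}
```

As with the source sketch, this needs the connector jars and a Flink runtime, so it illustrates the wiring rather than running standalone.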