
I am using spark-sql-2.4.1v with Java 1.8, and Kafka versions spark-sql-kafka-0-10_2.11_2.4.3 and kafka-clients_0.10.0.0.

I need to join streaming data with metadata that is stored in RDS, but the RDS metadata can be added to or changed over time.

If I read and load the RDS table data once at application start, it will become stale for joining with the streaming data.

I understand that I need to use Change Data Capture (CDC). How can I implement CDC in my scenario?

Any clues or a sample way to implement CDC?

Thanks a lot.


1 Answer


You can stream a database into Kafka so that the contents of a table plus every subsequent change is available on a Kafka topic. From here it can be used in stream processing.
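To make the idea concrete, here is a minimal sketch in plain Java (no Spark or Kafka dependencies) of what consuming that CDC topic amounts to: each change event upserts or deletes a row in a locally materialized view of the table, and streaming records are enriched against the latest state of that view. The `ChangeEvent` shape, the `upsert`/`delete` operation names, and the key/value columns are illustrative assumptions, not a real connector's wire format.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a CDC feed keeps a local materialized view of a database
// table current, so stream enrichment always sees the latest metadata.
public class CdcJoinSketch {

    // A simplified CDC event for a hypothetical metadata table.
    // op is "upsert" or "delete"; real CDC formats (e.g. Debezium) differ.
    static class ChangeEvent {
        final String op, key, value;
        ChangeEvent(String op, String key, String value) {
            this.op = op; this.key = key; this.value = value;
        }
    }

    private final Map<String, String> metadataView = new HashMap<>();

    // Apply one CDC event to the materialized view.
    public void apply(ChangeEvent e) {
        if ("delete".equals(e.op)) {
            metadataView.remove(e.key);
        } else {
            metadataView.put(e.key, e.value);
        }
    }

    // Enrich a streaming record's key with the latest metadata (null if absent).
    public String enrich(String key) {
        return metadataView.get(key);
    }

    public static void main(String[] args) {
        CdcJoinSketch sketch = new CdcJoinSketch();
        sketch.apply(new ChangeEvent("upsert", "device-1", "region=us-east"));
        System.out.println(sketch.enrich("device-1")); // prints region=us-east
        sketch.apply(new ChangeEvent("upsert", "device-1", "region=eu-west"));
        System.out.println(sketch.enrich("device-1")); // prints region=eu-west
        sketch.apply(new ChangeEvent("delete", "device-1", null));
        System.out.println(sketch.enrich("device-1")); // prints null
    }
}
```

In a real deployment the view would be fed by the Kafka topic rather than in-process calls; in Spark Structured Streaming the equivalent is reading the CDC topic as a streaming DataFrame and joining it with the event stream.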

You can do CDC in two different ways:

  • Query-based: poll the database for changes, using Kafka Connect JDBC Source
  • Log-based: extract changes from the database's transaction log using e.g. Debezium
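For the query-based option, a Kafka Connect JDBC Source connector configuration looks roughly like the sketch below. The database URL, credentials, table name, and column names (`id`, `updated_at`) are placeholders you would replace with your own; note that query-based CDC requires an incrementing ID and/or timestamp column on the table, and it cannot capture deletes.

```json
{
  "name": "rds-metadata-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://<rds-endpoint>:3306/mydb",
    "connection.user": "user",
    "connection.password": "password",
    "table.whitelist": "metadata",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "updated_at",
    "incrementing.column.name": "id",
    "topic.prefix": "rds_",
    "poll.interval.ms": "5000"
  }
}
```

Log-based CDC with Debezium avoids those limitations (it sees deletes and every intermediate change) at the cost of needing access to the database's transaction log.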

For more details and examples see http://rmoff.dev/ksny19-no-more-silos