Over the past few years we have developed quite a few Spark Streaming (Direct API) applications that read from or write to Kafka, IBM MQ, Hive, HBase, HDFS, and other systems on our Cloudera platform. Now that the Direct API of Spark Streaming (we are currently on version 2.3.2) is deprecated, and since we recently added the Confluent Platform (which ships with Kafka 2.2.0) to our project, we plan to migrate these applications.
What is the natural replacement for our Spark Streaming applications? Should we migrate to Spark Structured Streaming or rather to Kafka Streams?
I personally have no experience with either framework, but in my view Spark Structured Streaming seems to be the natural choice. Our code base is mainly written in Scala, which can also be used with the Structured Streaming API, whereas Kafka Streams has a few limitations with Scala. Although we might lose some flexibility by leaving the low-level RDD API and moving to the higher-level DataFrame API, we could build on our existing Spark knowledge.
On the other hand, there is Kafka Streams, which is probably the best choice for processing data between Kafka topics, our main use case. And given all the Kafka connectors that come with Confluent, the other use cases could be served as well.