I have a lot of Kafka topics with 1 partition each, being produced to and consumed from (REST API - Kafka - SQL Server). Now I want to take periodic dumps of this data into HDFS so I can run analytics on it later down the line.
Since all I basically need is a dump, I'm not sure I need Spark Streaming. However, all the documentation and examples I've found use Spark Streaming for this.
Is there a way to populate a DataFrame/RDD from a Kafka topic without having a streaming job running? Or is the paradigm here to kill the "streaming" job once the set window of min-to-max offsets has been processed, effectively treating the streaming job as a batch job?
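For reference, this is roughly the kind of one-shot batch read I'm hoping exists, if the Kafka source for Spark SQL can be used with `spark.read` instead of `readStream`. The broker address, topic name, offsets and HDFS path below are just placeholders for my setup, not something I've confirmed works:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the one-shot dump I have in mind (placeholder broker/topic/path).
val spark = SparkSession.builder()
  .appName("kafka-periodic-dump")
  .getOrCreate()

val df = spark.read                        // batch read, not readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "my-topic")
  .option("startingOffsets", "earliest")   // or explicit per-partition offsets
  .option("endingOffsets", "latest")
  .load()

// Kafka keys/values come through as binary, so cast before writing out.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "partition", "offset")
  .write
  .mode("append")
  .parquet("hdfs:///dumps/my-topic")
```

Is something along these lines supported, or do I have to go through a streaming query and stop it manually?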