As far as I know, Spark Structured Streaming achieves fault tolerance by using checkpoints.
I want to read from Kafka.
So let's say I use a checkpoint, and then for some reason my code crashes or I stop it; I expect that when I rerun the code, it will recover from the already-processed data.
My problem is with the read configuration: if I set the starting offset to earliest, then after rerunning the code I will read the same data again, and if I set it to latest, I will miss the data that arrived between the crash and the rerun.
Is there a way to read only the unread messages from Kafka with Spark 2.3 Structured Streaming (PySpark), and to recover the processed data from checkpoints?
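For reference, here is a minimal sketch of the setup I mean (the broker address, topic name, and output/checkpoint paths are just placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-checkpoint-test").getOrCreate()

# Read a stream from Kafka. "earliest" re-reads everything on a fresh start;
# "latest" skips whatever arrived while the job was down.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
      .option("subscribe", "my_topic")                      # placeholder topic
      .option("startingOffsets", "earliest")
      .load())

# Write with a checkpoint location, expecting a restart to resume from
# the offsets stored there rather than from startingOffsets.
query = (df.writeStream
         .format("parquet")
         .option("path", "/tmp/output")                     # placeholder sink path
         .option("checkpointLocation", "/tmp/checkpoint")   # placeholder checkpoint dir
         .start())

query.awaitTermination()
```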