I have a Spark Structured Streaming job that consumes events from the Azure Event Hubs service. Occasionally, some batches are not processed by the streaming job, and in those cases the following statement appears in the structured streaming log:
INFO FileStreamSink: Skipping already committed batch 25
The streaming job persists the incoming events to an Azure Data Lake, so I can check which events have actually been processed and persisted. Whenever the skipping above occurs, those events are missing from the Data Lake!
It is unclear to me why these batches are marked as already committed, because in the end they appear not to have been processed at all.
Does anyone have an idea what might cause this behaviour?
Thanks!