I am ingesting data from a file source using Structured Streaming. I have set up a checkpoint, and as far as I can tell it works correctly, but I don't understand what will happen in a couple of situations. If my streaming app runs for a long time, will the checkpoint files keep growing forever, or are they eventually cleaned up? And does it matter if they are never cleaned up? It seems that eventually they would become large enough that the program would take a long time to parse them.
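One way to check whether the checkpoint really keeps growing is to measure it periodically. This is a minimal sketch under two assumptions: the checkpoint sits on a local filesystem (HDFS or S3 would need that filesystem's own listing API), and `checkpoint_usage` is a hypothetical helper, not part of Spark. It totals the bytes under each top-level entry of the checkpoint directory, such as the `offsets`, `commits` and `sources` subdirectories and the `metadata` file.

```python
import os


def checkpoint_usage(checkpoint_dir):
    """Return total bytes under each top-level entry of a checkpoint directory.

    Hypothetical helper; assumes the checkpoint is on a local filesystem.
    """
    usage = {}
    for entry in sorted(os.listdir(checkpoint_dir)):
        full = os.path.join(checkpoint_dir, entry)
        if os.path.isfile(full):
            # Plain files at the top level (e.g. "metadata") are counted directly.
            usage[entry] = os.path.getsize(full)
            continue
        # Subdirectories (e.g. "offsets", "commits", "sources") are summed recursively.
        total = 0
        for root, _dirs, files in os.walk(full):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        usage[entry] = total
    return usage
```

Calling `checkpoint_usage("/tmp/checkpoints/ingest")` (path hypothetical) once a day and comparing the results would show which subdirectory, if any, grows without bound.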
My other question: when I manually remove or alter the checkpoint folder, or switch to a different checkpoint folder, no new files are ingested. The files are recognized and added to the checkpoint, but they are not actually ingested. This worries me, because if the checkpoint folder is somehow altered, my ingestion will break. I haven't been able to find much information on the correct procedure in these situations.
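To see exactly which input files the checkpoint has "recognized", one could read the file source's log in the checkpoint's `sources/0` directory and compare it with what actually landed in the output. This is a sketch resting on an assumption about Spark's internal log format that I have not verified against every version: each batch file is named by its batch id (compacted ones end in `.compact`), starts with a version line such as `v1`, and then holds one JSON object per line with a `path` key. `recorded_source_files` is a hypothetical helper, and the code assumes a local filesystem.

```python
import json
import os


def recorded_source_files(source_log_dir):
    """Collect input-file paths listed in a file-source log directory.

    ASSUMPTION (internal, unverified format): batch files are named "<id>" or
    "<id>.compact", the first line is a version marker, and every following
    non-empty line is a JSON object with a "path" key.
    """
    paths = set()
    for name in os.listdir(source_log_dir):
        batch = name[: -len(".compact")] if name.endswith(".compact") else name
        if not batch.isdigit():
            continue  # skip checksum/temp files such as ".0.crc"
        with open(os.path.join(source_log_dir, name), encoding="utf-8") as f:
            lines = f.read().splitlines()
        for line in lines[1:]:  # first line is the log version, e.g. "v1"
            if line.strip():
                paths.add(json.loads(line)["path"])
    return paths
```

Because compacted files repeat entries from earlier batches, the result is a set, so each recorded path appears once.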