This morning I learned about the (unfortunate) default in Kubernetes whereby the Job objects from every previous run of a CronJob are retained in the cluster. Mea culpa for not reading that detail in the documentation. I also noticed that deleting jobs (kubectl delete job [<foo> or --all]) takes quite a long time.
Further, even a reasonably provisioned Kubernetes cluster with three large nodes appears to fail (timeouts of all sorts when trying to use kubectl) when there are just ~750 such old jobs in the system, plus some other active containers that had not otherwise caused heavy load. [Correction: there were also ~7k pods associated with those old jobs that were also retained :-o] (I did learn about the configuration settings to limit/avoid storing old jobs from CronJobs, so this won't be a problem [for me] in the future.)
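For reference, the settings in question are `successfulJobsHistoryLimit` and `failedJobsHistoryLimit` on the CronJob spec. Below is a minimal sketch that emits a CronJob manifest with those limits set, assuming the `batch/v1` CronJob API (Kubernetes 1.21+; older clusters used `batch/v1beta1` with the same field names). The job name, schedule, image and command are hypothetical placeholders; `kubectl apply -f` accepts JSON as well as YAML.

```python
import json


def cronjob_manifest(name: str, schedule: str, image: str, command: list,
                     keep_succeeded: int = 0, keep_failed: int = 1) -> dict:
    """Build a CronJob manifest that caps how many finished Jobs are kept."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Finished Jobs beyond these counts are deleted by the CronJob
            # controller; their pods are then garbage-collected.
            "successfulJobsHistoryLimit": keep_succeeded,
            "failedJobsHistoryLimit": keep_failed,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job; pipe the output to `kubectl apply -f -`.
    manifest = cronjob_manifest("nightly-report", "0 3 * * *", "busybox:1.36",
                                ["sh", "-c", "date"])
    print(json.dumps(manifest, indent=2))
```

Setting `keep_succeeded=0` keeps no successful runs at all; keeping one failed run leaves something to inspect when a run breaks.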
Since I couldn't find Kubernetes documentation on this, my (related) questions are:
- What exactly is stored when Kubernetes retains old jobs? (Presumably the associated pods' logs and some metadata, but that doesn't explain why they seemed to place such a load on the cluster.)
- Is there a way to see the resources (disk only, I assume, but maybe some other resource as well) that individual old jobs, or all of them collectively, are using?
- Why does deleting a Kubernetes job take on the order of a minute?
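One rough way to gauge the second question: every retained Job and each of its pods is an API object held by the API server, so counting them and measuring their serialized size gives a crude sense of the volume involved. The sketch below assumes the input is the output of `kubectl get jobs,pods --all-namespaces -o json`; the script name `old_jobs.py` is hypothetical. The JSON size is only a proxy, since objects are typically stored in etcd as protobuf, and container logs kept on the nodes' disks are not counted at all.

```python
import json
import sys
from collections import defaultdict


def is_finished(job: dict) -> bool:
    """True if the Job carries a Complete or Failed condition with status "True"."""
    for cond in job.get("status", {}).get("conditions") or []:
        if cond.get("type") in ("Complete", "Failed") and cond.get("status") == "True":
            return True
    return False


def owning_job(pod: dict):
    """Return the name of the Job listed in the pod's ownerReferences, or None."""
    for ref in pod.get("metadata", {}).get("ownerReferences") or []:
        if ref.get("kind") == "Job":
            return ref.get("name")
    return None


def summarize(items: list) -> dict:
    """Count finished Jobs, the pods they still own, and their rough JSON size."""
    finished = set()  # (namespace, job name); owners share the pod's namespace
    approx_bytes = 0
    for item in items:
        if item.get("kind") == "Job" and is_finished(item):
            meta = item["metadata"]
            finished.add((meta.get("namespace", "default"), meta["name"]))
            approx_bytes += len(json.dumps(item))
    pods_per_job = defaultdict(int)
    for item in items:
        if item.get("kind") != "Pod":
            continue
        key = (item["metadata"].get("namespace", "default"), owning_job(item))
        if key in finished:
            pods_per_job[key] += 1
            approx_bytes += len(json.dumps(item))
    return {
        "finished_jobs": len(finished),
        "retained_pods": sum(pods_per_job.values()),
        "approx_json_bytes": approx_bytes,
    }


def main() -> None:
    # Usage: kubectl get jobs,pods --all-namespaces -o json | python3 old_jobs.py
    if sys.stdin.isatty():
        print("pipe kubectl JSON output into this script", file=sys.stderr)
        return
    raw = sys.stdin.read()
    if not raw.strip():
        return
    print(json.dumps(summarize(json.loads(raw).get("items", [])), indent=2))


if __name__ == "__main__":
    main()
```

Pods are matched to Jobs through `ownerReferences` rather than labels, because that is the same link the garbage collector uses when a Job is deleted with cascading.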