As a QA in our company I am daily user of kubernetes, and we use kubernetes job to create performance tests pods. One advantage of job, according to the docs, is
to create one Job object in order to reliably run one Pod to completion
But in our tests this feature will create infinite pods if previous ones fail, which will occupy resources of our team's shared cluster, and deleting such pods will take a lot of time. see this image:
Currently the job manifest is like this:
{
"apiVersion": "batch/v1",
"kind": "Job",
"metadata": {
"name": "upgradeperf",
"namespace": "ntg6-grpc26-tts"
},
"spec": {
"template": {
"spec": {
"containers": [
{
"name": "upgradeperfjob",
"image":
"mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1",
"command": [
"python",
"/jmeterwork/jmeter.py",
"-gu",
"[email protected]:mobility-ncs-tools/tts-cdqa-tool.git",
"-gb",
"upgradeperf",
"-t",
"JMeter/testcases/ttssvc/JMeterTestPlan_ttssvc_cmpsize.jmx",
"-JtestDataFile",
"JMeter/testcases/ttssvc/testData/avaml_opus.csv",
"-JthreadNum",
"3",
"-JthreadLoopCount",
"1500",
"-JresultsFile",
"results_upgradeperf_cavaml_opus_t3_l1500.csv",
"-Jhost",
"mtl-blade32-03.mycompany.com",
"-Jport",
"28416"
]
}
],
"restartPolicy": "Never",
"imagePullSecrets": [
{
"name": "docker-registry-secret"
}
]
}
}
}
}
In some cases, such as misconfiguring of ip/ports, 'reliably run one Pod to completion' is impossible and recreating pods is waste of time and resource. So is it possible, and how to limit kubernetes job to create a maxium number(say 3) of pods if always fail?