1 vote

I have a simple containerised Python script which I am trying to parallelise with Kubernetes. This script guesses hashes until it finds a hashed value below a certain threshold.

I am only interested in the first such value, so I wish to create a Kubernetes job that spawns n worker pods and completes as soon as one worker pod finds a suitable value.

By default, Kubernetes jobs wait until all worker pods complete before marking the job as complete. I have so far been unable to find a way around this (no mention of this job pattern in the documentation), and have been relying on checking the logs of bare pods via a bash script to determine whether one has completed.

Is there a native means to achieve this? And, if not, what would be the best approach?

2 Comments

Have you already found a solution? – Malgorzata

I have not; I am still using my workaround of working with bare pods. Thanks. – lemonzest

2 Answers

0 votes

Hi, take a look at this link: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#parallel-jobs.

I've never tried it, but it seems possible to launch several Pods and configure the Job to end once x Pods have finished. In your case, x is 1.
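
I haven't tested this against your workload, but a minimal sketch of such a Job spec, with placeholder names and values, might look like this:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hash-search                  # placeholder name
spec:
  completions: 1                     # the Job counts as complete once 1 Pod succeeds
  parallelism: 5                     # requested parallelism; note that for a fixed
                                     # completion count, the number of Pods running
                                     # at once will not exceed the remaining completions
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: my-hash-worker:latest # placeholder image for your script
```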

0 votes

We can define two specifications for parallel Jobs:

1. Parallel Jobs with a fixed completion count:

  • specify a non-zero positive value for .spec.completions.
  • the Job represents the overall task, and is complete when there is one successful Pod for each value in the range 1 to .spec.completions.
  • not implemented yet: Each Pod is passed a different index in the range 1 to .spec.completions.

2. Parallel Jobs with a work queue:

  • do not specify .spec.completions; it defaults to .spec.parallelism

  • the Pods must coordinate amongst themselves or an external service to determine what each should work on.

For example, a Pod might fetch a batch of up to N items from the work queue. Each Pod is independently capable of determining whether or not all its peers are done, and thus that the entire Job is done.

  • when any Pod from the Job terminates with success, no new Pods are created.
  • once at least one Pod has terminated with success and all Pods are terminated, then the Job is completed with success.
  • once any Pod has exited with success, no other Pod should still be doing any work for this task or writing any output; they should all be in the process of exiting.

For a fixed completion count Job, you should set .spec.completions to the number of completions needed. You can set .spec.parallelism, or leave it unset and it will default to 1.

For a work queue Job, you must leave .spec.completions unset, and set .spec.parallelism to a non-negative integer.
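
For the "first successful Pod finishes the Job" case described in the question, the work queue form seems the closer match. A minimal, untested sketch (the name, image and Pod count below are placeholders, not taken from the question) could look like this:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hash-search                    # placeholder name
spec:
  parallelism: 5                       # number of worker Pods; .spec.completions is left unset
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: my-hash-worker:latest   # placeholder: the script should exit 0 once it
                                       # finds a value below the threshold
```

Per the bullet points above, once one Pod exits successfully no new Pods are created, but the already-running workers still have to notice that a peer has succeeded (for example via a shared queue or flag) and exit on their own.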

For more information about how to make use of the different types of job, see the job patterns section.

You can also take a look at a single Job which starts a controller Pod: this pattern is for a single Job to create a Pod which then creates other Pods, acting as a sort of custom controller for those Pods. This allows the most flexibility, but may be somewhat complicated to get started with and offers less integration with Kubernetes.

One example of this pattern would be a Job which starts a Pod which runs a script that in turn starts a Spark master controller (see spark example), runs a Spark driver, and then cleans up.

An advantage of this approach is that the overall process gets the completion guarantee of a Job object, but maintains complete control over what Pods are created and how work is assigned to them.
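
A rough, untested sketch of that pattern (the names, image and ServiceAccount below are assumptions, not from the Spark example) would be a single-completion Job whose Pod runs a custom controller script:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hash-search-controller           # placeholder name
spec:
  completions: 1
  template:
    spec:
      serviceAccountName: pod-launcher   # hypothetical ServiceAccount with RBAC rights
                                         # to create, list and delete Pods
      restartPolicy: Never
      containers:
      - name: controller
        image: my-controller:latest      # placeholder: a script that spawns the worker
                                         # Pods, waits for the first success, then deletes
                                         # the remaining workers and exits
```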

At the same time, take into consideration that by default the completion status of a Job is set when the specified number of successful completions is reached; this ensures that all tasks are processed properly. Marking the Job as complete before all tasks have finished is not a safe solution.

You should also know that finished Jobs are usually no longer needed. Keeping them around in the system will put pressure on the API server. If the Jobs are managed directly by a higher-level controller, such as CronJobs, they can be cleaned up by the CronJobs based on the specified capacity-based cleanup policy.
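
If you do wrap the Jobs in a CronJob, that capacity-based cleanup is configured with the history limits; here is a sketch with placeholder values (the schedule and names are assumptions):

```yaml
apiVersion: batch/v1                  # batch/v1beta1 on older clusters
kind: CronJob
metadata:
  name: hash-search-cron              # placeholder name
spec:
  schedule: "0 * * * *"               # placeholder schedule
  successfulJobsHistoryLimit: 3       # keep at most 3 successful Jobs around
  failedJobsHistoryLimit: 1           # keep at most 1 failed Job around
  jobTemplate:
    spec:
      parallelism: 5
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: my-hash-worker:latest   # placeholder image
```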

Here is the official documentation: jobs-parallel-processing, parallel-jobs. A useful blog post: article-parallel job.

EDIT:

Another option is to create a special script which continuously checks for the value you are looking for. Using a Job would then not be necessary; you could simply use a Deployment.
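
I haven't verified this against your setup, but such a Deployment could be sketched as follows (the name, replica count and image are placeholders); the script in the container would keep checking and report the value it finds, rather than relying on Job completion:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hash-search                  # placeholder name
spec:
  replicas: 5                        # number of parallel workers (assumption)
  selector:
    matchLabels:
      app: hash-search
  template:
    metadata:
      labels:
        app: hash-search
    spec:
      containers:
      - name: worker
        image: my-hash-worker:latest # placeholder: long-running script that keeps
                                     # checking for a suitable value and reports it
```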