2 votes

I have set up an AWS Batch environment with:

  1. Managed Compute environment
  2. Job Queue
  3. Job Definitions

The actual job (a Docker container) does a lot of video encoding and hence uses most of the CPU. The process itself takes a few minutes (close to 5 minutes just to get all the encoders initialized). Ideally I would want one job per instance so that the encoders are not CPU-starved.

My issue is that when I launch multiple jobs at the same time, or close enough together, AWS Batch decides to launch both of them on the same instance, since the first container is still initializing and has not started using the CPUs yet. It seems like a race condition to me, where both jobs see the newly created instance as available.

Is there a way I can launch one instance for each job, without Batch reusing instances that are already running? Or is there any other way to lock an instance once it has been designated for a particular job?

Thanks a lot for your help.

AWS Batch and ECS shouldn't have any trouble scheduling your containers as long as you're configuring things properly. Are you reserving vCPUs for your containers in the job definition? What are you setting your compute environment's min/max/desired vCPUs to? Are you letting AWS Batch decide which instance types to use? – Ngenator

Hi @Ngenator, the jobs require 3 different vCPU counts depending on the kind of encoding. Most of the time it is 16 vCPUs, and some require more than that. So I override the environment property of the AWS Batch job when launching it. Below is my configuration:

```
Minimum vCPUs   0
Desired vCPUs   0
Maximum vCPUs   256
Instance types  c5
```

– Guru Govindan

OK, when you say you're overriding the environment property, are you talking about the job definition's containerProperties? Are you setting the vcpus there? The environment property is for the container's environment variables, not for other container configuration. If you take a look at the example job definition, can you verify that you're setting vcpus in containerProperties and not in environment? docs.aws.amazon.com/batch/latest/userguide/… – Ngenator
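For reference, a minimal sketch of the job definition shape that comment is pointing at, registered via boto3. The definition name, image, and values here are illustrative, not taken from the question:

```python
import boto3

batch = boto3.client("batch")

# Reserve vCPUs in containerProperties, not in environment.
# environment only holds the container's environment variables;
# the vcpus value is what Batch uses when placing jobs on instances.
batch.register_job_definition(
    jobDefinitionName="video-encoder",          # hypothetical name
    type="container",
    containerProperties={
        "image": "my-repo/encoder:latest",      # hypothetical image
        "vcpus": 16,                            # CPU reservation for scheduling
        "memory": 30000,                        # MiB, illustrative
        "environment": [
            {"name": "ENCODING_PROFILE", "value": "h264"},  # env vars only
        ],
    },
)
```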

1 Answer

2 votes

You shouldn't have to worry about separating the jobs onto different instances because the containers the jobs run in are limited in how many vCPUs they can use. For example, if you launch two jobs that each require 4 vCPUs, Batch might spin up an instance that has 8 vCPUs and run both jobs on the same instance. Each job will have access to only 4 of the vCPUs, so performance should be identical to a job running on its own with no other jobs on the instance.
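As a sketch of that scenario (the queue and definition names are hypothetical), submitting two jobs that each reserve 4 vCPUs might look like this with boto3:

```python
import boto3

batch = boto3.client("batch")

# Two 4-vCPU jobs: Batch may place both on a single 8-vCPU instance,
# but each container still runs against its own 4-vCPU reservation,
# so neither should starve the other.
for clip in ["clip-a", "clip-b"]:
    batch.submit_job(
        jobName=f"encode-{clip}",
        jobQueue="video-encoding-queue",   # hypothetical queue
        jobDefinition="video-encoder",     # hypothetical definition
        containerOverrides={"vcpus": 4},   # per-job vCPU reservation override
    )
```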

However, if you still want to separate the jobs onto separate instances, you can do so by matching the vCPUs of the job with the instance type in the compute environment. For example, if you have a job that requires 4 vCPUs, you can configure your compute environment to only allow c5.xlarge instances, so each instance can run only one job. Note that if you want to run other jobs with higher vCPU requirements, you would have to run them in a different compute environment.
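A minimal sketch of such a compute environment via boto3; the subnets, security group, and role names are placeholders you would replace with your own:

```python
import boto3

batch = boto3.client("batch")

# Pin the environment to a single instance size so a 4-vCPU job fills a
# c5.xlarge (4 vCPUs) completely, leaving no room for a second job.
batch.create_compute_environment(
    computeEnvironmentName="encoder-one-job-per-instance",  # hypothetical
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 256,
        "desiredvCpus": 0,
        "instanceTypes": ["c5.xlarge"],       # exactly one 4-vCPU job fits
        "subnets": ["subnet-aaaa1111"],       # placeholder
        "securityGroupIds": ["sg-bbbb2222"],  # placeholder
        "instanceRole": "ecsInstanceRole",    # placeholder
    },
    serviceRole="arn:aws:iam::111122223333:role/AWSBatchServiceRole",  # placeholder
)
```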