Is there any way to stay within the 20.0 CPU quota and submit the job?

Question

Whenever I try to submit a training job to gcloud using the command

gcloud ml-engine jobs submit training

it fails with this quota error:

The requested 60.0 CPUs exceeds the allowed maximum of 20.0.

I never specified 60.0 CPUs in the command. According to the Google docs, I need to request a quota increase to make this work. Is there any way to stay within the 20.0 CPU quota and still train the model on GCP?

What's your region? what are all the params of your submission? — guillaume blaquiere

Nikita Durasov · Accepted Answer · 2020-09-08T22:50:22

I'm not sure whether this solves your issue, but here is what I did when I got:

The requested N CPUs exceeds the allowed maximum of 20.0.

from gcloud ai-platform jobs submit training. According to the documentation, you can pass the --scale-tier argument to the submit training command, which controls some specs of your job, including the number of workers. If you set --scale-tier to STANDARD_1, PREMIUM_1 or CUSTOM, the number of CPU workers, and with it the total number of requested CPUs, changes accordingly (e.g. to 60.0 CPUs in your case).

Since the BASIC tier is a "single worker instance", simply switching to

gcloud ai-platform jobs submit training --scale-tier BASIC_[GPU|TPU]

should solve this quota issue. The point about increasing your quota is valid, but as far as I understand, a larger number of workers is not what you want in your case.
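For reference, here is a minimal sketch of what a complete single-worker submission could look like. The job name, region, bucket, package path and module name are hypothetical placeholders (none of them come from the question), and the script only prints the command unless you set RUN=1, so you can check it before submitting.

```shell
# Hypothetical placeholder values; replace them with your own.
JOB_NAME="my_job_001"
REGION="us-central1"
JOB_DIR="gs://my-bucket/my_job_001"
PACKAGE_PATH="./trainer"
MODULE_NAME="trainer.task"

# Store the command in the positional parameters so it can be
# printed for inspection and then run without re-quoting.
set -- gcloud ai-platform jobs submit training "$JOB_NAME" \
  --region "$REGION" \
  --scale-tier BASIC \
  --job-dir "$JOB_DIR" \
  --package-path "$PACKAGE_PATH" \
  --module-name "$MODULE_NAME"

# Dry run by default: show the command that would be submitted.
echo "$*"

# Submit only when explicitly requested, e.g. RUN=1 sh submit.sh
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

With --scale-tier BASIC the job asks for a single worker instance, so it should not request the multi-worker CPU total that triggered the quota error.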

Otherwise, if you want to speed up your training, look at the CUSTOM tier and its workerCount argument, which specifies the number of workers to use (see the AI Platform documentation for details).
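As a rough sketch of the CUSTOM route, you could describe the cluster in a config file and pass it with --config. The machine types and worker count below are my own assumptions, picked so that the total stays under 20: one master plus three workers on n1-standard-4 (4 vCPUs each) comes to 16 vCPUs, assuming the quota counts the vCPUs of every node. The job name, region, bucket and trainer paths are hypothetical placeholders.

```shell
# Write a hypothetical config.yaml for a small CUSTOM cluster:
# 1 master + 3 workers, 4 vCPUs each = 16 vCPUs in total.
cat > config.yaml <<'EOF'
trainingInput:
  scaleTier: CUSTOM
  masterType: n1-standard-4
  workerType: n1-standard-4
  workerCount: 3
EOF

# Build the submit command (placeholder job name, region and paths).
set -- gcloud ai-platform jobs submit training "my_custom_job_001" \
  --region "us-central1" \
  --config config.yaml \
  --job-dir "gs://my-bucket/my_custom_job_001" \
  --package-path "./trainer" \
  --module-name "trainer.task"

# Dry run by default: show the command that would be submitted.
echo "$*"

# Submit only when explicitly requested, e.g. RUN=1 sh submit_custom.sh
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Whether distributed workers actually speed things up depends on your training code supporting distributed training, so treat the numbers as a starting point.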
