2 votes

Good afternoon.

I have encountered a problem when trying to add a GPU (NVIDIA Tesla T4) to a GCP AI Platform Notebooks instance.

What I want to do is start an instance with a GPU, but it doesn't work and GCP says:

There are no GPUs available for the zone, framework and machine type of this instance.

And when I start the instance, it says:

riiid: The zone 'projects/adept-rock-292801/zones/asia-northeast1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.

Problem Occurred


CHECKPOINT 1

I've checked the Admin Quotas page and here are the settings, so I don't think there's a problem with the quota.

GPUs (all regions): limit 1
NVIDIA T4 GPUs - asia-northeast1: limit 1
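
For reference, the same quotas can also be inspected from the CLI. This is only a sketch: the metric names T4_GPUS and GPUS_ALL_REGIONS are what the Compute Engine API typically reports, so adjust the grep patterns to your actual output.

  # Per-region quota and current usage for T4 GPUs:
  gcloud compute regions describe asia-northeast1 --format=json | grep -B1 -A1 "T4_GPUS"

  # Project-wide "GPUs (all regions)" quota:
  gcloud compute project-info describe --format=json | grep -B1 -A1 "GPUS_ALL_REGIONS"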

CHECKPOINT 2

There is only one instance that I have created so far, so no other instances are using up the quota.


CHECKPOINT 3

The following link says the NVIDIA Tesla T4 is available in zone asia-northeast1-a, so I guess the zone itself is not the cause of this problem.

https://cloud.google.com/compute/docs/gpus/gpu-regions-zones
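
This can also be double-checked from the CLI; a minimal sketch (the filter expression is an assumption, adjust it if needed):

  # List the accelerator types exposed in the zone; an empty result would
  # point to the zone rather than to quota.
  gcloud compute accelerator-types list --filter="zone:asia-northeast1-a AND name:nvidia-tesla-t4"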


CHECKPOINT 4

My instance's machine type is 4 vCPUs, 15 GB RAM (n1-standard-4), so there should be no problem with the machine type, according to the following link.

https://cloud.google.com/compute/docs/gpus
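
One way to confirm the machine series from the CLI (the T4 can only be attached to N1 machine types) is to describe the instance; the instance name riiid is taken from the error message above and may differ in your project:

  # Show the machine type of the existing instance:
  gcloud compute instances describe riiid --zone=asia-northeast1-a --format="value(machineType)"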


Why is the GPU not available in this situation? Can anyone give me a tip on how to solve this?

Thank you.

Machine type? E2? N1? Check this link and verify that your machine type supports GPUs: cloud.google.com/compute/docs/machine-types. Have you tried another GPU type, for example a P100? – gogasca
@gogasca Thank you for your message. I am using n1-standard-4, and there is no option to select a GPU type other than None. – Hodaka Kubo

1 Answer

0 votes

This issue has been addressed in a Public Issue Tracker case, here. Since you are still experiencing it, you can leave a comment there and describe how the issue affects you, so that the case will be re-opened.

However, there is a workaround. In order to add a GPU to an AI Platform Notebooks instance after it has been created, follow the steps below:

  1. Create an instance, selecting Python 3 (CUDA Toolkit 11.0) and the option without a GPU;
  2. Go to Compute Engine and select your VM;
  3. Stop the VM and click Edit;
  4. Under Machine configuration, go to GPU type and add the desired type of GPU;
  5. Save the changes and start your VM;
  6. SSH into it and you will be prompted to install the NVIDIA drivers;
  7. If you are not prompted, use the following command to install the drivers: sudo /opt/deeplearning/install-driver.sh;
  8. Confirm the installation of the drivers with nvidia-smi (see the command sketch after this list).
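
The console steps above can also be driven from the command line once the GPU has been attached. This is only a sketch of steps 5 to 8, reusing the instance name riiid and zone asia-northeast1-a from the question; adjust them to your own values.

  # Start the VM after the GPU was added in the console (step 5):
  gcloud compute instances start riiid --zone=asia-northeast1-a

  # SSH into the VM (step 6):
  gcloud compute ssh riiid --zone=asia-northeast1-a

  # On the VM, install the NVIDIA drivers if you were not prompted (step 7):
  sudo /opt/deeplearning/install-driver.sh

  # Confirm that the driver sees the GPU (step 8):
  nvidia-smi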

Please pay attention to the notes below:

  • If you have a firewall rule, port 22 should be whitelisted. You can use the command gcloud compute firewall-rules create default-allow-ssh --allow tcp:22 to do so.
  • It is highly advisable to spread your workload across multiple regions, as described here;
  • Currently, you seem to be using GPU instances on demand, without any guarantee of capacity, since the zone can be depleted. For this reason, if you want to guarantee your resources, you can use a feature called Reservations, which ensures that the resources are available for your workloads when you need them (see the sketch after this list);
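
A minimal sketch of creating such a reservation with gcloud; the reservation name t4-reservation and the --accelerator syntax are shown as an assumption, so check the current gcloud reference before relying on it:

  # Reserve one n1-standard-4 instance with one T4 GPU in the zone:
  gcloud compute reservations create t4-reservation \
      --zone=asia-northeast1-a \
      --vm-count=1 \
      --machine-type=n1-standard-4 \
      --accelerator=count=1,type=nvidia-tesla-t4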