
I'm trying to run shutdown scripts on GCE instances with an NVIDIA K80 GPU attached as a guest accelerator. These n1-standard-1 instances run a custom Ubuntu 16.04 image with the NVIDIA driver installed (as in this tutorial: https://cloud.google.com/compute/docs/gpus/add-gpus#install-driver-manual) in the us-east1-d zone.

The issue is that the shutdown script isn't being run when the instance has the NVIDIA driver installed, but consistently executes if the driver isn't installed (even when the GPU is attached). This is happening regardless of whether the instance is preemptible or not.

Running the shutdown script on the standard Ubuntu 16.04 OS image works, but as soon as the driver is installed and the instance is restarted, shutting it down no longer triggers the script. Interestingly, /var/log/syslog contains no message about the shutdown script. I would expect either an error or a message that no shutdown script was found, but neither appears.

Any help or information about whether this is reproducible or just some mistake on my part would be greatly appreciated.


1 Answer


I just tested this in my project with an NVIDIA K80 GPU, and I was able to run the shutdown script in both cases, with and without. Did you actually test removing the GPU, or are you using two different instances?

You can try adding the script directly to the custom metadata of the instance, to check that the problem is not in the way you connect to the bucket, in its permissions, or in the script itself (although honestly I do not know how these could cause the issue).

To do so, go to the edit page of any of the instances, add the custom metadata below, retry, and let me know the result.

key = shutdown-script
value = echo hello >> marco.py
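
The same key can also be set from the command line instead of the edit page. The following is a minimal sketch using `gcloud compute instances add-metadata`; the instance name and zone are hypothetical placeholders, and the script only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical placeholders: override with your own instance name and zone.
INSTANCE="${INSTANCE:-my-gpu-instance}"
ZONE="${ZONE:-my-zone}"

# Same test value as in the key/value pair above.
SHUTDOWN_SCRIPT='echo hello >> marco.py'

# Print the gcloud invocation that would set the metadata key.
show_command() {
  printf "gcloud compute instances add-metadata %s --zone %s --metadata 'shutdown-script=%s'\n" \
    "$INSTANCE" "$ZONE" "$SHUTDOWN_SCRIPT"
}

# Apply the metadata for real (requires an installed, authenticated gcloud CLI).
apply_metadata() {
  gcloud compute instances add-metadata "$INSTANCE" \
    --zone "$ZONE" \
    --metadata "shutdown-script=$SHUTDOWN_SCRIPT"
}

# Dry run by default; set RUN=1 to call gcloud.
if [ "${RUN:-0}" = "1" ]; then
  apply_metadata
else
  show_command
fi
```

Setting the script inline like this takes the bucket and its permissions out of the picture, which is the point of the test.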

Remember that, according to the official documentation, shutdown script output is written to the following log files:

  • CentOS and RHEL: /var/log/messages
  • Debian: /var/log/daemon.log
  • Ubuntu 14.04, 16.04, and 16.10: /var/log/syslog
  • SLES 11 and 12: /var/log/messages
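
To check these files after a restart, something like the sketch below may help. It searches whichever of the listed log files exist for lines mentioning the shutdown script. The search string `shutdown-script` is an assumption about how the guest environment tags its log lines; adjust it if your logs use a different label.

```shell
#!/bin/sh
# Log files from the list above.
CANDIDATE_LOGS="/var/log/syslog /var/log/daemon.log /var/log/messages"

# Print matching lines from every candidate log that exists.
# The optional first argument is a root prefix (useful for testing).
# Returns 0 if at least one line matched, 1 otherwise.
search_logs() {
  root="${1:-}"
  status=1
  for f in $CANDIDATE_LOGS; do
    path="$root$f"
    [ -f "$path" ] || continue
    # Assumption: shutdown script output is tagged "shutdown-script".
    if grep -i "shutdown-script" "$path"; then
      status=0
    fi
  done
  return "$status"
}

search_logs || echo "no shutdown-script lines found"
```

Reading these files may require root, so run the script with `sudo` if grep reports a permission error.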

UPDATE

I created a public issue that you can "star" to follow its updates. You need to log in with a Gmail account to view it.

https://issuetracker.google.com/issues/72981924