The official documentation on enabling GPU support states:
A special alpha feature gate Accelerators has to be set to true across the system: --feature-gates="Accelerators=true".
I am having trouble decoding the "set to true across the system" part.
I have discovered that kubelet, kube-apiserver, and kube-controller-manager all accept the --feature-gates runtime parameter. The specification states that they all watch their config file for modifications.
Where are those config files, and how can I enable the --feature-gates="Accelerators=true" option in them?
I did try adding the option to /etc/kubernetes/manifests/kube-apiserver.yaml:

    spec:
      containers:
      - command:
        - kube-apiserver
        - -- <...>
        - --feature-gates=Accelerators=true

However, that causes kube-apiserver to stop and never come back.
In the end I found the following workaround here:
3.I Add GPU support to the Kubeadm configuration, while cluster is not initialized. This has to be done for every node across your cluster, even if some of them don't have any GPUs.

    sudo vim /etc/systemd/system/kubelet.service.d/<>-kubeadm.conf

Therefore, append ExecStart with the flag --feature-gates="Accelerators=true", so it will look like this:

    ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS [...] --feature-gates="Accelerators=true"

3.II Restart kubelet

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet

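Since the edit has to be repeated on every node, the manual step could be scripted. The following is only a rough sketch, not an official procedure: `add_feature_gate` is a hypothetical helper name, the drop-in path is passed as an argument because the exact file name varies, and it assumes the drop-in contains a single `ExecStart=` line that invokes the kubelet (possibly preceded by an empty `ExecStart=` reset line).

```shell
#!/bin/sh
# Hypothetical helper (not part of kubeadm or kubelet): append a
# --feature-gates flag to the kubelet ExecStart line of a systemd drop-in.
#   $1: path to the drop-in file (assumed to be the kubeadm kubelet drop-in)
#   $2: feature gate to enable (defaults to Accelerators=true)
add_feature_gate() {
    conf="$1"
    gate="${2:-Accelerators=true}"

    # Do nothing if a --feature-gates flag is already present. Merging with
    # an existing list of gates is deliberately left out of this sketch.
    if grep -q -- "--feature-gates=" "$conf"; then
        return 0
    fi

    # Only touch ExecStart lines that actually invoke the kubelet, so an
    # empty "ExecStart=" reset line is left alone.
    tmp=$(mktemp) || return 1
    sed "s|^\(ExecStart=.*kubelet.*\)\$|\1 --feature-gates=${gate}|" "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

After running it as root against the drop-in on a node, `sudo systemctl daemon-reload` and `sudo systemctl restart kubelet` are still needed, exactly as in step 3.II.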
However, I believe that the above approach is not how Kubernetes developers intended for this feature to be enabled. Any help would be appreciated.
[Edit] I was able to turn on the option on both kube-apiserver and kube-controller-manager, but neither made the GPU visible.
So it is the kubelet service that needs this option.
The question then becomes: how can the option be set via the kubelet config file?
What do the kube-apiserver logs say when it fails to start after you add the flag? – kichik