4
votes

The official documentation on enabling GPU support states:

A special alpha feature gate Accelerators has to be set to true across the system: --feature-gates="Accelerators=true".

I am having trouble decoding the "set to true across the system" part.

I have discovered that kubelet, kube-apiserver, and kube-controller-manager all accept the --feature-gates runtime parameter. The documentation states that they all watch their config files for modifications.

Any help with where those config files are and how I can enable the --feature-gates="Accelerators=true" option in them would be appreciated.

I did try adding the option to /etc/kubernetes/manifests/kube-apiserver.yaml:

    spec:
      containers:
      - command:
        - kube-apiserver
        - -- <...>
        - --feature-gates=Accelerators=true

However, that causes kube-apiserver to stop and never come back.

In the end I found the following workaround here:

3.I Add GPU support to the kubeadm configuration, while the cluster is not initialized. This has to be done for every node in your cluster, even if some of them don't have any GPUs.

    sudo vim /etc/systemd/system/kubelet.service.d/<>-kubeadm.conf

Then append the flag --feature-gates="Accelerators=true" to ExecStart, so it will look like this:

    ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS [...] --feature-gates="Accelerators=true"

3.II Restart kubelet

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet

However, I believe that the above approach is not how the Kubernetes developers intended this feature to be enabled. Any help would be appreciated.


[Edit] I was able to turn on the option on both the api-server and the controller-manager; neither gave the desired result of the GPU becoming visible.

So it's the kubelet service that needs to get this option.

The question becomes: how can the option be set via the kubelet config file?
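For what it's worth, the cleanest alternative I can think of would be a systemd drop-in that sets an environment variable the unit already expands, instead of editing ExecStart directly. Something along these lines (the KUBELET_EXTRA_ARGS variable and the drop-in file name are my assumption about how the kubeadm unit is structured, not something the docs spell out):

    # /etc/systemd/system/kubelet.service.d/90-accelerators.conf (hypothetical drop-in name)
    [Service]
    Environment="KUBELET_EXTRA_ARGS=--feature-gates=Accelerators=true"

followed by systemctl daemon-reload and systemctl restart kubelet. Is that the intended mechanism, or is there a dedicated kubelet config file for this?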

What do kube-apiserver logs say when it fails to start after you add the flag? (kichik)
I don't have nice logs under /var/log/kube*; the only output I see is in journalctl. I can infer that the following lines are relevant there:

    kubelet.go:1596] Deleting mirror pod "kube-apiserver-XX_kube-system(f30d81e3-6b4d-11e7-8d98-4ccc6af724b9)" because it is outdated
    kubelet.go:1607] Failed creating a mirror pod for "kube-apiserver-eg101_kube-system(635d00135d0920d6083b2b5a38a22810)": Post XX:6443/api/v1/namespaces/kube-system/pods: dial tcp XX:6443: getsockopt: connection refused

I get the same error when trying to set the feature-gates parameter for the controller-manager as well. (Lana Nova)
Which makes no sense, since what the logs above are saying is: can't reach the api-server in order to start up a mirror api-server... (Lana Nova)
Can you add those to the question? It's impossible to read in the comments. (kichik)
I found a similar problem of the apiserver not starting up after the update to the config file here: github.com/kubernetes/contrib/issues/2249. The fix was to restart kubelet. Setting feature-gates on either the api-server or the controller-manager does not have the desired effect of the GPU node starting up, which means the setting needs to be turned on for kubelet specifically. I have a way of doing that by modifying the xx-kubeadm.conf file. Is there a way to do it by modifying the kubelet config file? (I'll modify the question as well.) (Lana Nova)

3 Answers

4
votes

I use Ubuntu 16.04.

Adding --feature-gates="Accelerators=true" to KUBELET_ARGS in the file /etc/kubernetes/kubelet should be fine.
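A minimal sketch of what that line might look like (any existing flags in your file will differ between setups, so treat this as illustrative only):

    # /etc/kubernetes/kubelet -- keep whatever flags are already set and append the gate
    KUBELET_ARGS="--feature-gates=Accelerators=true"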

1
votes

If you use kops to run your k8s cluster, then you can follow these instructions: https://github.com/kubernetes/kops/blob/master/docs/gpu.md

Basically this comes down to editing your cluster:

    kops edit cluster gpu.example.com

And adding the specific configuration enabling GPU processing to the kubelet section of the spec:

    spec:
      ...
      kubelet:
        featureGates:
          Accelerators: "true"

Then you need to update your cluster and do a rolling update so all the nodes will use the new kubelet configuration.
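The kops commands for that step would presumably be along these lines (the cluster name gpu.example.com is carried over from the example above; check kops --help for the exact flags in your version):

    kops update cluster gpu.example.com --yes
    kops rolling-update cluster gpu.example.com --yes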

With the cluster rolled, you can check whether the feature-gate flag is enabled on the kubelet and deploy pods that use the GPU.
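One rough way to verify this (assumes SSH access to a node; the grep targets are just illustrative):

    # on a node: confirm the kubelet process was started with the feature gate
    ps -ef | grep kubelet | grep feature-gates

    # from anywhere with kubectl access: see whether nodes advertise an nvidia GPU resource
    kubectl describe nodes | grep -i nvidia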

0
votes

Logically, you would need GPU support enabled on the nodes. That would mean the correct place is the kubelet or node config.

Kube apiserver would not be the right place for this.

Once enabled, "The nodes will automatically discover and expose all Nvidia GPUs as a schedulable resource."
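If I remember the alpha docs correctly, the resource those nodes expose is named alpha.kubernetes.io/nvidia-gpu, so a pod requesting a GPU would look roughly like this (the image and resource names come from my reading of the alpha-era docs; double-check them against the documentation for your version):

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-test
    spec:
      containers:
      - name: cuda
        image: nvidia/cuda
        resources:
          limits:
            alpha.kubernetes.io/nvidia-gpu: 1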