25
votes

I'd like to keep the number of cores in my GKE cluster below 3. This becomes much more feasible if the CPU limits of the K8s replication controllers and pods are reduced from 100m to at most 50m. Otherwise, the K8s pods alone take 70% of one core.

I decided against increasing the CPU power of a node. This would be conceptually wrong in my opinion because the CPU limit is defined to be measured in cores. Instead, I did the following:

  • replacing limitranges/limits with a version with "50m" as default CPU limit (not necessary, but in my opinion cleaner)
  • patching all replication controller in the kube-system namespace to use 50m for all containers
  • deleting their pods
  • replacing all non-rc pods in the kube-system namespace with versions that use 50m for all containers

This is a lot of work and probably fragile. Any further changes in upcoming versions of K8s, or changes in the GKE configuration, may break it.

So, is there a better way?

4

4 Answers

11
votes

Changing the default Namespace's LimitRange spec.limits.defaultRequest.cpu should be a legitimate solution for changing the default for new Pods. Note that LimitRange objects are namespaced, so if you use extra Namespaces you probably want to think about what a sane default is for them.

As you point out, this will not affect existing objects or objects in the kube-system Namespace.

The objects in the kube-system Namespace were mostly sized empirically - based on observed values. Changing those might have detrimental effects, but maybe not if your cluster is very small.

We have an open issue (https://github.com/kubernetes/kubernetes/issues/13048) to adjust the kube-system requests based on total cluster size, but that is not is not implemented yet. We have another open issue (https://github.com/kubernetes/kubernetes/issues/13695) to perhaps use a lower QoS for some kube-system resources, but again - not implemented yet.

Of these, I think that #13048 is the right way to implement what you 're asking for. For now, the answer to "is there a better way" is sadly "no". We chose defaults for medium sized clusters - for very small clusters you probably need to do what you are doing.

8
votes

I have found one of the best ways to reduce the system resource requests on a GKE cluster, is to use a vertical autoscaler.

Here are the VPA definitions I have used:

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: kube-dns-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: Deployment
    name: kube-dns
  updatePolicy:
    updateMode: "Auto"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: heapster-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: Deployment
    name: heapster-v1.6.0-beta.1
  updatePolicy:
    updateMode: "Initial"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: metadata-agent-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: DaemonSet
    name: metadata-agent
  updatePolicy:
    updateMode: "Initial"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: metrics-server-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: Deployment
    name: metrics-server-v0.3.1
  updatePolicy:
    updateMode: "Initial"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: fluentd-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: DaemonSet
    name: fluentd-gcp-v3.1.1
  updatePolicy:
    updateMode: "Initial"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: kube-proxy-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: DaemonSet
    name: kube-proxy
  updatePolicy:
    updateMode: "Initial"

Here is a screenshot of what it does to a kube-dns deployment.

1
votes

By the way just in case you wanted to try this on Google Cloud GCE. If you try to change the CPU limit of the core services like kube-dns you will get an error like this.

spec: Forbidden: pod updates may not change fields other than spec.containers[*].image, spec.initContainers[*].image, spec.activeDeadlineSeconds or spec.tolerations (only additions to existing tolerations

Tried on Kubernetes 1.8.7 and 1.9.4.

So at this time the minimum node you need to deploy is n1-standard-1. Also with that about 8% of your cpu is eaten almost constantly by the Kubernetes itself as soon as you have several pods and helms. even if you are not running any major load. I think there are a lot of polling going on and to make sure the cluster is responsive they keep refreshing some stats.

0
votes

I did try to do this change directly in the google console but my changes were not saved even for my own application pods.

I am getting this error. Forbidden: pod updates may not change fields other than spec.containers[*].image, `spec.i..

I did find this excellent article on this subject while researching though, https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b