13
votes

Coming from several years of running Node/Rails apps on bare metal, I was used to running as many apps as I wanted on a single machine (say, a 2 GB machine at DigitalOcean could easily handle 10 apps without worry, given reasonable optimizations or a fairly low amount of traffic).

The thing is, with Kubernetes the game seems quite different. I've set up a "getting started" cluster with 2 standard VMs (3.75 GB).

I assigned requests and limits on a deployment with the following:

        resources:
          requests:
            cpu: "64m"
            memory: "128Mi"
          limits:
            cpu: "128m"
            memory: "256Mi"

Then I see the following:

Namespace       Name            CPU Requests    CPU Limits  Memory Requests Memory Limits
---------       ----            ------------    ----------  --------------- -------------
default         api             64m (6%)        128m (12%)  128Mi (3%)      256Mi (6%)

What does this 6% refer to?

I tried lowering the CPU limit to something like 20Mi… and the app does not start (obviously, not enough resources). The docs say it is a percentage of CPU. So, 20% of a 3.75 GB machine? Then where does this 6% come from?

Then I increased the size of the node pool to n1-standard-2, and the same pod effectively spans 3% of the node. That sounds logical, but what does it actually refer to?

I still wonder which metric is taken into account for this part.

The app seems to need a large amount of memory on startup, but then it uses only a minimal fraction of this 6%. I feel like I am misunderstanding something, or misusing it all.

Thanks for any experienced tips/advice to help me understand this better. Best

2
It would be helpful if you also post the table header of kubectl describe node ....svenwltr

2 Answers

11
votes

The 6% of CPU means that 6% (CPU requests) of the node's CPU time is reserved for this pod. So it is guaranteed that it always gets at least this amount of CPU time. It can still burst up to 12% (CPU limits) if there is CPU time left.

This means if the limit is very low, your application will take more time to start up. Therefore a liveness probe may kill the pod before it is ready, because the application took too long. To solve this you may have to increase the initialDelaySeconds or the timeoutSeconds of the liveness probe.


Also note that the resource requests and limits define how many resources your pod allocates, and not the actual usage.

  • The resource request is what your pod is guaranteed to get on a node. This means that the sum of the requested resources of all pods on a node must not be higher than the total amount of CPU/memory on that node.
  • The resource limit is the upper limit of what your pod is allowed to use. This means the sum of these limits can be higher than the actual available CPU/memory.

Therefore the percentages tell you how much CPU and memory of the total resources your pod allocates.
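To make the "sum of requests must fit" rule concrete, here is a minimal Python sketch of that check. It is not the scheduler's actual code; the `Resources` type, the `fits_on_node` helper, and the node size (1 core, 3.75 GiB treated as 3840 MiB, nothing reserved for the system) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits_on_node(allocatable, scheduled_requests, new_request):
    """Return True if the new pod's *requests* still fit on the node.

    Only requests are summed; limits are ignored here, which is why
    the sum of limits on a node may exceed what the node really has.
    """
    cpu_used = sum(r.cpu_millicores for r in scheduled_requests)
    mem_used = sum(r.memory_mib for r in scheduled_requests)
    return (
        cpu_used + new_request.cpu_millicores <= allocatable.cpu_millicores
        and mem_used + new_request.memory_mib <= allocatable.memory_mib
    )


# Hypothetical node: 1 CPU core (1000m) and 3840 MiB of memory.
node = Resources(cpu_millicores=1000, memory_mib=3840)
# Requests from the question's deployment: 64m CPU, 128Mi memory.
api = Resources(cpu_millicores=64, memory_mib=128)

print(fits_on_node(node, [api] * 14, api))  # 14 * 64m + 64m = 960m  -> fits
print(fits_on_node(node, [api] * 15, api))  # 15 * 64m + 64m = 1024m -> does not fit
```

Under these assumptions CPU requests, not memory, are what cap the number of such pods on the node, even if the pods are mostly idle.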

Link to the docs: https://kubernetes.io/docs/user-guide/compute-resources/

Some other notable things:

  • If your pod uses more memory than defined in the limit, it gets OOMKilled (out of memory).
  • If your pod uses more memory than defined in the requests and the node runs out of memory, the pod might get OOMKilled so that other pods, which use less than their requested memory, can survive.
  • If your application needs more CPU than requested, it can burst up to the limit.
  • Your pod never gets killed for using too much CPU.
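
The four rules above can be restated as a small decision function. This is only a summary of the bullets in Python, not how the kubelet or kernel implements them; the `outcome` function and its return strings are made up for illustration, and "throttled" is my wording for a pod being held at its CPU limit.

```python
def outcome(resource, demand, request, limit, node_out_of_memory=False):
    """Describe what happens to a pod for a given resource demand.

    All amounts use the same unit per resource (e.g. millicores or MiB).
    """
    if resource == "cpu":
        # CPU is compressible: the pod is slowed down, never killed.
        if demand > limit:
            return "throttled at limit"
        if demand > request:
            return "bursting above request"
        return "running"
    if resource == "memory":
        if demand > limit:
            return "OOMKilled"
        if demand > request and node_out_of_memory:
            return "may be OOMKilled"
        return "running"
    raise ValueError(f"unknown resource: {resource}")


# Values from the question: 64m/128m CPU, 128Mi/256Mi memory.
print(outcome("cpu", 500, request=64, limit=128))
print(outcome("memory", 300, request=128, limit=256))
print(outcome("memory", 200, request=128, limit=256, node_out_of_memory=True))
```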
17
votes

According to the docs, CPU requests (and limits) are always fractions of available CPU cores on the node that the pod is scheduled on (with a resources.requests.cpu of "1" meaning reserving one CPU core exclusively for one pod). Fractions are allowed, so a CPU request of "0.5" will reserve half a CPU for one pod.

For convenience, Kubernetes allows you to specify CPU resource requests/limits in millicores:

The expression 0.1 is equivalent to the expression 100m, which can be read as “one hundred millicpu” (some may say “one hundred millicores”, and this is understood to mean the same thing when talking about Kubernetes). A request with a decimal point, like 0.1 is converted to 100m by the API, and precision finer than 1m is not allowed.
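
As an illustration of that equivalence, here is a small Python sketch that converts both notations to millicores. `cpu_to_millicores` is a hypothetical helper, not part of any Kubernetes client library, and it only handles plain decimals and the `m` suffix. Following the quoted rule, it simply rejects anything finer than 1m.

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "0.1", "1" or "100m" to millicores."""
    if quantity.endswith("m"):
        # Already in millicores, e.g. "64m" -> 64.
        return int(quantity[:-1])
    # Decimal avoids float rounding surprises for values like "0.1".
    millicores = Decimal(quantity) * 1000
    if millicores != millicores.to_integral_value():
        raise ValueError(f"precision finer than 1m: {quantity!r}")
    return int(millicores)


print(cpu_to_millicores("0.1"))   # 100
print(cpu_to_millicores("100m"))  # 100
print(cpu_to_millicores("1"))     # 1000
```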

As already mentioned in the other answer, resource requests are guaranteed. This means that Kubernetes will schedule pods in a way that the sum of all requests will not exceed the amount of resources actually available on a node.

So, by requesting 64m of CPU time in your deployment, you are actually requesting 64/1000 = 0.064 = 6.4% of the time of one of the node's CPU cores. That's where your 6% comes from. When upgrading to a VM with more CPU cores, the amount of available CPU resources increases, so on a machine with two available CPU cores, a request for 6.4% of one CPU's time will allocate 3.2% of the CPU time of two CPUs.
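
The same arithmetic reproduces every percentage in the table from the question. The sketch below makes several assumptions: the node offers exactly 1000m per core and 3840 MiB (3.75 GiB) of memory with nothing reserved for the system, and the displayed figure is the whole-number part of the percentage. `percent_of_node` is a made-up helper name.

```python
def percent_of_node(amount: int, node_total: int) -> int:
    """Whole-number percentage of a node resource taken by one pod."""
    return (100 * amount) // node_total


ONE_CORE = 1000      # millicores on a 1-core node (assumed)
TWO_CORES = 2000     # millicores on a 2-core node (assumed)
NODE_MEMORY = 3840   # MiB, i.e. 3.75 GiB (assumed, nothing reserved)

print(percent_of_node(64, ONE_CORE))      # CPU request: 6
print(percent_of_node(128, ONE_CORE))     # CPU limit: 12
print(percent_of_node(128, NODE_MEMORY))  # memory request: 3
print(percent_of_node(256, NODE_MEMORY))  # memory limit: 6
print(percent_of_node(64, TWO_CORES))     # CPU request on two cores: 3
```

On a real node the allocatable amounts are somewhat lower than the raw machine size, so the exact figures can differ slightly.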