
The pods' resource limits are set as:

resources:
  limits:
    cpu: 500m
    memory: 5Gi

and the node has 10G of memory free.

I created 5 pods in quick succession, all successfully, and the node may still have some memory free afterwards, e.g. 8G.

Memory usage grows over time, and once the pods approach their limits (5G x 5 = 25G > 10G), the node becomes unresponsive.
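A quick Python sketch of the arithmetic above: it only sums the pods' memory limits and compares the total with the node's free memory, using the numbers from the question (treated as GiB, which is an assumption about the "G" units). `limit_overcommit` is a hypothetical helper for illustration, not part of Kubernetes.

```python
# Numbers from the question, treated as GiB (an assumption about the "G" units).
NODE_FREE_GIB = 10
POD_LIMIT_GIB = 5
NUM_PODS = 5


def limit_overcommit(node_free_gib: float, limits_gib: list[float]) -> float:
    """Return how much the summed memory limits exceed free node memory (0 if they fit)."""
    return max(0.0, float(sum(limits_gib) - node_free_gib))


if __name__ == "__main__":
    limits = [POD_LIMIT_GIB] * NUM_PODS
    print(f"sum of limits: {sum(limits)} GiB, "
          f"overcommit: {limit_overcommit(NODE_FREE_GIB, limits)} GiB")
    # sum of limits: 25 GiB, overcommit: 15.0 GiB
```

A positive result means the node cannot back every pod at its limit at the same time, which is exactly the situation described here.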

To keep the node usable, is there a way to set a resource limit on the node itself?

Update

The core problem is that a pod's memory usage is not always equal to its limit, especially right after it starts. So an unlimited number of pods can be created in quick succession, and then every node ends up fully loaded. That's not good. There should be some way to allocate (reserve) resources rather than only setting a limit.

Update 2

I tested again with both limits and requests:

resources:
  limits:
    cpu: 500m
    memory: 5Gi
  requests:
    cpu: 500m
    memory: 5Gi

The node's total memory is 15G, with 14G available, yet 3 pods were scheduled and are running successfully:

> free -mh
              total        used        free      shared  buff/cache   available
Mem:            15G        1.1G        8.3G        3.4M        6.2G         14G
Swap:            0B          0B          0B

> docker stats

CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O
44eaa3e2d68c        0.63%               1.939 GB / 5.369 GB   36.11%              0 B / 0 B           47.84 MB / 0 B
87099000037c        0.58%               2.187 GB / 5.369 GB   40.74%              0 B / 0 B           48.01 MB / 0 B
d5954ab37642        0.58%               1.936 GB / 5.369 GB   36.07%              0 B / 0 B           47.81 MB / 0 B

It seems the node will crash soon XD

Update 3

Now I changed the resources to request 5G and limit 8G:

resources:
  limits:
    cpu: 500m
    memory: 8Gi
  requests:
    cpu: 500m
    memory: 5Gi

The results are: [screenshot not reproduced]

According to the k8s source code for the resource check:

[screenshot of the resource-check code not reproduced]

The node has only 15G of memory in total, while the three pods can grow to 24G together (3 x 8G limit), so all of the pods may be killed. (A single container of mine usually uses more than 16G if it is not limited.)

This means you'd better keep requests exactly equal to limits to avoid pods being killed or the node crashing. And since requests default to the limits when they are not specified, what exactly are requests for? I think limits alone are enough; or, contrary to what K8s recommends, I would rather set the resource request greater than the limit, to keep the nodes usable.

Update 4

Kubernetes 1.1 schedules pods by their memory requests using the formula:

(capacity - memoryRequested) >= podRequest.memory

It seems that Kubernetes does not take actual memory usage into account, as Vishnu Kannan said. So the node can still crash if other applications use a lot of memory.

Fortunately, as of commit e64fe822, the formula has been changed to:

(allocatable - memoryRequested) >= podRequest.memory

Waiting for k8s v1.2!
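The two formulas can be compared in a small Python sketch. It is not the real scheduler code: `Node`, `fits_v11`, `fits_v12` and the 2 GiB of memory reserved for system daemons are hypothetical, and allocatable is modeled simply as capacity minus that reservation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    capacity_gib: float          # total memory on the node
    reserved_gib: float          # hypothetical memory set aside for system daemons
    requested_gib: float = 0.0   # sum of memory requests of pods already on the node

    @property
    def allocatable_gib(self) -> float:
        # Assumption: allocatable is modeled as capacity minus the reservation.
        return self.capacity_gib - self.reserved_gib


def fits_v11(node: Node, pod_request_gib: float) -> bool:
    # (capacity - memoryRequested) >= podRequest.memory
    return node.capacity_gib - node.requested_gib >= pod_request_gib


def fits_v12(node: Node, pod_request_gib: float) -> bool:
    # (allocatable - memoryRequested) >= podRequest.memory
    return node.allocatable_gib - node.requested_gib >= pod_request_gib


if __name__ == "__main__":
    node = Node(capacity_gib=15, reserved_gib=2, requested_gib=10)
    print(fits_v11(node, 5), fits_v12(node, 5))  # True False
```

With 10 GiB already requested on a 15 GiB node, the capacity-based check still admits a 5 GiB pod, while the allocatable-based check rejects it because the reserved memory no longer counts as available.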


2 Answers


Kubernetes resource specifications have two fields, requests and limits.

limits place a cap on how much of a resource a container can use. For memory, if a container goes above its limits, it will be OOM killed. For CPU, its usage may be throttled.

requests are different: they ensure that the node the pod is placed on has at least that much capacity available for it. If you want to make sure that your pods will be able to grow to a particular size without the node running out of resources, specify a request of that size. This will limit how many pods you can schedule, though: a 10G node will only be able to fit 2 pods with a 5G memory request.
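The counting in the last sentence can be illustrated with a Python sketch that admits pods one at a time against a running total of requests. `schedule` is a hypothetical helper; it assumes an otherwise empty 10G node and, like the scheduler described here, looks only at requests, never at actual usage.

```python
def schedule(node_memory_gib: float, pod_requests_gib: list[float]) -> list[bool]:
    """Admit pods one by one while the remaining unrequested memory covers each request.

    Only requests are counted; actual memory usage is ignored.
    """
    requested = 0.0
    admitted = []
    for request in pod_requests_gib:
        if node_memory_gib - requested >= request:
            requested += request
            admitted.append(True)
        else:
            admitted.append(False)
    return admitted


if __name__ == "__main__":
    result = schedule(10, [5, 5, 5, 5, 5])
    print(result.count(True))  # 2
```

Because the check uses requests rather than usage, the third pod is rejected even if the first two are currently using far less than 5G each.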


Kubernetes supports Quality of Service. If your Pods have limits set (with requests equal to those limits, which is the default when requests are omitted), they belong to the Guaranteed class, and the likelihood of them getting killed due to system memory pressure is extremely low. Guaranteed Pods can still be killed, though, if the Docker daemon or some other daemon you run on the node consumes a lot of memory.
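A simplified Python sketch of the QoS classes, based on my understanding of the rules in current Kubernetes documentation: a pod is Guaranteed when every container has cpu and memory limits and its requests equal those limits, BestEffort when no container sets any cpu or memory requests or limits, and Burstable otherwise. `qos_class` is hypothetical, quantities are compared as plain strings (so "5Gi" and "5120Mi" would be treated as different), and requests that are left out default to the limits, which is why setting only limits yields Guaranteed.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' "requests"/"limits" maps (cpu and memory only)."""
    guaranteed = True
    any_specified = False
    for container in containers:
        limits = container.get("limits", {})
        # Requests that are not given explicitly default to the matching limits.
        requests = {**limits, **container.get("requests", {})}
        for resource in ("cpu", "memory"):
            if resource in limits or resource in requests:
                any_specified = True
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_specified:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"limits": {"cpu": "500m", "memory": "5Gi"}}]
    print(qos_class(pod))  # Guaranteed
```

Under these rules, the "request 5G, limit 8G" setup from the question would be Burstable rather than Guaranteed.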

The Kubernetes scheduler does take memory capacity and already-requested memory into account while scheduling. For instance, you cannot schedule more than two pods that each request 5 GB on a 10 GB node.

However, Kubernetes does not currently consider actual memory usage for the purposes of scheduling.