12 votes

Multiple pods of a 600-pod deployment are stuck in ContainerCreating after a rolling update, with the message:

Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod network: add cmd: failed to assign an IP address to container

What I have tried: I checked the CNI metrics output, which shows:

maxIPAddresses, value: 759.000000
ipamdActionInProgress, value: 1.000000
addReqCount, value: 16093.000000
awsAPILatency, value: 564.000000
delReqCount, value: 32337.000000
eniMaxAvailable, value: 69.000000
assignIPAddresses, value: 558.000000
totalIPAddresses, value: 682.000000
eniAllocated, value: 69.000000

Does the CNI metrics output suggest there's an issue? It seems like there are enough IPs.

What else can I try to debug?

Do you have a DNS service running? – Vishrant
@Vishrant Yes. Running a kube-dns deployment with 3 replicas. – ProGirlXOXO
What instance type are you using? – Claes Mogren
@ClaesMogren t2.large – ProGirlXOXO
The only thing I see is that all ENIs have been allocated and nothing is left, though IPs are still available. – Tarun Lalwani

1 Answer

3 votes

It seems that you have reached the maximum number of IP addresses that can be assigned to pods, which is what the documentation's description of this metric suggests:

maxIPAddresses: the maximum number of IP addresses that can be used for Pods in the cluster (assumes there are enough IPs in the subnet).
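
As a sanity check on that ceiling (an assumption based on the t2.large nodes mentioned in the comments, which support 3 ENIs with 12 IPv4 addresses each, and the fact that the CNI does not hand out each ENI's primary address to pods), your metrics line up with the ENI limits rather than with spare subnet capacity:

eniAllocated   = 69, which is all of eniMaxAvailable = 69 (69 ENIs = 23 nodes × 3 ENIs per node, so no node can attach more)
maxIPAddresses = 69 ENIs × (12 − 1) secondary IPs per ENI = 759

So once every node has attached all of its ENIs, the CNI cannot assign additional pod IPs on those nodes even if the subnet still has free addresses.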

Please also take a look at the maxUnavailable and maxSurge parameters, which control how many Pods exist during a rolling update - maybe your configuration allows well over 600 Pods during the rollout (for example 130% of the desired count), and that hits the limits of your AWS network.
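
As a minimal sketch (a standard apps/v1 Deployment; the name and the exact percentages are placeholders, not taken from your setup), these knobs live in the Deployment's update strategy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # placeholder name
spec:
  replicas: 600
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 10%           # at most ~60 Pods above the desired 600 need new IPs at once
      maxUnavailable: 0       # no Pods are taken down before their replacements are Ready

Lowering maxSurge (or allowing some maxUnavailable) reduces how many extra pod IPs the CNI has to assign at the same time during the rollout.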