Multiple pods of a 600 pod deployment stuck in ContainerCreating
after a rolling update with the message:
Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod network: add cmd: failed to assign an IP address to container
What I have tried:
- Upgraded to v1.12 on EKS and CNI 1.5.0. This issue was closed stating CNI 1.5.0 solved the issue. It did not for us. In another thread leaking ENIs was blamed but was also closed due to CNI upgrade.
- Installed cni-metrics-helper and this is a snapshot of the output:
maxIPAddresses, value: 759.000000
ipamdActionInProgress, value: 1.000000
addReqCount, value: 16093.000000
awsAPILatency, value: 564.000000
delReqCount, value: 32337.000000
eniMaxAvailable, value: 69.000000
assignIPAddresses, value: 558.000000
totalIPAddresses, value: 682.000000
eniAllocated, value: 69.000000
Do the CNI metrics output suggest there's an issue? Seems like there are enough IPs.
What else can I try to debug?