
I have an EKS Kubernetes cluster, and I set up an HPA so it can scale up when traffic increases, but unexpected behavior happens with every deployment: the HPA scales up to the maximum number of pods, and then after 5 minutes it scales down again. After a lot of searching, I found that there is a CPU spike right after the app is redeployed. The spike lasts only milliseconds, which is probably why it triggers a scale-up.

Do you have an idea how to prevent this spike from happening? Alternatively, I would like to disable the HPA while deploying, or delay the controller manager's scale-up (for example, to 1 minute instead of the default value). Thanks

Is the start-up CPU spike so high that it is over the threshold percentage you have set on your HPA? I ask because that might indicate your resource request is too low. - Spazzy757
Hi @Spazzy757, thanks for your support. No, I am using New Relic for monitoring and got the correct numbers from it, so if the pod spike is 600m, I set requests to 800m, and the same thing happens. - Mohamed Alaa
To debug, have you tried setting the request to something very high, like 1024m, to see if it still happens? This could possibly indicate an application-level issue; just a thought. - Spazzy757
What limits did you set? - Markus Dresch
@Spazzy757 Thank you, I'll try this and give you my feedback. - Mohamed Alaa
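The comments hinge on how the HPA turns CPU usage into a replica count. As a rough illustration, here is a minimal Python sketch of the scaling rule described in the Kubernetes HPA documentation: desired = ceil(current * currentUtilization / targetUtilization), where utilization is usage as a percentage of the CPU request, nothing changes inside a tolerance band (0.1 by default), and the result is clamped to the HPA's min/max. The 70% target, pod counts, and usage figures are hypothetical, and the sketch ignores how the real controller treats pods that are not yet ready or have missing metrics.

```python
import math


def utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization as the HPA sees it: usage as a percentage of the request."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * current/target), clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band (10% by default) the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical start-up spike: each of 3 pods briefly uses 2400m against an 800m request.
spike = utilization_percent(2400, 800)           # 300.0 (%)
print(desired_replicas(3, spike, 70.0, 2, 10))   # hypothetical 70% target, max 10 -> 10
```

Because the ratio is computed against the request, a short burst that is several times the request is enough to push the computed count straight to `maxReplicas` in a single evaluation, which is consistent with the behavior described in the question.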

1 Answer


This is a known issue caused by a bug in the HPA implementation. You can check out the issues here: https://github.com/kubernetes/kubernetes/issues/78712 and https://github.com/kubernetes/kubernetes/issues/72775

The fix is rolled out in version 1.16+: https://github.com/kubernetes/kubernetes/pull/79035

I was facing the same issue, so I implemented a workaround: a small script that works like a charm. The steps are:

  1. Set the HPA max to the current number of replicas
  2. Set the image
  3. Wait for the deployment to complete, or for a specified time
  4. Set the HPA max back to the original number
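The four steps can be sketched roughly as follows. This is not the linked gist (that is a bash script); it is a hypothetical Python equivalent that shells out to `kubectl`, and the HPA, deployment, container, and image names are placeholders. As a precaution it never lowers `maxReplicas` below the HPA's `minReplicas`, since HPA validation requires max >= min.

```python
import json
import subprocess
from typing import Callable, List

# A runner takes kubectl arguments and returns the command's stdout.
Runner = Callable[[List[str]], str]


def kubectl(args: List[str]) -> str:
    """Run `kubectl <args>`; raise CalledProcessError on a non-zero exit."""
    result = subprocess.run(["kubectl", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def set_hpa_max(run: Runner, hpa: str, max_replicas: int) -> None:
    """Patch only spec.maxReplicas of the HPA."""
    patch = json.dumps({"spec": {"maxReplicas": max_replicas}})
    run(["patch", "hpa", hpa, "-p", patch])


def deploy_with_hpa_pinned(hpa: str, deployment: str, container: str,
                           image: str, timeout: str = "300s",
                           run: Runner = kubectl) -> None:
    """Roll out a new image while the HPA's max is temporarily capped."""
    original_max = int(run(["get", "hpa", hpa, "-o",
                            "jsonpath={.spec.maxReplicas}"]))
    # minReplicas defaults to 1 if the field comes back empty.
    min_replicas = int(run(["get", "hpa", hpa, "-o",
                            "jsonpath={.spec.minReplicas}"]) or "1")
    current = int(run(["get", "deployment", deployment, "-o",
                       "jsonpath={.spec.replicas}"]))

    # Step 1: cap the HPA at the current replica count (never below minReplicas).
    set_hpa_max(run, hpa, max(current, min_replicas))
    try:
        # Step 2: update the container image, which starts a rolling update.
        run(["set", "image", f"deployment/{deployment}", f"{container}={image}"])
        # Step 3: block until the rollout finishes or the timeout expires.
        run(["rollout", "status", f"deployment/{deployment}",
             f"--timeout={timeout}"])
    finally:
        # Step 4: restore the original maximum, even if the rollout failed.
        set_hpa_max(run, hpa, original_max)
```

Usage might look like `deploy_with_hpa_pinned("my-app-hpa", "my-app", "my-app", "registry.example.com/my-app:1.2.3")`. The `try`/`finally` ensures the HPA is not left capped if the rollout times out. The original steps also allow waiting a fixed time instead of waiting for rollout completion, which you could emulate with a short sleep before step 4.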

https://gist.github.com/shukla2112/035d4e633dded6441f5a9ba0743b4f08

It's a bash script. If you are using Jenkins for deployment, you can simply integrate it, or use it independently by fitting it into your deployment workflow.
