1
votes

we are recently observing this issue with the tiller timing out every 30 seconds with below error for helm init/upgrade/install commands. Although other commands such as helm init and helm list works fine. i have even tried removing the --wait option as well but that does not appear to be the issue:

I have tried rebooting the nodes, upgrading the GKE version to the latest, rebooting tiller pod and increasing the time in the timeout option, trying the command without timeout option as well.

[tiller] 2019/06/23 15:18:57 warning: Upgrade "xx" failed: 
Failed to recreate resource: Timeout: 
request did not complete within requested timeout 30s 
&& Failed to recreate resource: Timeout: 
request did not complete within requested timeout 30s

Output of helm version:

Client: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.6-gke.13", GitCommit:"fcbc1d20b6bca1936c0317743055ac75aef608ce", GitTreeState:"clean", BuildDate:"2019-06-19T20:50:07Z", GoVersion:"go1.11.5b4", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):

GKE

2
Try to upgrade helm to upper version: v2.13.1 Do you get any associated errors from control plane around this timestamp ? 'Failed to recreate resource' error suggests that you are using '--force' flag with helm upgrade command, is it right ? You may try to use it together with '--recreate-pods' if does not break your deployment strategy. - Nepomucen
its strange that the same version of the helm both client and server works fine on the other cluster we have in a different project. i am planning to upgrade to 2.14 itself.. to see if that fixes the issue.. thank you for looking into it. - satishkumar432

2 Answers

2
votes

The problem for this timeout happens to be a webhook that was configured by the below app on the cluster. The Google support team confirmed that this webhook was restricting the deployments by inspecting the api server logs. After deleting the webhook from the cluster the deployments went through.

https://github.com/reactiveops/polaris

0
votes

Helm upgrade may resolve the issue please run the below command helm init --upgrade