I have a large and fairly complex system to install into a k8s cluster:
60 microservices and 10 Helm charts, installed into 5 namespaces.
Currently, we run 5 helm install/upgrade commands with a 30-second pause between them. However, this strategy puts a heavy load on the nodes, because they are pulling Docker images and starting applications at the same time. The overall execution time is long and unpredictable, which often results in timeouts in components such as Consul and Elasticsearch, and in the applications that depend on them.
I would like to hear opinions on how to improve this situation. First, here are the options we have come up with so far:
- Write a script that controls the installation of the Helm charts.
- Write an Ansible playbook that runs the Helm charts and checks the installation status of the components.
- Write an Ansible playbook that installs the components itself (using either Jinja2 or Golang templates).
- Write a k8s operator that installs the components and monitors the system status.
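
To make the first option more concrete, here is a minimal sketch of what such a script might look like. It is only an illustration, not something we run today: the `Release` type, the `helm_command` and `install_in_order` helpers, and the 10-minute default timeout are all hypothetical. The idea is to replace the fixed 30-second pause with Helm's `--wait` flag, so that each release is installed only after the previous one reports ready.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Release:
    """One Helm release to install (all field values are placeholders)."""
    name: str        # release name
    chart: str       # chart path or reference
    namespace: str   # target namespace


def helm_command(release: Release, timeout: str = "10m") -> List[str]:
    """Build a `helm upgrade --install` command that blocks until ready."""
    return [
        "helm", "upgrade", "--install", release.name, release.chart,
        "--namespace", release.namespace,
        "--create-namespace",
        "--wait",               # wait for the release's resources to become ready
        "--timeout", timeout,   # give up on this release after this long
    ]


def install_in_order(
    releases: Sequence[Release],
    run: Callable = subprocess.run,
    timeout: str = "10m",
) -> List[str]:
    """Install releases one at a time and stop at the first failure.

    `run` is injectable so the ordering logic can be tested without helm.
    """
    installed = []
    for release in releases:
        result = run(helm_command(release, timeout))
        if result.returncode != 0:
            raise RuntimeError(f"helm failed for release {release.name!r}")
        installed.append(release.name)
    return installed
```

Listing the base components first, for example `install_in_order([Release("consul", "./charts/consul", "infra"), Release("apps", "./charts/apps", "apps")])` with made-up chart paths and namespaces, would mean the dependent applications only start once Consul is ready. Note that `--wait` is only as reliable as the readiness probes defined in the charts.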