We use Helm to manage all our resources in a k8s cluster. Recently we had an incident where some k8s resources were modified outside of Helm (see the edit below for details on the root cause).
The end result, however, is that some k8s resources in our cluster no longer match what is specified in the Helm chart of the release.
Example:
We have a Helm chart that contains a HorizontalPodAutoscaler. If I run:

```shell
helm get myservice-release
```

I will see something like this:
```yaml
---
# Source: myservice/charts/default-deployment/templates/default_deployment.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myservice-autoscaler
  labels:
    app: myservice
spec:
  minReplicas: 2
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice-deployment
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 85
```
However, if I run:

```shell
kubectl get hpa myservice-autoscaler -o yaml
```

the `spec.minReplicas` and `spec.maxReplicas` do not match the chart:
```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '{REDACTED}'
    autoscaling.alpha.kubernetes.io/current-metrics: '{REDACTED}'
  creationTimestamp: "{REDACTED}"
  labels:
    app: myservice
  name: myservice-autoscaler
  namespace: default
  resourceVersion: "174526833"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/myservice-autoscaler
  uid: {REDACTED}
spec:
  maxReplicas: 1
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice-deployment
  targetCPUUtilizationPercentage: 85
status:
  currentCPUUtilizationPercentage: 9
  currentReplicas: 1
  desiredReplicas: 1
  lastScaleTime: "{REDACTED}"
```
I suspect there is more than this one occurrence of drift among our k8s resources.

- How do I verify which resources have drifted?
- How do I inform Helm of that drift, so the next deployment can take it into account when applying the release diff?
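For the first question, I imagine the check could look something like the sketch below: render what Helm believes is deployed and diff it against the live cluster state. This assumes Helm 3's `helm get manifest` and `kubectl diff` (available since kubectl 1.13); the release and namespace names are just my examples.

```shell
#!/usr/bin/env bash
# For each release, render the manifests Helm has recorded for it and
# server-side diff them against what is actually in the cluster.
# `kubectl diff` exits non-zero when it finds differences, so drifted
# releases are reported on stderr and the loop keeps going.
set -u
namespace=default
for release in $(helm list -n "$namespace" -q); do
  if ! helm get manifest "$release" -n "$namespace" | kubectl diff -f - >/dev/null; then
    echo "drift detected in release: $release" >&2
  fi
done
```

Is something along these lines the right approach, or is there a more idiomatic way?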
EDIT:

For those of you interested: this was caused by two Helm charts managing the same resource (the autoscaler), each setting different values. It happened because two Helm releases that were meant for different namespaces ended up in the same namespace and were updated with `--force`.
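In case it helps anyone debugging a similar collision: if you are on Helm 3, each object it deploys is stamped with ownership metadata, so you can ask the live object which release (and intended namespace) last claimed it. The resource and namespace below are from my example above.

```shell
# Print the Helm 3 ownership annotations on the live HPA; a mismatch
# between release-namespace and the object's actual namespace is the
# kind of cross-release collision that bit us.
kubectl get hpa myservice-autoscaler -n default -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}{" "}{.metadata.annotations.meta\.helm\.sh/release-namespace}{"\n"}'
```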