
Is there a way to monitor the pod status and restart count of pods running in a GKE cluster with Stackdriver?

While I can see CPU, memory and disk usage metrics for all pods in Stackdriver, there seems to be no way of getting metrics about crashing pods, or about pods in a replica set being restarted due to crashes.

I'm using a Kubernetes replica set to manage the pods, so they are respawned and created with a new name when they crash. As far as I can tell, the metrics in Stackdriver are keyed by pod name (which is unique to the lifetime of a single pod), which doesn't seem very sensible.

Alerting on pod failures seems like such a natural thing that it's hard to believe this isn't supported at the moment. As they stand, the monitoring and alerting capabilities I get from Stackdriver for Google Container Engine seem rather useless, since they are all bound to pods whose lifetime can be very short.

So if this doesn't work out of the box, are there known workarounds or best practices for monitoring continuously crashing pods?

I am working on a similar solution as well. So far I haven't found much on what you're asking about, or on other similar metrics that could be interesting. If I have any updates, I'll let you know! - Michele Orsi
Agreed that this is a glaring hole in the GKE / Stackdriver stack. I'm pretty amazed that I can't find a way to set up alerts for when a pod restarts or gets evicted, or when a deployment is added, etc. I'll probably end up writing my own Python-based daemon to do this (using github.com/kubernetes-client/python). - JJC

4 Answers


You can achieve this manually with the following:

  1. In the Logs Viewer, create the following filter:

    resource.labels.project_id="<PROJECT_ID>"
    resource.labels.cluster_name="<CLUSTER_NAME>"
    resource.labels.namespace_name="<NAMESPACE, or default>"
    jsonPayload.message:"failed liveness probe"
    
  2. Create a log-based metric by clicking the Create Metric button above the filter input and filling in the details.

  3. You may now track this metric in Stackdriver.
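
The filter and metric from the steps above can also be produced as text, which is handy when you have several clusters or namespaces. The following is a minimal sketch: the helper names, project, cluster and metric names are hypothetical, and the "failed liveness probe" string is copied from the filter above, so whether it matches depends on what your cluster actually logs. The second helper only builds the argument list for `gcloud logging metrics create`; it does not run anything.

```python
import shlex
from typing import List


def build_liveness_filter(project_id: str, cluster_name: str, namespace: str = "default") -> str:
    """Rebuild the Logs Viewer filter from step 1, one clause per line.

    Clauses on separate lines are implicitly ANDed by the Logging query language.
    """
    clauses = [
        f'resource.labels.project_id="{project_id}"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'resource.labels.namespace_name="{namespace}"',
        # ":" is a substring ("has") match rather than an exact comparison.
        'jsonPayload.message:"failed liveness probe"',
    ]
    return "\n".join(clauses)


def gcloud_create_metric_argv(metric_name: str, log_filter: str) -> List[str]:
    """Arguments for creating the log-based metric of step 2 from the CLI.

    Only builds the command; pass the list to subprocess.run() to execute it.
    """
    return [
        "gcloud", "logging", "metrics", "create", metric_name,
        "--description=Containers failing their liveness probe",
        f"--log-filter={log_filter}",
    ]


if __name__ == "__main__":
    # Hypothetical project, cluster and metric names.
    log_filter = build_liveness_filter("my-project", "my-cluster")
    print(shlex.join(gcloud_create_metric_argv("liveness-probe-failures", log_filter)))
```

Passing the command as a list (rather than a shell string) keeps the multi-line filter intact as a single argument.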

I'd be happy to hear about a built-in metric that does this instead.


There is a built-in metric now, so it's easy to put it on a dashboard and/or alert on it without setting up custom metrics:

Metric: kubernetes.io/container/restart_count
Resource type: k8s_container
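
Because each time series of this metric still carries a `pod_name` label, the question's concern about short-lived pod names is best handled by aggregating away `pod_name` and summing the increase per namespace and container. The sketch below does that client-side on made-up samples, roughly what a delta aligner plus a sum reducer grouped by `namespace_name` and `container_name` would do in an alerting policy. It assumes the metric behaves as a per-container cumulative counter; the function names, threshold and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Cloud Monitoring filter selecting the built-in metric named above.
RESTART_FILTER = (
    'metric.type="kubernetes.io/container/restart_count" '
    'resource.type="k8s_container"'
)

# One time series per (namespace_name, pod_name, container_name), values ordered by time.
SeriesKey = Tuple[str, str, str]


def counter_increase(values: List[int]) -> int:
    """Increase of a cumulative counter over ordered samples.

    A drop in value is treated as a counter reset, so the post-reset value
    is counted in full.
    """
    total = 0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def restarts_by_container(series: Dict[SeriesKey, List[int]]) -> Dict[Tuple[str, str], int]:
    """Sum restarts per (namespace, container), ignoring pod_name.

    Dropping the pod name means a replica set replacing pods does not split
    the signal across many short-lived series.
    """
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for (namespace, _pod, container), values in series.items():
        totals[(namespace, container)] += counter_increase(values)
    return dict(totals)


def containers_to_alert(series: Dict[SeriesKey, List[int]], threshold: int = 3) -> List[Tuple[str, str]]:
    """(namespace, container) pairs whose restarts in the window reach the threshold."""
    return sorted(k for k, v in restarts_by_container(series).items() if v >= threshold)


if __name__ == "__main__":
    # Made-up samples for one alerting window.
    window = {
        ("default", "web-abc12", "web"): [0, 1, 3],
        ("default", "web-def34", "web"): [0, 2],
        ("default", "db-0", "db"): [5, 5, 5],
    }
    print(containers_to_alert(window))  # [('default', 'web')]
```

Here the two `web` pods contribute 3 and 2 restarts, so `web` crosses the threshold even though neither pod name would on its own.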

In my cluster (a bare-metal k8s cluster), I use kube-state-metrics (https://github.com/kubernetes/kube-state-metrics) to do what you want. The project lives under the kubernetes GitHub organization and is quite easy to use. Once it is deployed, you can use the kube_pod_container_status_restarts metric to tell whether a container has restarted.
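
kube-state-metrics serves its metrics in the Prometheus text format, so detecting a restart amounts to comparing the counter between two scrapes; with Prometheus itself you would typically write something like `increase(kube_pod_container_status_restarts_total[15m]) > 0` instead. Newer kube-state-metrics releases expose the metric with a `_total` suffix, so the minimal parser below accepts both names. It is a sketch, not a full implementation of the exposition format (for example, label values containing `}` are not handled), and the function names and sample text are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches both the name used in the answer and the *_total name of newer releases.
_LINE = re.compile(
    r'^kube_pod_container_status_restarts(?:_total)?\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

Key = Tuple[str, str, str]  # (namespace, pod, container)


def parse_restarts(exposition: str) -> Dict[Key, float]:
    """Map (namespace, pod, container) to its restart count from one scrape."""
    result: Dict[Key, float] = {}
    for line in exposition.splitlines():
        m = _LINE.match(line.strip())
        if not m:
            continue  # "# HELP"/"# TYPE" comments and unrelated metrics
        labels = dict(_LABEL.findall(m.group("labels")))
        key = (labels.get("namespace", ""), labels.get("pod", ""), labels.get("container", ""))
        result[key] = float(m.group("value"))
    return result


def restarted_between(before: Dict[Key, float], after: Dict[Key, float]) -> List[Key]:
    """Containers whose restart counter went up between two scrapes."""
    return sorted(k for k, v in after.items() if v > before.get(k, 0.0))


if __name__ == "__main__":
    # Hypothetical scrape output.
    scrape1 = parse_restarts(
        'kube_pod_container_status_restarts_total{namespace="default",pod="web-abc12",container="web"} 3\n'
    )
    scrape2 = parse_restarts(
        'kube_pod_container_status_restarts_total{namespace="default",pod="web-abc12",container="web"} 4\n'
    )
    print(restarted_between(scrape1, scrape2))  # [('default', 'web-abc12', 'web')]
```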


Remember that you can always file a feature request if the available options are not enough.