4
votes

I want to use my existing Prometheus and Grafana instances in the monitoring namespace to emulate what seldon-core-analytics is doing. I installed kube-prometheus-stack on Kubernetes from the prometheus-community Helm charts. Here's what I've done so far:

In the values.yaml file, under the prometheus config, I added the following annotations:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/prometheus"

Next, I looked at the prometheus-config.yaml in their GitHub repo and copied the configuration into a ConfigMap.

I also created a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: seldon-servicemonitor-default
  labels:
    seldon-monitor: seldon-default
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: seldon-core
  endpoints:
    - interval: 15s
      path: /metrics
      port: http
    - interval: 15s
      path: /prometheus
      port: http
  namespaceSelector:
    matchNames:
      - seldon
      - default
      - monitoring

No errors with the above steps so far, but it doesn't appear as though the prometheus instance is able to scrape the metrics from a model I deployed on a different namespace. What other configuration do I need to do so that my own Prometheus and Grafana instances can gather and visualize the metrics from my seldon deployed models? The documentation doesn't really explain how to do this on your own instances, and the one they provide to you through seldon-core-analytics isn't production-ready.

2 Answers

1
votes

Prometheus configuration in seldon-core-analytics is quite standard. It is based on built-in Kubernetes service discovery and it uses annotations to find scraping targets:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: /metrics
  prometheus.io/scheme: http
  prometheus.io/port: "9100"

In their example configuration, Prometheus targets pods, services, and endpoints that carry the prometheus.io/scrape: "true" annotation. The other three annotations override the default scraping parameters per target. So if you have a config like the one in the example, you only need to put some of these annotations on the pods.
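
As a rough illustration of how such annotations are usually consumed, here is a minimal sketch of a scrape job built on Kubernetes service discovery for pods. It follows the widely used relabeling pattern and is not copied from the seldon-core-analytics config; the job name is an assumption.

```yaml
scrape_configs:
  - job_name: kubernetes-pods            # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                        # discover every pod via the Kubernetes API
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # prometheus.io/path overrides the default /metrics path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # prometheus.io/port overrides the port in the target address
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # prometheus.io/scheme overrides the default http scheme
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
```

A Prometheus managed by the operator does not read these annotations unless a scrape job like this is added explicitly, which is why annotating pods alone has no effect there.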

kube-prometheus-stack works differently. It uses the Prometheus Operator and CRDs to shape the configuration. This design document describes the purpose of each CRD.

You need to create a ServiceMonitor resource to define a scraping rule for new services. The ServiceMonitor itself must carry the labels defined in the Prometheus resource (another CRD) under the serviceMonitorSelector key. It is hard to provide a working example without seeing your setup, but this short guide should be enough to understand what to do.
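
To make the label relationship concrete, here is a sketch of the relevant part of a Prometheus resource. The resource name and the release label value are assumptions; check the real ones with kubectl get prometheus -n monitoring -o yaml.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example-prometheus               # hypothetical name
  namespace: monitoring
spec:
  # Only ServiceMonitors whose metadata.labels match this selector are loaded.
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack     # assumption: your Helm release name
```

With a selector like this, a ServiceMonitor that lacks release: kube-prometheus-stack under metadata.labels is ignored without any error being reported.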

I suggest you describe one of the ServiceMonitors you already have, then create a new one, changing the labels under matchLabels. Do not change the namespace in the new object: by default the Prometheus Operator does not look for ServiceMonitors in other namespaces. To make a ServiceMonitor discover targets in all namespaces, its namespaceSelector has to match any namespace:

spec:
  namespaceSelector:
    any: true
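
Put together, a ServiceMonitor for the Seldon models might look like the sketch below. The release label value is an assumption (use whatever your Prometheus resource selects on), and the selector label, port name, and path are taken from the question rather than verified against a Seldon deployment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: seldon-servicemonitor-default
  namespace: monitoring                  # same namespace as the existing ServiceMonitors
  labels:
    release: kube-prometheus-stack       # assumption: label required by serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: seldon-core   # label from the question
  namespaceSelector:
    any: true                            # look for matching Services in every namespace
  endpoints:
    - port: http                         # must be the *name* of a port on the Service
      path: /prometheus
      interval: 15s
```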

1
votes

ServiceMonitors are extremely difficult to debug. My debugging strategy would be:

  1. Check whether the ServiceMonitor you created is being read by Prometheus: look at the /targets URL. There should be at least an entry for it in the 0/0 state. If not, the ServiceMonitor itself is not being picked up by Prometheus. I suggest looking into the following settings in your kube-prometheus-stack configuration.

        serviceMonitorSelectorNilUsesHelmValues: false
        serviceMonitorSelector: {}
        serviceMonitorNamespaceSelector: {} 
    

    By default the chart attaches Helm metadata to its own ServiceMonitors, and the Prometheus Operator uses it to choose which ServiceMonitors to monitor. Setting serviceMonitorSelectorNilUsesHelmValues: false disables that selection.

  2. If the ServiceMonitor is visible in /targets but has no targets: in this case the issue lies between the ServiceMonitor and the pods it is trying to scrape. Check that the ports you named are accessible and that the pods fulfill the selectors you specified.
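
For step 1, the three settings belong under prometheus.prometheusSpec in the values file of the kube-prometheus-stack chart. A minimal excerpt, assuming no other overrides in that section:

```yaml
# values.yaml for the kube-prometheus-stack chart (excerpt)
prometheus:
  prometheusSpec:
    # Do not fall back to selecting only ServiceMonitors labelled with the Helm release
    serviceMonitorSelectorNilUsesHelmValues: false
    # Empty selector: accept ServiceMonitors regardless of their labels
    serviceMonitorSelector: {}
    # Empty selector: accept ServiceMonitors from every namespace
    serviceMonitorNamespaceSelector: {}
```

The change takes effect after the release is upgraded with the new values.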

My advice would be to start with another dummy ServiceMonitor by following this, and then modify it one step at a time until it starts monitoring the seldon-core-analytics pods.
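
A dummy pair to start from could look like the sketch below. Every name, label, and port number here is hypothetical, and the release label is an assumption that only matters if your Prometheus still selects on it. The points to verify at each step are that the ServiceMonitor's selector matches the labels on the Service (not on the pod) and that endpoints[].port is the name of a Service port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dummy-metrics                    # hypothetical
  namespace: default
  labels:
    app: dummy-metrics                   # matched by the ServiceMonitor selector below
spec:
  selector:
    app: dummy-metrics                   # pods backing this Service
  ports:
    - name: http                         # port name referenced by the ServiceMonitor
      port: 8000
      targetPort: 8000
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dummy-metrics
  namespace: monitoring
  labels:
    release: kube-prometheus-stack       # assumption: label your Prometheus selects on
spec:
  selector:
    matchLabels:
      app: dummy-metrics
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
```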