We are using Prometheus for metrics collection. Prometheus is deployed as a container: it collects metrics from various sources and stores the data on the local disk of the node where the container is running. If the node that hosts the container fails, we lose the metrics along with that node, since Prometheus stores all its data locally. Kubernetes will detect the container failure and spin the container up on a healthy node, but the data on the old node is lost.
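For context, a minimal sketch of this kind of setup (names are illustrative, not our actual manifests): Prometheus writes its TSDB to a node-local volume, so the data dies with the node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus              # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus   # all metric data lives here
          volumeMounts:
            - name: data
              mountPath: /prometheus
      volumes:
        - name: data
          emptyDir: {}   # node-local: lost when the pod is rescheduled elsewhere
```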
To solve this issue we have come up with two ideas:
Either we decouple the whole Prometheus setup from Kubernetes.
- We need to ensure high availability for both the Prometheus server and its data. We also need to handle authentication for Prometheus. There is a security concern here, as Prometheus does not ship with auth by default (prometheus-basic-auth); we would have to put a reverse proxy in front of it to handle authentication. Prometheus also needs to talk to Kubernetes internal components, so we need a secure channel for that as well.
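If we went this route, the reverse-proxy auth could look roughly like the nginx sketch below (the hostname, certificate paths, and htpasswd file are assumptions; 9090 is Prometheus's default port):

```nginx
server {
    listen 443 ssl;
    server_name prometheus.example.com;              # assumed hostname

    ssl_certificate     /etc/nginx/certs/prometheus.crt;
    ssl_certificate_key /etc/nginx/certs/prometheus.key;

    location / {
        auth_basic           "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd;   # created with htpasswd
        proxy_pass           http://127.0.0.1:9090;  # Prometheus behind the proxy
    }
}
```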
Or we decouple only the storage, e.g. via an NFS-like protocol (a PersistentVolume in Kubernetes terms).
- We need to ensure high availability for Prometheus's data, and the NFS share itself needs to be secured.
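For this option, the storage decoupling would be roughly a PersistentVolumeClaim backed by a network volume, mounted at Prometheus's data path (the claim name, StorageClass, and size below are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data          # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-client   # assumed NFS-backed StorageClass
  resources:
    requests:
      storage: 50Gi
---
# In the Prometheus pod spec, the node-local volume would be
# replaced with the claim, so the TSDB survives rescheduling:
#   volumes:
#     - name: data
#       persistentVolumeClaim:
#         claimName: prometheus-data
```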
Which one should we use?
If any other industry solution exists, please share that too. If either of the above has unmentioned side effects, kindly let me know.