8
votes

I have multiple clusters in ECS. Each cluster has multiple services, and each service runs more than one task. Each task exposes /metrics with different values on a random port. I'd like to do some kind of dynamic discovery and scrape those metrics (each task has a different port and a different IP, because the tasks run on multiple container instances), group together the metrics of tasks from the same service, and scrape them with Prometheus. How should I do that?

2 Answers

3
votes

We had the same challenge, and there were two approaches:

  1. Tag each EC2 instance based on its running tasks, then find the EC2 instances in Prometheus based on those tags. This worked well when we had one task per instance, because the metrics port is known. There are possibly ways to extend this to support multiple tasks.
  2. Run one task per EC2 instance that acts as the exporter for all the tasks running on that instance. It interrogates ECS, finds the tasks and the listening port of each task, and scrapes all of them. In Prometheus, you can then find all EC2 instances in the cluster and scrape this exporter on each one. Obviously, you will need to label the metrics based on the task they were read from.
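As a rough sketch of the first approach, a Prometheus `ec2_sd_config` job with relabeling could look like the following. The `prometheus_port` tag name and the region are illustrative assumptions, not anything standard:

```yaml
scrape_configs:
  - job_name: 'ecs-tasks'
    ec2_sd_configs:
      - region: us-east-1   # assumption: adjust to your region
    relabel_configs:
      # Keep only instances that carry the (hypothetical) monitoring tag
      - source_labels: [__meta_ec2_tag_prometheus_port]
        regex: (.+)
        action: keep
      # Rewrite the scrape address to <private IP>:<port from tag>
      - source_labels: [__meta_ec2_private_ip, __meta_ec2_tag_prometheus_port]
        separator: ':'
        target_label: __address__
```

This only works when the port is fixed per instance, which is why it maps to the one-task-per-instance case described above.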

If I had to do it again, I would consider using Consul to register the tasks and discover them in Prometheus. If you are already using Consul, this direction could be a good one to try.
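On the Prometheus side, the Consul route is simple; a minimal `consul_sd_config`, assuming each ECS service registers under its own Consul service name (the Consul address here is a placeholder), might look like:

```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul.service.internal:8500'  # assumption: your Consul address
    relabel_configs:
      # Group metrics by ECS service via the Consul service name
      - source_labels: [__meta_consul_service]
        target_label: ecs_service
```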

Hope this helps.

-1
votes

If you are not willing to go for a proper service discovery mechanism like Consul or AWS-native service discovery (see https://aws.amazon.com/blogs/aws/amazon-ecs-service-discovery/), you can leverage Prometheus file-based service discovery together with a service that queries the AWS API, retrieves all the required information, and prepares the target files for Prometheus. One example of such a tool can be found here: https://pypi.org/project/prometheus-ecs-discoverer/ (created by me, based on another similar project).
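To illustrate the file-based approach, here is a minimal sketch of the generator side, with the per-service grouping factored into a pure function. The task list is made up; a real discoverer would fetch it from the ECS API (e.g. with boto3's `list_tasks`/`describe_tasks`):

```python
import json

def build_file_sd_targets(tasks):
    """Turn task descriptions into Prometheus file_sd target groups,
    grouping tasks that belong to the same ECS service.

    `tasks` is a list of dicts with keys: service, ip, port
    (a hypothetical shape; in practice you would fill it from
    the ECS API)."""
    by_service = {}
    for task in tasks:
        address = f"{task['ip']}:{task['port']}"
        by_service.setdefault(task["service"], []).append(address)
    return [
        {"targets": sorted(addresses), "labels": {"ecs_service": service}}
        for service, addresses in sorted(by_service.items())
    ]

# Example with made-up task data; ports are random as assigned by ECS.
tasks = [
    {"service": "api", "ip": "10.0.1.11", "port": 32768},
    {"service": "api", "ip": "10.0.1.12", "port": 32801},
    {"service": "worker", "ip": "10.0.2.21", "port": 32790},
]
groups = build_file_sd_targets(tasks)
# Write this JSON to a file that a file_sd_configs entry points at.
print(json.dumps(groups, indent=2))
```

Prometheus then picks up changes to the generated file automatically via a `file_sd_configs` entry in the scrape config, so the generator only needs to rewrite the file periodically.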