3
votes

I have many apps running in containers on Mesos, managed via Marathon. When deploying each app through Marathon I set a CPU allocation such as 1 or 0.5. However, the CPU allocation in Marathon does not mean 1 CPU or half a CPU; it is simply a time-sharing ratio. Each container can also access all the CPUs on its host.

Now I want to measure the CPU efficiency of each container on the Mesos slaves, so that I can reduce or increase the CPU allocation for each app in Marathon. I just want to make resource utilisation more efficient.

I could use https://github.com/bobrik/collectd-mesos-tasks, but the problem is that its CPU utilisation metrics do not relate to the CPU allocation in Marathon.

2
Questions about general computing hardware and software are off-topic for Stack Overflow unless they directly involve tools used primarily for programming. You may be able to get help on Super User. - Marcus Müller
The question is perfectly fine for SO. It's about getting perf data off of Mesos/Marathon, which is as relevant for developers as it is for admins. - Michael Hausenblas
Thank you @micheal - Balu

2 Answers

4
votes

In the Mesos WebUI you can see how much CPU is used by your executor.

Here is the code that collects statistics from the /monitor/statistics endpoint and calculates CPU usage.

You are interested in cpus_total_usage, so the following method should work for you.

Let's assume a and b are snapshots of the statistics taken at two points in time, with b later than a. To calculate cpus_total_usage, we need to calculate the time the executor spent in system and user space between the two snapshots and divide it by the time elapsed between a and b.

cpus_total_usage = (
                    (b.cpus_system_time_secs - a.cpus_system_time_secs) +
                    (b.cpus_user_time_secs - a.cpus_user_time_secs)
                   ) / (b.timestamp - a.timestamp)
cpu_percent      = cpus_total_usage / cpu_limit * 100
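
As a rough, runnable illustration of the calculation above, here is a minimal Python sketch. It assumes each snapshot is a plain dict holding the three counters named in this answer; the function names and the sample values are made up for illustration, and cpu_limit is simply passed in by the caller.

```python
def cpus_total_usage(a, b):
    """Average number of CPUs used between snapshot a (earlier) and b (later)."""
    elapsed = b["timestamp"] - a["timestamp"]
    if elapsed <= 0:
        raise ValueError("snapshot b must be taken after snapshot a")
    # CPU time consumed in kernel (system) and user space during the interval.
    system = b["cpus_system_time_secs"] - a["cpus_system_time_secs"]
    user = b["cpus_user_time_secs"] - a["cpus_user_time_secs"]
    return (system + user) / elapsed


def cpu_percent(a, b, cpu_limit):
    """Usage expressed as a percentage of the CPU allocation."""
    if cpu_limit <= 0:
        raise ValueError("cpu_limit must be positive")
    return cpus_total_usage(a, b) / cpu_limit * 100


# Made-up sample snapshots taken 10 seconds apart (illustration only).
a = {"timestamp": 100.0, "cpus_system_time_secs": 2.0, "cpus_user_time_secs": 8.0}
b = {"timestamp": 110.0, "cpus_system_time_secs": 2.5, "cpus_user_time_secs": 10.0}

print(cpus_total_usage(a, b))   # 0.25 CPUs on average
print(cpu_percent(a, b, 0.5))   # 50.0 percent of a 0.5 CPU allocation
```

Because the counters are cumulative, a longer gap between snapshots gives a smoother average, while a shorter gap shows bursts more clearly.
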
3
votes

Depending on how much work you want to invest yourself, you can either use the Marathon Event Bus and more generally the Marathon HTTP API (for example this endpoint) along with low-level tools like cAdvisor or cinf to do the maths yourself. If you don't want to code stuff yourself, I suggest you use Sysdig, Datadog or Prometheus to do the heavy lifting for you.
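
If you do the maths yourself, the core step is comparing what each app is allocated with what it actually uses. The sketch below is only an illustration under stated assumptions: the `apps` list mimics the `id`, `cpus` and `instances` fields of app definitions returned by the Marathon HTTP API, while `measured_usage` (average CPUs used per app, summed over its instances) and the function name are hypothetical and would come from whichever monitoring tool you pick. All values are made up.

```python
def utilisation_report(apps, measured_usage):
    """Map each app id to (CPUs used) / (CPUs allocated across all instances)."""
    report = {}
    for app in apps:
        allocated = app["cpus"] * app.get("instances", 1)
        used = measured_usage.get(app["id"])
        if used is None or allocated <= 0:
            continue  # no measurement, or nothing allocated: skip this app
        report[app["id"]] = used / allocated
    return report


# Made-up sample data, for illustration only.
apps = [
    {"id": "/web", "cpus": 0.5, "instances": 2},
    {"id": "/worker", "cpus": 1.0, "instances": 1},
]
measured_usage = {"/web": 0.25, "/worker": 1.5}

for app_id, ratio in utilisation_report(apps, measured_usage).items():
    print(app_id, ratio)
```

A ratio well below 1 suggests the allocation could be reduced. A ratio above 1 is possible when the allocation acts as a time-sharing ratio rather than a hard cap, and suggests the app may need a larger allocation.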