Mesos: Failed to get/update resource statistics for executor

Question

We are having issues with the Mesos agent logs filling up with messages like:

2018-06-19T07:31:05.247394+00:00 mesos-slave16 mesos-slave[10243]: W0619 07:31:05.244067 10249 slave.cpp:6750] Failed to get resource statistics for executor 'research_new-benchmarks_production_testbox-58-1529393461975-1-mesos_slave16' of framework Singularity-PROD: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-7560fb72-28d3-4cce-8cb0-de889248cf93': exited with status 1; stderr='Error: No such object: mesos-7560fb72-28d3-4cce-8cb0-de889248cf93

or

2018-06-19T07:31:09.904414+00:00 mesos-slave16 mesos-slave[10243]: E0619 07:31:09.903687 10251 slave.cpp:4721] Failed to update resources for container b9a9f7f9-938b-4ec4-a245-331122471769 of executor 'hera_listening-api_production_checkAlert-93-1529393402085-1-mesos_slave16-us_west_2a' running task hera_listening-api_production_checkAlert-93-1529393402085-1-mesos_slave16 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/14447/cgroup: Failed to open file: No such file or directory
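The second message says the agent could not read `/proc/14447/cgroup`. On Linux that file disappears once the process has exited and been reaped, so any lookup that runs after the executor is gone fails with "No such file or directory". The sketch below is only an illustration of that lookup, not Mesos's own code. It assumes cgroup v1 (the default on Ubuntu 16.04), and the helper names `parse_cpu_cgroup` and `cpu_cgroup_for_pid` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_cpu_cgroup(text: str) -> Optional[str]:
    """Return the cgroup path of the 'cpu' controller from /proc/<pid>/cgroup text.

    Each cgroup v1 line has the form 'hierarchy-id:controller-list:path',
    e.g. '4:cpu,cpuacct:/docker/abc123'.
    """
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip empty or malformed lines
        _, controllers, path = parts
        if "cpu" in controllers.split(","):
            return path
    return None


def cpu_cgroup_for_pid(pid: int) -> Optional[str]:
    """Hypothetical helper: look up the 'cpu' cgroup of a process.

    Returns None when /proc/<pid>/cgroup cannot be read because the process
    no longer exists, which is the condition behind the agent's
    "Failed to open file: No such file or directory" error.
    """
    try:
        text = Path(f"/proc/{pid}/cgroup").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return None  # process already gone
    return parse_cpu_cgroup(text)
```

A `None` here corresponds to the agent's error: the container's process is no longer there to be inspected.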

We are running three HA mesos-masters with the Marathon and Singularity frameworks, and the messages appear for tasks from both frameworks. Tasks run fine, and the crons (from Singularity) run fine too, but I am confused by these messages. We have more than 600 long-running Marathon tasks and more than 30 crons starting every few minutes.

Docker version: 18.03.0-ce
Mesos version: 1.4.0-2.0.1
Marathon version: 1.4.2-1.0.647.ubuntu1604
Singularity version: 0.15.1

Masters and slaves run on Ubuntu 16.04 with the AWS kernel (4.4.0-1060-aws).

I think the Mesos executor on the slave is deleted after the task finishes, but Mesos still tries to get information about it from Docker, where the task is no longer visible.
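One way to test this hypothesis is to extract the Docker container names from the agent's "Failed to get resource statistics" warnings and compare them with the containers Docker still lists. The sketch below does only the parsing and the comparison. The regular expression is inferred from the sample log line above, not taken from Mesos source, and the function names are hypothetical. The list of live names is assumed to come from a separate run of something like `docker ps -a --format '{{.Names}}'`.

```python
import re
from typing import Iterable, Set

# Captures the container name from lines such as:
#   ... Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-7560fb72-...': exited ...
# Pattern inferred from the sample log line; adjust if your log format differs.
INSPECT_RE = re.compile(r"docker .*? inspect (mesos-[0-9a-f-]+)'")


def containers_from_log(lines: Iterable[str]) -> Set[str]:
    """Collect the Docker container names the agent failed to inspect."""
    names = set()
    for line in lines:
        if "Failed to get resource statistics" not in line:
            continue  # only look at the statistics warnings
        match = INSPECT_RE.search(line)
        if match:
            names.add(match.group(1))
    return names


def already_gone(log_names: Set[str], live_names: Iterable[str]) -> Set[str]:
    """Return the names from the log that Docker no longer lists."""
    return log_names - set(live_names)
```

If `already_gone(...)` returns every name found in the warnings, the agent is indeed querying containers that Docker has already removed, which would support the idea that the messages are noise from a cleanup race rather than a sign that running tasks are affected.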

Any ideas? Thanks

Light.G · Accepted Answer · 2018-09-03T14:49:09

Marathon is a scheduler framework for permanent (long-running) tasks. Even when a task exits successfully, Marathon will keep rescheduling it.

Health checks are one of its important features. Maybe try Chronos, another framework that runs on Apache Mesos.
