
We are having issues with mesos-agent logs filling up with messages like:

2018-06-19T07:31:05.247394+00:00 mesos-slave16 mesos-slave[10243]: W0619 07:31:05.244067 10249 slave.cpp:6750] Failed to get resource statistics for executor 'research_new-benchmarks_production_testbox-58-1529393461975-1-mesos_slave16' of framework Singularity-PROD: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-7560fb72-28d3-4cce-8cb0-de889248cf93': exited with status 1; stderr='Error: No such object: mesos-7560fb72-28d3-4cce-8cb0-de889248cf93

or

2018-06-19T07:31:09.904414+00:00 mesos-slave16 mesos-slave[10243]: E0619 07:31:09.903687 10251 slave.cpp:4721] Failed to update resources for container b9a9f7f9-938b-4ec4-a245-331122471769 of executor 'hera_listening-api_production_checkAlert-93-1529393402085-1-mesos_slave16-us_west_2a' running task hera_listening-api_production_checkAlert-93-1529393402085-1-mesos_slave16 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/14447/cgroup: Failed to open file: No such file or directory

We are running three Mesos masters in HA with the Marathon and Singularity frameworks, and this happens with tasks from both frameworks. Tasks are running, and crons (from Singularity) are running fine too, but I am confused by those messages. We have more than 600 long-running Marathon tasks and more than 30 crons starting every few minutes.

Docker version: 18.03.0-ce
Mesos version: 1.4.0-2.0.1
Marathon version: 1.4.2-1.0.647.ubuntu1604
Singularity version: 0.15.1

Masters and slaves are running on Ubuntu 16.04 with the AWS kernel 4.4.0-1060-aws.

I think the Mesos executor on the slave is deleted after the task finishes, but Mesos still tries to get info from Docker, where the container is no longer visible.
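To check that idea against the logs, the two messages can be counted separately and the container names or PIDs pulled out, then compared with tasks known to have finished. This is only a rough sketch: the regular expressions are derived from the two log lines quoted above, and the helper names `classify` and `summarize` are made up for illustration, not part of Mesos or Docker.

```python
import re
from collections import Counter

# Patterns based only on the two messages quoted above (assumption:
# other agent versions may word these messages differently).
NO_SUCH_OBJECT = re.compile(r"stderr='Error: No such object: (mesos-[0-9a-f-]+)")
MISSING_CGROUP = re.compile(r"Failed to read /proc/(\d+)/cgroup")


def classify(line):
    """Return (kind, identifier) for a recognised message, else None."""
    match = NO_SUCH_OBJECT.search(line)
    if match:
        # identifier is the Docker container name that inspect could not find
        return ("docker_inspect_missing", match.group(1))
    match = MISSING_CGROUP.search(line)
    if match:
        # identifier is the PID whose /proc entry was already gone
        return ("cgroup_missing", match.group(1))
    return None


def summarize(lines):
    """Count how often each recognised message kind appears."""
    counts = Counter()
    for line in lines:
        hit = classify(line)
        if hit is not None:
            counts[hit[0]] += 1
    return counts
```

Feeding the agent log through `summarize` (for example, `summarize(open(path))` with `path` pointing at the syslog file) gives a count per message kind. The identifiers returned by `classify` can then be matched by hand against tasks that had already reached a terminal state.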

Any ideas? Thanks


1 Answer


Marathon is a scheduler framework for long-running tasks. Even when a task exits successfully, Marathon will keep rescheduling it.

Health checks are one of its important features. Maybe try Chronos, another framework that runs on Apache Mesos.