
I am setting up a Mesos/Marathon cluster on Amazon EC2 with one master node and two agents. The installation succeeded, and when I look at :mesos-port the agents are listed correctly.

The hosts are registered under their private DNS names (ip-17*---.ec2.internal).

When I try to launch a Docker image (tutum/hello-world) through the Marathon web UI, the deployment fails.
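For reference, a minimal Marathon app definition for this image might look like the sketch below (Python, standard library only). The app id `/hello-world`, the resource values, and the Marathon address `marathon.example.internal:8080` are illustrative assumptions; `POST /v2/apps` is Marathon's REST endpoint for creating apps. An app with `"type": "DOCKER"` can only launch on agents that have the Docker containerizer enabled.

```python
import json
import urllib.request

# Hypothetical Marathon address; replace with your own master's host/port.
MARATHON_URL = "http://marathon.example.internal:8080"


def hello_world_app(app_id="/hello-world"):
    """Build a minimal Marathon app definition that runs tutum/hello-world
    with the Docker containerizer. Resource values are illustrative only."""
    return {
        "id": app_id,
        "instances": 1,
        "cpus": 0.25,
        "mem": 128,
        "container": {
            "type": "DOCKER",
            "docker": {"image": "tutum/hello-world"},
        },
    }


def deploy(app, base_url=MARATHON_URL):
    """POST the app definition to Marathon's /v2/apps endpoint and
    return the decoded JSON response."""
    req = urllib.request.Request(
        base_url + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the definition; call deploy(...) explicitly to submit it.
    print(json.dumps(hello_world_app(), indent=2))
```

Pasting the printed JSON into the Marathon UI's JSON mode should be equivalent to calling `deploy`.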

In the Mesos UI, the Completed Tasks list shows the failed deployment attempts. Under the Sandbox link it states:

Failed to connect to agent '12136c28-93e7-4642-a5b6-c5e9a55eedd1-S0' on 'ip-17*-**-*-***.ec2.internal:5051'.
Potential reasons:
- The agent's hostname, 'ip-17*-**-*-***.ec2.internal', is not accessible from your network
- The agent's port, '5051', is not accessible from your network
- The agent timed out or went offline

I opened the full port range in the security group, and I can ping the agents from the master.

I also added the private IP to the /etc/hosts file to be safe, but that did not help either.
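A successful ping only shows that ICMP traffic gets through; it does not show whether the agent's hostname resolves or whether TCP port 5051 accepts connections. A minimal Python sketch that tests both from a given machine (for example from the master, and from the machine whose browser opens the Mesos UI) could look like this; the function name `check_agent` is hypothetical.

```python
import socket
import sys


def check_agent(host, port=5051, timeout=3.0):
    """Resolve `host`, then try a TCP connection to `port`.

    Returns (resolved_ips, reachable). An empty list means the hostname
    does not resolve from this machine.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return [], False  # name resolution failed on this machine
    ips = sorted({info[4][0] for info in infos})
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return ips, True
    except OSError:
        return ips, False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_agent.py <agent-hostname> [port]
    target = sys.argv[1]
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 5051
    found, ok = check_agent(target, target_port)
    print(f"{target} resolves to {found or 'nothing'}; port {target_port} reachable: {ok}")
```

If the agent's private DNS name resolves on the master but not on your own machine, that alone can produce the "not accessible from your network" message in the UI.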

Any ideas?

Why are you using the internal IP, and what code are you using to access it? – error2007s
When I configure ZooKeeper in /etc/mesos/zk, I use the internal IP so that the agents can find the master node. The file contains zk://172.**.*.***:2181/mesos. The internal IP seems to work: I can see the agents registering with the master in the Mesos web UI, and they register using their private DNS names. However, the master then fails to launch Docker images on any of the agents, even on the agent running on the same instance as the master. – Alexander Lachmann
The internal IPs work within the network, but you cannot reach them from outside the network. – error2007s
The master and agent processes are all running within the same network. – Alexander Lachmann

1 Answer


I did this a long time ago, so I don't remember the paths exactly.

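On each slave (agent), go to the /etc/mesos-slave folder (create it if it is missing) and create two files:

1) A file named containerizers containing "mesos,docker".

2) A file named executor_registration_timeout containing "5mins", which gives the agent time to pull the Docker image before the executor registration times out.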
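A minimal sketch of these two steps in Python, assuming the Mesosphere packaging in which each file in /etc/mesos-slave is passed to the agent as a `--<filename>=<contents>` flag. The corresponding Mesos agent flags are `--containerizers` and `--executor_registration_timeout`; the helper name `write_agent_flags` is hypothetical.

```python
from pathlib import Path

# Assumption: the init wrapper turns /etc/mesos-slave/<name> into --<name>=<contents>.
AGENT_FLAGS = {
    "containerizers": "mesos,docker",
    "executor_registration_timeout": "5mins",
}


def write_agent_flags(conf_dir="/etc/mesos-slave", flags=AGENT_FLAGS):
    """Create conf_dir if it is missing and write one file per agent flag.

    Returns the sorted file names now present in conf_dir.
    """
    path = Path(conf_dir)
    path.mkdir(parents=True, exist_ok=True)
    for name, value in flags.items():
        (path / name).write_text(value + "\n")
    return sorted(p.name for p in path.iterdir())
```

Run `write_agent_flags()` as root on each agent, then restart the agent service so the new flags take effect.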

References:
https://mesosphere.github.io/marathon/docs/native-docker.html
https://mesosphere.github.io/marathon/docs/troubleshooting.html

Now restart your master and slaves.

Also, you need to open up the ports in your security groups. For testing you can allow All Traffic, but this is not recommended beyond testing.

Done!