
My current setup is as follows:

  1. Mesos Master — 10.20.200.300:14081 - RHEL 7
  2. Zookeeper — 10.20.200.300:14080 - RHEL 7
  3. Mesos Agent — 10.21.210.310:5051 - Windows 2016

The master is up and able to connect to ZooKeeper. However, when I start the agent, it connects to ZooKeeper but never gets connected to the master.

The master was started as a systemd service with the parameters below, set under /etc/mesos-master:

hostname - mymaster.mesos.com    
quorum - 1    
work_dir - /var/lib/mesos   
advertise_ip - 10.20.200.300
advertise_port - 14081

Below are the logs from the master, the slave, and ZooKeeper.

Master logs (running on 10.20.200.300:14081):

E1208 12:22:21.269227  4302 process.cpp:2455] Failed to shutdown socket with fd 26, address 10.20.200.300:14081: Transport endpoint is not connected

ZooKeeper logs (running on 10.20.200.300:14080):

2017-12-08 12:22:21,185 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:14080:ZooKeeperServer@942] - Client attempting to establish new session at /10.21.210.310:63039     
2017-12-08 12:22:21,196 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@687] - Established session 0x160372c2b770010 with negotiated timeout 10000 for client /10.21.210.310:63039

Slave logs (running on 10.21.210.310:5051):

I1208 12:22:21.179652  4224 slave.cpp:1007] New master detected at master@10.20.200.300:14081
I1208 12:22:21.195278  4224 slave.cpp:1031] No credentials provided. Attempting to register without authentication
I1208 12:22:21.195278  4224 slave.cpp:1042] Detecting new master
I1208 12:22:21.210924  6156 slave.cpp:5135] Got exited event for master@10.20.200.300:14081
W1208 12:22:21.210924  6156 slave.cpp:5140] Master disconnected! Waiting for a new master to be elected
I1208 12:22:21.226510  2700 slave.cpp:5135] Got exited event for master@10.20.200.300:14081
W1208 12:22:21.226510  2700 slave.cpp:5140] Master disconnected! Waiting for a new master to be elected

Does anyone know the reason for this? I tested connectivity from slave to master and from master to slave, and both tests succeeded.

Test-NetConnection -ComputerName 10.20.200.300 -Port 14081
ComputerName     : 10.20.200.300     
RemoteAddress    : 10.20.200.300     
RemotePort       : 14081     
InterfaceAlias   : Ethernet     
SourceAddress    : 10.21.210.310     
TcpTestSucceeded : True    

[root@mesos-master]# telnet 10.21.210.310 5051
Trying 10.21.210.310...
Connected to 10.21.210.310.
Escape character is '^]'. 

I started the agent with the following parameters:

C:\Mesos\mesos\build\src>C:\Mesos\mesos\build\src\mesos-agent.exe ^
         --master=zk://10.20.200.300:14080/mesos ^
         --work_dir=C:\Mesos\Logs ^
         --launcher_dir=C:\Mesos\mesos\build\src ^
         --ip=10.21.210.310 ^
         --advertise_ip=10.21.210.310 ^
         --advertise_port=5051

Output of /master/state:

{
    "version": "1.3.1",
    "git_sha": "1beaede8c13f0832d4921121da34f924deec8950",
    "git_tag": "1.3.1",
    "build_date": "2017-09-05 18:02:12",
    "build_time": 1504634532,
    "build_user": "centos",
    "start_time": 1513010072.51033,
    "elected_time": 1513010072.67995,
    "id": "90f5702f-f867-41ac-8087-5d20c87ea96f",
    "pid": "master@10.20.200.300:14081",
    "hostname": "MYhost.COM",
    "activated_slaves": 0,
    "deactivated_slaves": 0,
    "unreachable_slaves": 0,
    "leader": "master@10.20.200.300:14081",
    "leader_info": {
        "id": "90f5702f-f867-41ac-8087-5d20c87ea96f",
        "pid": "master@10.20.200.300:14081",
        "port": 14081,
        "hostname": "MYhost.COM"
    },
    "log_dir": "/var/log/mesos",
    "flags": {
        "advertise_ip": "10.20.200.300",
        "advertise_port": "14081",
        "agent_ping_timeout": "15secs",
        "agent_reregister_timeout": "10mins",
        "allocation_interval": "1secs",
        "allocator": "HierarchicalDRF",
        "authenticate_agents": "false",
        "authenticate_frameworks": "false",
        "authenticate_http_frameworks": "false",
        "authenticate_http_readonly": "false",
        "authenticate_http_readwrite": "false",
        "authenticators": "crammd5",
        "authorizers": "local",
        "framework_sorter": "drf",
        "help": "false",
        "hostname": "MYhost.COM",
        "hostname_lookup": "true",
        "http_authenticators": "basic",
        "initialize_driver_logging": "true",
        "log_auto_initialize": "true",
        "log_dir": "/var/log/mesos",
        "logbufsecs": "0",
        "logging_level": "INFO",
        "max_agent_ping_timeouts": "5",
        "max_completed_frameworks": "50",
        "max_completed_tasks_per_framework": "1000",
        "max_unreachable_tasks_per_framework": "1000",
        "port": "14081",
        "quiet": "false",
        "quorum": "1",
        "recovery_agent_removal_limit": "100%",
        "registry": "replicated_log",
        "registry_fetch_timeout": "1mins",
        "registry_gc_interval": "15mins",
        "registry_max_agent_age": "2weeks",
        "registry_max_agent_count": "102400",
        "registry_store_timeout": "20secs",
        "registry_strict": "false",
        "root_submissions": "true",
        "user_sorter": "drf",
        "version": "false",
        "webui_dir": "/usr/share/mesos/webui",
        "work_dir": "/var/lib/mesos",
        "zk": "zk://localhost:14080/mesos",
        "zk_session_timeout": "10secs"
    },
    "slaves": [],
    "recovered_slaves": [],
    "frameworks": [],
    "completed_frameworks": [],
    "orphan_tasks": [],
    "unregistered_frameworks": []
}
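
For reading the fields of this output that matter for agent registration, here is a minimal sketch in Python. `summarize_state` is a hypothetical helper, not part of Mesos; it only assumes the JSON has the same field names as the output above.

```python
import json


def summarize_state(state_json: str) -> dict:
    """Pick out the fields of /master/state that relate to agent registration."""
    state = json.loads(state_json)
    return {
        # PID of the elected leader, e.g. "master@<ip>:<port>"
        "leader": state.get("leader"),
        "hostname": state.get("hostname"),
        "activated_slaves": state.get("activated_slaves", 0),
        # Number of agents currently listed as registered
        "registered_agents": len(state.get("slaves", [])),
        # ZooKeeper URL the master itself was started with
        "zk": state.get("flags", {}).get("zk"),
    }
```

Applied to the output above, this would report a leader, zero activated slaves, and an empty agent list, which matches the symptom: a master is elected but no agent has registered.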

Do we need to test any other connectivity, or is this error caused by something else?

What about the Mesos master logs? - Dino L.
Do you have a master elected? Can you show the output of /master/state? - janisz
I have added the /master/state output - swetad90
@DinoL. The Mesos master log shows only "Failed to shutdown socket with fd 26, address 10.20.200.300:14081: Transport endpoint is not connected" - swetad90

1 Answer


I would try the following:

  1. Set the hostname on the slave (for example, --hostname=10.21.210.310).
  2. Check the firewall on the Windows machine and allow incoming connections to port 5051.
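
To re-check the firewall after changing the rule, a plain TCP connect test from the master host is enough. The following is a minimal sketch, assuming Python 3 is available on that host; `port_open` is a hypothetical helper, not a Mesos tool, and it only shows that the port accepts connections, not that registration will succeed.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage with the addresses from the question:
#   on the master:  port_open("10.21.210.310", 5051)
#   on the agent:   port_open("10.20.200.300", 14081)
```

Run it in both directions, since the master opens its own connection back to the agent's advertised address and port.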