My current setup is as follows:
- Mesos Master: 10.20.200.300:14081 (RHEL 7)
- ZooKeeper: 10.20.200.300:14080 (RHEL 7)
- Mesos Agent: 10.21.210.310:5051 (Windows 2016)
The master is up and connects to ZooKeeper. However, when I start the agent, it connects to ZooKeeper but never registers with the master.
The master was started as a systemd service with the following parameters under /etc/mesos-master:
hostname - mymaster.mesos.com
quorum - 1
work_dir - /var/lib/mesos
advertise_ip - 10.20.200.300
advertise_port - 14081
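For reference, here is a minimal sketch of how these files turn into command-line flags. It assumes the Mesosphere packaging convention, where each file name under /etc/mesos-master is a flag name and the file's contents are its value. In that convention the ZooKeeper URL is normally read from /etc/mesos/zk rather than from this directory. The helper names `read_conf_dir` and `to_flags` are hypothetical, for illustration only.

```python
from pathlib import Path


def read_conf_dir(conf_dir: str) -> dict[str, str]:
    """Read one-file-per-flag settings: file name = flag name, contents = value."""
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(Path(conf_dir).iterdir())
        if entry.is_file()
    }


def to_flags(settings: dict[str, str]) -> list[str]:
    """Render settings as --flag=value arguments, sorted by flag name."""
    return [f"--{name}={value}" for name, value in sorted(settings.items())]


if __name__ == "__main__":
    # The values listed above; on the master itself you could instead call
    # read_conf_dir("/etc/mesos-master").
    settings = {
        "hostname": "mymaster.mesos.com",
        "quorum": "1",
        "work_dir": "/var/lib/mesos",
        "advertise_ip": "10.20.200.300",
        "advertise_port": "14081",
    }
    print(" ".join(to_flags(settings)))
```

This prints `--advertise_ip=10.20.200.300 --advertise_port=14081 --hostname=mymaster.mesos.com --quorum=1 --work_dir=/var/lib/mesos`, which can be compared against the `flags` section of /master/state.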
Below are the logs from the master, the agent (slave), and ZooKeeper.
Master logs (running on 10.20.200.300:14081):
E1208 12:22:21.269227 4302 process.cpp:2455] Failed to shutdown socket with fd 26, address 10.20.200.300:14081: Transport endpoint is not connected
ZooKeeper logs (running on 10.20.200.300:14080):
2017-12-08 12:22:21,185 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:14080:ZooKeeperServer@942] - Client attempting to establish new session at /10.21.210.310:63039
2017-12-08 12:22:21,196 [myid:] - INFO [SyncThread:0:ZooKeeperServer@687] - Established session 0x160372c2b770010 with negotiated timeout 10000 for client /10.21.210.310:63039
Agent (slave) logs (running on 10.21.210.310:5051):
I1208 12:22:21.179652 4224 slave.cpp:1007] New master detected at master@10.20.200.300:14081
I1208 12:22:21.195278 4224 slave.cpp:1031] No credentials provided. Attempting to register without authentication
I1208 12:22:21.195278 4224 slave.cpp:1042] Detecting new master
I1208 12:22:21.210924 6156 slave.cpp:5135] Got exited event for master@10.20.200.300:14081
W1208 12:22:21.210924 6156 slave.cpp:5140] Master disconnected! Waiting for a new master to be elected
I1208 12:22:21.226510 2700 slave.cpp:5135] Got exited event for master@10.20.200.300:14081
W1208 12:22:21.226510 2700 slave.cpp:5140] Master disconnected! Waiting for a new master to be elected
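Both the agent log and the master's state identify the master by its libprocess PID, which has the form `<id>@<ip>:<port>`. The address in that PID is the one the agent actually tries to reach. Below is a small sketch for pulling the PID out of a log line so it can be fed to a reachability check. The helper `parse_upid` and its regex are assumptions for illustration, not part of Mesos.

```python
import re
from typing import Optional

# A libprocess PID looks like "<id>@<host>:<port>", e.g. "master@10.20.200.300:14081"
# or "slave(1)@10.21.210.310:5051".
UPID_RE = re.compile(r"(?P<id>[\w()]+)@(?P<host>[^\s:]+):(?P<port>\d+)")


def parse_upid(line: str) -> Optional[tuple[str, str, int]]:
    """Return (id, host, port) for the first libprocess PID in a log line, or None."""
    m = UPID_RE.search(line)
    if m is None:
        return None
    return m.group("id"), m.group("host"), int(m.group("port"))
```

For example, applied to the "New master detected at ..." line, it should yield the master's advertised host and port.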
Does anyone know what causes this? I have tested connectivity in both directions (agent -> master and master -> agent), and both tests succeeded:
Test-NetConnection -ComputerName 10.20.200.300 -Port 14081
ComputerName : 10.20.200.300
RemoteAddress : 10.20.200.300
RemotePort : 14081
InterfaceAlias : Ethernet
SourceAddress : 10.21.210.310
TcpTestSucceeded : True
[root@mesos-master]# telnet 10.21.210.310 5051
Trying 10.21.210.310...
Connected to 10.21.210.310.
Escape character is '^]'.
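The two checks above can also be scripted the same way on both machines. Below is a minimal sketch using only the Python standard library (`tcp_reachable` is a hypothetical helper). Like telnet and Test-NetConnection, it only proves that a TCP handshake to a given address succeeds. Master/agent communication needs connectivity in both directions, so the master must be able to reach the exact ip:port the agent advertises, and the agent must be able to reach the master's advertised ip:port.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves host and performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, run `tcp_reachable("10.20.200.300", 14081)` on the agent and `tcp_reachable("10.21.210.310", 5051)` on the master.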
I started the agent with the following parameters:
C:\Mesos\mesos\build\src>C:\Mesos\mesos\build\src\mesos-agent.exe ^
--master=zk://10.20.200.300:14080/mesos ^
--work_dir=C:\Mesos\Logs ^
--launcher_dir=C:\Mesos\mesos\build\src ^
--ip=10.21.210.310 ^
--advertise_ip=10.21.210.310 ^
--advertise_port=5051
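The agent and the master reach ZooKeeper through different URLs: `zk://10.20.200.300:14080/mesos` on the agent and `zk://localhost:14080/mesos` in the master's flags. That is fine as long as both point at the same ensemble and the same znode path, because the agent discovers the leading master by reading that path. Below is a hedged sketch for comparing the two. `parse_zk_url` is a hypothetical helper and does not handle credentials embedded in the URL.

```python
from urllib.parse import urlsplit


def parse_zk_url(url: str) -> tuple[list[str], str]:
    """Split a zk://host1:port1,host2:port2/path URL into (hosts, path)."""
    parts = urlsplit(url)
    if parts.scheme != "zk":
        raise ValueError(f"not a zk:// URL: {url!r}")
    # The network location may list several comma-separated host:port pairs.
    hosts = parts.netloc.split(",")
    return hosts, parts.path


if __name__ == "__main__":
    agent_hosts, agent_path = parse_zk_url("zk://10.20.200.300:14080/mesos")
    master_hosts, master_path = parse_zk_url("zk://localhost:14080/mesos")
    # Same znode path means both sides look for the leader in the same place.
    print(agent_path == master_path, agent_hosts, master_hosts)
```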
Output of the master's /master/state endpoint:
{
"version": "1.3.1",
"git_sha": "1beaede8c13f0832d4921121da34f924deec8950",
"git_tag": "1.3.1",
"build_date": "2017-09-05 18:02:12",
"build_time": 1504634532,
"build_user": "centos",
"start_time": 1513010072.51033,
"elected_time": 1513010072.67995,
"id": "90f5702f-f867-41ac-8087-5d20c87ea96f",
"pid": "master@10.20.200.300:14081",
"hostname": "MYhost.COM",
"activated_slaves": 0,
"deactivated_slaves": 0,
"unreachable_slaves": 0,
"leader": "master@10.20.200.300:14081",
"leader_info": {
"id": "90f5702f-f867-41ac-8087-5d20c87ea96f",
"pid": "master@10.20.200.300:14081",
"port": 14081,
"hostname": "MYhost.COM"
},
"log_dir": "/var/log/mesos",
"flags": {
"advertise_ip": "10.20.200.300",
"advertise_port": "14081",
"agent_ping_timeout": "15secs",
"agent_reregister_timeout": "10mins",
"allocation_interval": "1secs",
"allocator": "HierarchicalDRF",
"authenticate_agents": "false",
"authenticate_frameworks": "false",
"authenticate_http_frameworks": "false",
"authenticate_http_readonly": "false",
"authenticate_http_readwrite": "false",
"authenticators": "crammd5",
"authorizers": "local",
"framework_sorter": "drf",
"help": "false",
"hostname": "MYhost.COM",
"hostname_lookup": "true",
"http_authenticators": "basic",
"initialize_driver_logging": "true",
"log_auto_initialize": "true",
"log_dir": "/var/log/mesos",
"logbufsecs": "0",
"logging_level": "INFO",
"max_agent_ping_timeouts": "5",
"max_completed_frameworks": "50",
"max_completed_tasks_per_framework": "1000",
"max_unreachable_tasks_per_framework": "1000",
"port": "14081",
"quiet": "false",
"quorum": "1",
"recovery_agent_removal_limit": "100%",
"registry": "replicated_log",
"registry_fetch_timeout": "1mins",
"registry_gc_interval": "15mins",
"registry_max_agent_age": "2weeks",
"registry_max_agent_count": "102400",
"registry_store_timeout": "20secs",
"registry_strict": "false",
"root_submissions": "true",
"user_sorter": "drf",
"version": "false",
"webui_dir": "/usr/share/mesos/webui",
"work_dir": "/var/lib/mesos",
"zk": "zk://localhost:14080/mesos",
"zk_session_timeout": "10secs"
},
"slaves": [],
"recovered_slaves": [],
"frameworks": [],
"completed_frameworks": [],
"orphan_tasks": [],
"unregistered_frameworks": []
}
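The key signal in this output is `"activated_slaves": 0` together with an empty `slaves` list, which means the master currently has no registered agents. Below is a minimal sketch for pulling the registration-related fields out of /master/state so they can be compared before and after starting the agent. `fetch_state` and `summarize_state` are hypothetical helpers, and the sketch assumes the master's HTTP endpoint is reachable without authentication, which matches the `authenticate_http_readonly: false` flag above.

```python
import json
from urllib.request import urlopen


def fetch_state(master: str, timeout: float = 5.0) -> str:
    """GET http://<master>/master/state and return the raw JSON text."""
    with urlopen(f"http://{master}/master/state", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def summarize_state(state_json: str) -> dict:
    """Extract the fields that matter for agent registration."""
    state = json.loads(state_json)
    flags = state.get("flags", {})
    return {
        "pid": state.get("pid"),
        # This master is the leader if its own PID matches the elected leader's.
        "is_leader": state.get("pid") == state.get("leader"),
        "activated_slaves": state.get("activated_slaves", 0),
        "registered_agents": len(state.get("slaves", [])),
        "advertised": f"{flags.get('advertise_ip')}:{flags.get('advertise_port')}",
        "zk": flags.get("zk"),
    }
```

Usage would be along the lines of `summarize_state(fetch_state("10.20.200.300:14081"))`, run from a host that can reach the master.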
Is there any other connectivity I should test, or is this error caused by something else?
/master/state? - janisz