We've suddenly started receiving the error:
Salt request timed out. The master is not responding. If this error persists after verifying the master is up, worker_threads may need to be increased."
While this seems straight forward enough to fix, bumping the worker-thread count from 5 to 100 did not solve the problem. A command like salt "*" cmd.run 'echo "hello"' was successful while the same command put in a state file and called with salt "*" state.apply hello failed. Although today even the cmd.run call is failing.
We've tried a variety of remedies that we found suggested on this sight and others for similar problems with no success, specifically: clearing the cache, killing all jobs, deleting the /var/cache/salt/minion directory and restarting the minions, rebooting the master processor, deleting /var/cache/salt/master then stopping and restarting the salt-master, making sure that there weren't any stray processes still running when it was stopped.
We aren't running with a firewall and we've verified the ports are open and communicating (as evidenced by a successful cmd.run call)
Salt version on the master:
---> salt-master --versions-report
Salt Version:
Salt: 2016.11.3
Dependency Versions:
cffi: 0.8.6
cherrypy: 3.2.2
dateutil: 1.5
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: 0.21.1
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: 2.14
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.5 (default, Oct 11 2015, 17:47:16)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4
System Versions:
dist: redhat 7.2 Maipo
machine: x86_64
release: 3.10.0-327.el7.x86_64
system: Linux
version: Red Hat Enterprise Linux Server 7.2 Maipo
Salt version on the minion:
==>salt-minion --versions-report
Salt Version:
Salt: 2016.11.3
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 1.5
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: 0.21.1
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.5 (default, Aug 29 2016, 10:12:21)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4
System Versions:
dist: redhat 7.2 Maipo
machine: x86_64
release: 3.10.0-327.36.1.el7.x86_64
system: Linux
version: Red Hat Enterprise Linux Server 7.2 Maipo
I've created a simple state file as an example to show you the output we're seeing. I've intentionally X'd out the IP values and host names in the output.
---> cat hello.sls
say hi:
cmd.run:
- name: echo "hello"
Results when run on the master:
---> salt -l debug "*" state.apply hello
[DEBUG ] Reading configuration from /etc/salt/master
[DEBUG ] Using cached minion ID from /etc/salt/minion_id: xx.xx.xx
[DEBUG ] Missing configuration file: /root/.saltrc
[DEBUG ] Configuration file path: /etc/salt/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG ] Reading configuration from /etc/salt/master
[DEBUG ] Using cached minion ID from /etc/salt/minion_id: xx.xx.xx
[DEBUG ] Missing configuration file: /root/.saltrc
[DEBUG ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'xxx.xxx.xxx_master', 'tcp://xx.xx.xx.xx:4506', 'clear')
[DEBUG ] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
[DEBUG ] LazyLoaded local_cache.get_load
[DEBUG ] Reading minion list from /var/cache/salt/master/jobs/9f/e5c817cfcbf1eedb46f97566072a7a3a102bd7505af34a3d42f889bb189e27/.minions.p
[DEBUG ] get_iter_returns for jid 20170608110512021926 sent to set(['xxx.xxx', 'xxx.xxx']) will timeout at 11:05:17.037291
[DEBUG ] Checking whether jid 20170608110512021926 is still running
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'xxx.xxx.xxx_master', 'tcp://xx.xx.xx.xx:4506', 'clear')
[DEBUG ] Passing on saltutil error. This may be an error in saltclient. 'retcode'
[DEBUG ] Checking whether jid 20170608110512021926 is still running
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'xxx.xxx.xxx_master', 'tcp://xx.xx.xx.xx:4506', 'clear')
[DEBUG ] Passing on saltutil error. This may be an error in saltclient. 'retcode'
[DEBUG ] Checking whether jid 20170608110512021926 is still running
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'xxx.xxx.xxx_master', 'tcp://xx.xx.xx.xx:4506', 'clear')
[DEBUG ] SaltReqTimeoutError, retrying. (1/3)
[DEBUG ] SaltReqTimeoutError, retrying. (2/3)
[DEBUG ] SaltReqTimeoutError, retrying. (3/3)
[DEBUG ] LazyLoaded nested.output
Salt request timed out. The master is not responding. If this error persists after verifying the master is up, worker_threads may need to be increased.
Results when run on the minion:
==>salt-call -l debug state.apply hello
[DEBUG ] Reading configuration from /etc/salt/minion
[DEBUG ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG ] Using cached minion ID from /etc/salt/minion_id: xxx
[DEBUG ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG ] Reading configuration from /etc/salt/minion
[DEBUG ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG ] Connecting to master. Attempt 1 of 1
[DEBUG ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'xxx.xxx', 'tcp://xx.xx.xx.xx:4506')
[DEBUG ] Generated random reconnect delay between '1000ms' and '11000ms' (4260)
[DEBUG ] Setting zmq_reconnect_ivl to '4260ms'
[DEBUG ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'xxx.xxx', 'tcp://xx.xx.xx.xx:4506', 'clear')
[DEBUG ] Decrypting the current master AES key
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_64f68ad064_pub.ipc
[DEBUG ] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_64f68ad064_pull.ipc
[DEBUG ] Initializing new IPCClient for path: /var/run/salt/minion/minion_event_64f68ad064_pull.ipc
[DEBUG ] Sending event: tag = salt/auth/creds; data = {'_stamp': '2017-06-08T15:08:50.733104', 'creds': {'publish_port': 4505, 'aes': 'L/jKbh4rxM99DNrJeL6hCHTIqr4sztwZs8vqM5M9vVzOfaAe+BZrOWTFNQ87Pu7UA0T7YmGNcf0=', 'master_uri': 'tcp://xx.xx.xx.xx:4506'}, 'key': ('/etc/salt/pki/minion', 'xxx.xxx', 'tcp://xx.xx.xx.xx:4506')}
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] Determining pillar cache
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'xxx.xxx', 'tcp://xx.xx.xx.xx:4506', 'aes')
[DEBUG ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'xxx.xxx', 'tcp://xx.xx.xx.xx:4506')
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] LazyLoaded jinja.render
[DEBUG ] LazyLoaded yaml.render
[DEBUG ] LazyLoaded state.apply
[DEBUG ] LazyLoaded saltutil.is_running
[DEBUG ] LazyLoaded grains.get
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'xxx.xxx', 'tcp://xx.xx.xx.xx:4506', 'aes')
[DEBUG ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'xxx.xxx', 'tcp://xx.xx.xx.xx:4506')
[DEBUG ] SaltReqTimeoutError, retrying. (1/3)
[DEBUG ] SaltReqTimeoutError, retrying. (2/3)
[DEBUG ] SaltReqTimeoutError, retrying. (3/3)
[ERROR ] An un-handled exception was caught by salt's global exception handler:
SaltReqTimeoutError: Message timed out
Traceback (most recent call last):
File "/usr/bin/salt-call", line 11, in <module>
salt_call()
File "/usr/lib/python2.7/site-packages/salt/scripts.py", line 379, in salt_call
client.run()
File "/usr/lib/python2.7/site-packages/salt/cli/call.py", line 58, in run
caller.run()
File "/usr/lib/python2.7/site-packages/salt/cli/caller.py", line 134, in run
ret = self.call()
File "/usr/lib/python2.7/site-packages/salt/cli/caller.py", line 197, in call
ret['return'] = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/salt/modules/state.py", line 543, in apply_
return sls(mods, **kwargs)
File "/usr/lib/python2.7/site-packages/salt/modules/state.py", line 1002, in sls
mocked=kwargs.get('mock', False))
File "/usr/lib/python2.7/site-packages/salt/state.py", line 3526, in __init__
BaseHighState.__init__(self, opts)
File "/usr/lib/python2.7/site-packages/salt/state.py", line 2441, in __init__
self.opts = self.__gen_opts(opts)
File "/usr/lib/python2.7/site-packages/salt/state.py", line 2468, in __gen_opts
mopts = self.client.master_opts()
File "/usr/lib/python2.7/site-packages/salt/fileclient.py", line 1338, in master_opts
return self.channel.send(load)
File "/usr/lib/python2.7/site-packages/salt/utils/async.py", line 75, in wrap
ret = self._block_future(ret)
File "/usr/lib/python2.7/site-packages/salt/utils/async.py", line 85, in _block_future
return future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 276, in send
ret = yield self._crypted_transfer(load, tries=tries, timeout=timeout, raw=raw)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 244, in _crypted_transfer
ret = yield _do_transfer()
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 228, in _do_transfer
tries=tries,
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
SaltReqTimeoutError: Message timed out
Traceback (most recent call last):
File "/usr/bin/salt-call", line 11, in <module>
salt_call()
File "/usr/lib/python2.7/site-packages/salt/scripts.py", line 379, in salt_call
client.run()
File "/usr/lib/python2.7/site-packages/salt/cli/call.py", line 58, in run
caller.run()
File "/usr/lib/python2.7/site-packages/salt/cli/caller.py", line 134, in run
ret = self.call()
File "/usr/lib/python2.7/site-packages/salt/cli/caller.py", line 197, in call
ret['return'] = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/salt/modules/state.py", line 543, in apply_
return sls(mods, **kwargs)
File "/usr/lib/python2.7/site-packages/salt/modules/state.py", line 1002, in sls
mocked=kwargs.get('mock', False))
File "/usr/lib/python2.7/site-packages/salt/state.py", line 3526, in __init__
BaseHighState.__init__(self, opts)
File "/usr/lib/python2.7/site-packages/salt/state.py", line 2441, in __init__
self.opts = self.__gen_opts(opts)
File "/usr/lib/python2.7/site-packages/salt/state.py", line 2468, in __gen_opts
mopts = self.client.master_opts()
File "/usr/lib/python2.7/site-packages/salt/fileclient.py", line 1338, in master_opts
return self.channel.send(load)
File "/usr/lib/python2.7/site-packages/salt/utils/async.py", line 75, in wrap
ret = self._block_future(ret)
File "/usr/lib/python2.7/site-packages/salt/utils/async.py", line 85, in _block_future
return future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 276, in send
ret = yield self._crypted_transfer(load, tries=tries, timeout=timeout, raw=raw)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 244, in _crypted_transfer
ret = yield _do_transfer()
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 228, in _do_transfer
tries=tries,
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out