
I'm on a Mac machine.

$ which ansible
/Library/Frameworks/Python.framework/Versions/3.5/bin/ansible

On other systems, ansible may live at a generic location such as /usr/bin/ansible (for example, on CentOS/Ubuntu).

$ ansible --version
ansible 2.2.0.0

Running the following playbook works fine from my other vagrant / Ubuntu box.

The playbook file looks like:

- hosts: all
  become: true
  gather_facts: true

  roles:
    - a_role_which_just_say_hello_world_debug_msg

From my local machine, I can SSH to the target servers, including the one that fails below, without a password (I have already added the .pem key file using ssh-add). Yet that server fails in the [setup] (gather facts) step of the Ansible playbook run.

On the Mac, I get this error sometimes, not every time: Failed to connect to the host via ssh: Connection timed out during banner exchange.

$ ansible-playbook -i inventory -l tag_cluster_mycluster myplabook.yml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [myclusterSomeServer01_i_07f318688f6339971]
fatal: [myclusterSomeServer02_i_03df6f1f988e665d9]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Connection timed out during banner exchange\r\n", "unreachable": true}

I tried a couple of times and saw the same behavior: out of the 15 servers in the mycluster cluster, the [setup] (gather facts) step fails for some server in one run and works fine in the next.

Retried: $ ansible-playbook -i inventory -l tag_cluster_mycluster myplabook.yml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [myclusterSomeServer01_i_07f318688f6339971]
ok: [myclusterSomeServer02_i_03df6f1f988e665d9]
ok: [myclusterSomeServer03_i_057dfr56u88e665d9]
...
.....more...this time it worked for all servers.

As you can see, this time the step worked for every server. The same issue (SSH connection timed out) also happens during some tasks, for example where I install something with the Ansible yum module. If I retry, it works for the server that failed last time, but it may then fail for another server that succeeded last time. So the behavior is random.

My /etc/ansible/ansible.cfg file has:

[ssh_connection]
scp_if_ssh = True

1 Answer


Adding the following timeout setting to the /etc/ansible/ansible.cfg config file worked once I increased it to 25. With 10 or 15, I still saw the "Connection timed out during banner exchange" error on some servers.

[defaults]
timeout = 25

[ssh_connection]
scp_if_ssh = True

In addition, I had to use serial: N or serial: "N%" (where N is a number) so that the playbook runs on N servers, or on N percent of the servers, at a time. After that it worked fine.

For example:

- hosts: all
  become: true
  gather_facts: true
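  serial: 2                         # run on 2 hosts at a time
  #serial: "10%"                    # or on 10% of the hosts at a time
  #serial: "{{ serialNumber }}"     # or take the batch size from a variable
  #serial: "{{ serialNumber }}%"    # or use the variable as a percentage

  vars:
    serialNumber: 5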

  roles:
    - a_role_which_just_say_hello_world_debug_msg
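
A possible variation on the serial setting above, offered only as a sketch: instead of one fixed batch size, serial can be given as a list so that the batch size ramps up. This assumes Ansible 2.2 or later, where the list form is supported. The batch sizes here are illustrative and were not part of my tested setup; the role name is the placeholder from the playbook above.

```yaml
# Sketch only: ramp the batch size instead of using one fixed value.
# Assumes Ansible 2.2+ (list form of serial). The sizes are illustrative.
- hosts: all
  become: true
  gather_facts: true
  serial:
    - 1        # first batch: a single host
    - 2        # second batch: two hosts
    - "20%"    # each remaining batch: 20% of the hosts in the play

  roles:
    - a_role_which_just_say_hello_world_debug_msg
```

With the 15 servers in this cluster, "20%" works out to 3 hosts per batch. The last entry in the list is reused for all remaining batches, so the number of simultaneous SSH connections stays small throughout the run.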