3
votes

Using Ansible v2.9.12

Question: I'd like Ansible to fail/stop the play as soon as a task fails on any host, when multiple hosts execute the task. In that sense, Ansible should abort the task from further execution. The configuration has to work inside a role, so using serial, or splitting the work into different plays, is not possible.

Example:

- hosts:
    - host1
    - host2
    - host3
  any_errors_fatal: true
  tasks:
    - name: always fail
      shell: /bin/false
      throttle: 1

This produces:

===== task | always fail =======
host1: fail
host2: fail
host3: fail

Meaning, the task is still executed on the second and third host. I'd like the whole play to fail/stop once a task fails on any host. If the task fails on the last host, Ansible should abort as well.

Desired outcome:

===== task | always fail =======
host1: fail
host2: not executed/skipped, because host1 failed
host3: not executed/skipped, because host1 failed

As you can see, I've fiddled around with error handling, but to no avail.


Background info: I've been developing an idempotent Ansible role for mysql. It is possible to set up a cluster with multiple hosts. The role also supports adding an arbiter.

The arbiter does not have the mysql application installed, but the host is still required in the play.

Now, imagine three hosts. Host1 is the arbiter; host2 and host3 have mysql installed, set up in a cluster. The applications are set up by the Ansible role. Now Ansible executes the role a second/third/fourth/whatever time and changes a config setting of mysql. Mysql needs a rolling restart. Usually, one writes something along the lines of:

- template:
    src: mysql.j2
    dest: /etc/mysql
  register: mysql_config
  when: mysql.role != 'arbiter'

- service:
    name: mysql
    state: restarted
  throttle: 1
  when:
    - mysql_config.changed
    - mysql.role != 'arbiter'

The downside of this Ansible configuration is that if mysql fails to start on host2 for whatever reason, Ansible will still restart mysql on host3. That is undesired, because if mysql fails on host3 as well, the cluster is lost. So, for this specific task, I'd like Ansible to stop/abort/skip the remaining hosts as soon as mysql has failed to start on a single host in the play.


3 Answers

1
votes

Ok, this works:

# note that test-multi-01 set host_which_is_skipped: true
---
- hosts:
    - test-multi-01
    - test-multi-02
    - test-multi-03
  tasks:
    - set_fact:
        host_which_is_skipped: "{{ inventory_hostname }}"
      when: host_which_is_skipped | default(false)

    - shell: /bin/false
      run_once: yes
      delegate_to: "{{ item }}"
      loop: "{{ ansible_play_hosts }}"
      when:
        - item != host_which_is_skipped
        - result is undefined or result is not failed
      register: result

    - meta: end_play
      when: result is failed

    - debug:
        msg: Will not happen

When the shell command is set to /bin/true, the command is executed on host2 and host3.
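The reason this works: while a loop is still running, the variable named in register holds the result of the previous iteration; only after the task completes does it become a dict with a results list. A minimal, hypothetical illustration of that per-iteration behavior:

```yaml
# Hypothetical demo: on each pass through the loop, `result`
# still holds the previous item's outcome, so the first pass
# falls back to the default.
- debug:
    msg: "previous item: {{ result.item | default('none yet') }}"
  loop: [a, b, c]
  register: result
```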

0
votes

One way to solve this would be to run the playbook with serial: 1. That way, the tasks are executed serially on the hosts and as soon as one task fails, the playbook terminates:

- name: My playbook
  hosts: all
  serial: 1
  any_errors_fatal: true
  tasks:
  - name: Always fail
    shell: /bin/false

In this case, the task is only executed on the first host. Note that there is also the order clause, with which you can control the order in which hosts are run: https://docs.ansible.com/ansible/latest/user_guide/playbooks_intro.html#hosts-and-users
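For instance, a sketch combining serial: 1 with order, so that a predictable "canary" host is attempted first (valid order values are inventory, reverse_inventory, sorted, reverse_sorted and shuffle):

```yaml
- name: My playbook
  hosts: all
  serial: 1
  order: sorted          # try hosts in alphabetically sorted order
  any_errors_fatal: true
  tasks:
    - name: Always fail
      shell: /bin/false
```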

-1
votes

Disclaimer: most of the credit for the fully working part of this answer goes to @Tomasz Klosinski's answer on Server Fault.


First, here is a partially working idea, which only falls short by one host.
For the demo, I purposely increased the number of hosts to 5.

The idea is based on the special variables ansible_play_batch and ansible_play_hosts_all, which are described in the documentation on special variables as:

  • ansible_play_hosts_all
    List of all the hosts that were targeted by the play

  • ansible_play_batch
    List of active hosts in the current play run limited by the serial, aka ‘batch’. Failed/Unreachable hosts are not considered ‘active’.

This idea, coupled with your attempt at using throttle: 1, should work, but falls short by one host: the task still executes on host2 when it should skip it.

Given the playbook:

- hosts: all
  gather_facts: no
      
  tasks:
    - shell: /bin/false
      when: "ansible_play_batch | length == ansible_play_hosts_all | length"
      throttle: 1

This yields the recap:

PLAY [all] ***********************************************************************************************************

TASK [shell] *********************************************************************************************************
fatal: [host1]: FAILED! => {"changed": true, "cmd": "/bin/false", "delta": "0:00:00.003915", "end": "2020-09-06 22:09:16.550406", "msg": "non-zero return code", "rc": 1, "start": "2020-09-06 22:09:16.546491", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [host2]: FAILED! => {"changed": true, "cmd": "/bin/false", "delta": "0:00:00.004736", "end": "2020-09-06 22:09:16.844296", "msg": "non-zero return code", "rc": 1, "start": "2020-09-06 22:09:16.839560", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
skipping: [host3]
skipping: [host4]
skipping: [host5]

PLAY RECAP ***********************************************************************************************************
host1                      : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
host2                      : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
host3                      : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
host4                      : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
host5                      : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0 

Looking further into this, I landed on this answer on Server Fault, which looks like the right idea for crafting your solution.

Instead of going the normal way, the idea is to delegate everything from the first host, with a loop over all targeted hosts of the play; in a loop, you can easily access the registered result of the previous iteration, as long as you register it.

So here is the playbook:

- hosts: all
  gather_facts: no
      
  tasks:
    - shell: /bin/false
      loop: "{{ ansible_play_hosts }}"
      register: failing_task
      when: "failing_task | default({}) is not failed"
      delegate_to: "{{ item }}"
      run_once: true

This would yield the recap:

PLAY [all] ***********************************************************************************************************

TASK [shell] *********************************************************************************************************
failed: [host1 -> host1] (item=host1) => {"ansible_loop_var": "item", "changed": true, "cmd": "/bin/false", "delta": "0:00:00.003706", "end": "2020-09-06 22:18:23.822608", "item": "host1", "msg": "non-zero return code", "rc": 1, "start": "2020-09-06 22:18:23.818902", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
skipping: [host1] => (item=host2) 
skipping: [host1] => (item=host3) 
skipping: [host1] => (item=host4) 
skipping: [host1] => (item=host5) 

NO MORE HOSTS LEFT ***************************************************************************************************

PLAY RECAP ***********************************************************************************************************
host1                      : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

And just for the sake of proving it works as intended, here is the same playbook altered to make host2 fail specifically, with the help of failed_when:

- hosts: all
  gather_facts: no
      
  tasks:
    - shell: /bin/false
      loop: "{{ ansible_play_hosts }}"
      register: failing_task
      when: "failing_task | default({}) is not failed"
      delegate_to: "{{ item }}"
      run_once: true
      failed_when: "item == 'host2'"

Yields the recap:

PLAY [all] ***********************************************************************************************************

TASK [shell] *********************************************************************************************************
changed: [host1 -> host1] => (item=host1)
failed: [host1 -> host2] (item=host2) => {"ansible_loop_var": "item", "changed": true, "cmd": "/bin/false", "delta": "0:00:00.004226", "end": "2020-09-06 22:20:38.038546", "failed_when_result": true, "item": "host2", "msg": "non-zero return code", "rc": 1, "start": "2020-09-06 22:20:38.034320", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
skipping: [host1] => (item=host3) 
skipping: [host1] => (item=host4) 
skipping: [host1] => (item=host5) 

NO MORE HOSTS LEFT ***************************************************************************************************

PLAY RECAP ***********************************************************************************************************
host1                      : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
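
Transposed to the mysql rolling-restart case from the question, the same delegated-loop pattern could look roughly like this (a sketch only; it assumes mysql_config was registered by the template task and that mysql.role comes from host vars, and it uses hostvars lookups because the task runs on one host only):

```yaml
# Sketch: restart mysql node by node, delegated from a single host,
# skipping all remaining nodes as soon as one restart fails.
- service:
    name: mysql
    state: restarted
  loop: "{{ ansible_play_hosts }}"
  delegate_to: "{{ item }}"
  run_once: true
  register: restart_result
  when:
    - restart_result | default({}) is not failed
    - hostvars[item].mysql.role != 'arbiter'
    - hostvars[item].mysql_config is changed
```

This way, the restart of the next node is skipped as soon as a restart fails anywhere, so the remaining cluster members stay up.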