A play with multiple hosts uses block/always: how can I make the playbook exit once the play fails on either host?

Question

I have two plays (1 and 2) in my playbook. The first play, play1, has two tasks (A and B). If task A fails, I need task B to be executed anyway and then the playbook to exit. In other words, play2 should be skipped. So I used the block/always approach. It works fine with a single host, but when I specify multiple hosts for the plays, play2 still gets executed. Although play2 only runs against one host, I expect the playbook to exit before play2.

I tried adding any_errors_fatal to task A, but it doesn't work.

# single host playbook
- name: Test Block 1
  hosts: pltB
  gather_facts: no
  tasks:
    - block:
        - command: "/usr/bin/hostname1"
          register: hostname_res
          any_errors_fatal: true
      always:
        - debug: msg="from always block 1"

- name: Test Block 2
  hosts: pltB
  gather_facts: no
  tasks:
    - block:
        - debug: msg="result is {{ hostname_res.stdout }} "
      always:
        - debug: msg="from always block 2"
...

Output with a single host

ansible-playbook test.yml -i ../inventory/serverhosts

PLAY [Test Block 1] **************************************************************************************

TASK [command] ***************************************************************************************
fatal: [192.168.111.25]: FAILED! => {"changed": false, "cmd": "/usr/bin/hostname1", "msg": "[Errno 2] No such file or directory", "rc": 2}

TASK [debug] *************************************************************************************
ok: [192.168.111.25] => {
    "msg": "from always block 1"
}
to retry, use: --limit @/home/playbooks/test.retry

PLAY RECAP *************************************************************************************
192.168.111.25 : ok=1 changed=0 unreachable=0 failed=1

Playbook with multiple servers in hosts

- name: Test Block 1
  hosts: pltB,pltA
  gather_facts: no
  tasks:
    - block:
        - command: "/usr/bin/hostname1"
          register: hostname_res
          any_errors_fatal: true
      always:
        - debug: msg="from always block 1"

- name: Test Block 2
  hosts: pltB,pltA
  gather_facts: no
  tasks:
    - block:
        - debug: msg="result is {{ hostname_res.stdout }} "
      always:
        - debug: msg="from always block 2"
...

Output with multiple servers

PLAY [Test Block 1] ***********************************************************************************

TASK [command] ***************************************************************************************
fatal: [192.168.111.25]: FAILED! => {"changed": false, "cmd": "/usr/bin/hostname1", "msg": "[Errno 2] No such file or directory", "rc": 2}
changed: [192.168.111.24]

TASK [debug] ***************************************************************************************
ok: [192.168.111.25] => {
    "msg": "from always block 1"
}
ok: [192.168.111.24] => {
    "msg": "from always block 1"
}

PLAY [Test Block 2] *********************************************************************************

TASK [debug] ***************************************************************************************
ok: [192.168.111.24] => {
    "msg": "result is plt001 "
}

TASK [debug] **************************************************************************************
ok: [192.168.111.24] => {
    "msg": "from always block 2"
}
to retry, use: --limit @/home/playbooks/test.retry

PLAY RECAP *******************************************************************************************************
192.168.111.24 : ok=4 changed=1 unreachable=0 failed=0
192.168.111.25 : ok=1 changed=0 unreachable=0 failed=1

Andrew Andrew · Accepted Answer · 2019-09-12T00:05:14

I think this behaviour is a bug in Ansible, but I'll start with a workaround that gives the behaviour I think you're after.

Workaround

Before the block, register a fact holding the number of hosts in ansible_play_batch. Then, after the block/always, add a task that fails when the number of active hosts has decreased.

From the Ansible Special Variables documentation:

ansible_play_batch List of active hosts in the current play run limited by the serial, aka ‘batch’. Failed/Unreachable hosts are not considered ‘active’.

https://docs.ansible.com/ansible/latest/reference_appendices/special_variables.html
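To see ansible_play_batch shrink after a failure, a minimal sketch along these lines could be run against the two hosts from the question. It assumes pltB and pltA resolve to two hosts in your inventory; the forced failure on the first host of the batch is purely hypothetical and only there to simulate task A failing on one host.

```yaml
---
- hosts: pltB,pltA
  gather_facts: no
  tasks:
    # Both hosts are still active here, so the list should contain both.
    - name: show the active hosts before any failure
      debug:
        var: ansible_play_batch
      run_once: true

    # Hypothetical failure: fail only on the first host of the current batch.
    - name: simulate a failure on a single host
      fail:
        msg: "simulated failure"
      when: inventory_hostname == ansible_play_batch[0]

    # The failed host is no longer 'active', so it should drop out of the list.
    - name: show the active hosts after the failure
      debug:
        var: ansible_play_batch
      run_once: true
```

Comparing the two debug outputs should show the list losing one entry, which is exactly the change the workaround below detects.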

---
- hosts: all
  gather_facts: no
  tasks:
    # Record how many hosts are active before the block runs.
    # set_fact runs on the controller, and run_once applies the
    # resulting fact to all active hosts in the batch.
    - name: set fact with play batch size
      set_fact:
        play_batch_size: "{{ ansible_play_batch|length|int }}"
      run_once: yes

    - block:
      - name: failing task on some hosts
        command: "cat test.txt"
        any_errors_fatal: true
      always:
      - name: message from always section
        debug:
          msg: "Debug task success"

    # A host that failed inside the block is no longer in ansible_play_batch.
    # play_batch_size may have been templated to a string, so cast it to int
    # before comparing.
    - name: fail if the batch size has decreased
      fail:
        msg: "Halting playbook execution due to prior error, when {{ ansible_play_batch|length|int }} < {{ play_batch_size }}"
      when: ansible_play_batch|length|int < play_batch_size|int
      run_once: true

    - name: this should never run
      debug:
        msg: "Epic fail"

This playbook works as expected, except that the console PLAY RECAP does not correctly identify the failed hosts.
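Applied to the two-play layout from the question, the same check could sit at the end of Test Block 1. This is a sketch, not tested against your inventory: it reuses the pltB/pltA hosts and task A/B from the question, and the task names and fail message are made up for illustration. My understanding is that a failing run_once task marks every host in the batch as failed, so Test Block 2 should find no hosts left to run on.

```yaml
---
- name: Test Block 1
  hosts: pltB,pltA
  gather_facts: no
  tasks:
    # Record how many hosts are active before task A runs.
    - name: set fact with play batch size
      set_fact:
        play_batch_size: "{{ ansible_play_batch | length }}"
      run_once: true

    - block:
        - name: task A
          command: "/usr/bin/hostname1"
          register: hostname_res
      always:
        - name: task B
          debug:
            msg: "from always block 1"

    # If task A failed on any host, that host has left the batch.
    # Failing here with run_once is intended to fail the whole batch,
    # so that Test Block 2 has no hosts left.
    - name: fail if the batch size has decreased
      fail:
        msg: "Task A failed on at least one host, skipping Test Block 2"
      when: ansible_play_batch | length < play_batch_size | int
      run_once: true

- name: Test Block 2
  hosts: pltB,pltA
  gather_facts: no
  tasks:
    - debug:
        msg: "result is {{ hostname_res.stdout }}"
```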

Problem explanation

The behaviour you're seeing, which I can reproduce with Ansible v2.8.1, is that the always section completing successfully erases the error status, so the any_errors_fatal condition is never triggered.

Erasing the error status is documented behaviour when a rescue section is used. From the current Ansible documentation on Blocks:

The always section runs no matter what previous error did or did not occur in the block and rescue sections. It should be noted that the play continues if a rescue section completes successfully as it ‘erases’ the error status (but not the reporting), this means it won’t trigger max_fail_percentage nor any_errors_fatal configurations but will appear in the playbook statistics.

https://docs.ansible.com/ansible/latest/user_guide/playbooks_blocks.html#id4
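For comparison, the documented rescue case can be sketched with the failing command from the question. This assumes the same pltB/pltA hosts; the debug messages are illustrative only. According to the documentation quoted above, the successful rescue should stop any_errors_fatal from firing, so the final task should still run on both hosts.

```yaml
---
- hosts: pltB,pltA
  gather_facts: no
  any_errors_fatal: true
  tasks:
    - block:
        # Task A from the question: fails wherever /usr/bin/hostname1 is missing.
        - command: "/usr/bin/hostname1"
      rescue:
        # A rescue section that completes successfully 'erases' the error
        # status, so any_errors_fatal is not triggered.
        - debug:
            msg: "from rescue block"

    # Expected to run on every host, including the one whose command failed.
    - debug:
        msg: "play continues after rescue"
```

The surprise in your case is that an always section, with no rescue at all, appears to have the same effect.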

However, given the behaviour we've both observed, either the documentation should read "...the play continues if a rescue or always section completes successfully", or, more likely, Ansible's behaviour is incorrect, or at least inconsistent with the idea of block/rescue/always working like Java's try/catch/finally. Since the always section is the equivalent of Java's finally, I wouldn't expect anything in it to stop the failure from halting the playbook when any_errors_fatal is enabled. Enabling any_errors_fatal at the playbook level and in the Ansible defaults made no difference to the observed behaviour.