
I have a bunch of servers that will need frequent patching. I am planning on using Ansible to coordinate the patching process. The key point here is that it must be "all or nothing" patching: either all servers are patched or none are.

The tasks I was considering for my playbook would be something like:

1. Go to all servers and take an LVM snapshot.
2. Only if task 1 works on all servers, apply the changes.
3. If one of the hosts fails for any reason, roll back the snapshot on ALL NODES.
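Steps 1 and 2 could be sketched roughly like this. It is only an illustration under assumptions: the volume group `vg0`, logical volume `root`, snapshot name `root_prepatch` and the 5g snapshot size are hypothetical placeholders, and the snapshot is taken with the `community.general.lvol` module. Setting `any_errors_fatal: true` on the snapshot block makes a snapshot failure on any one host stop the play on every host, so the changes are only applied when step 1 succeeded everywhere.

```yaml
---
- hosts: all
  strategy: linear
  vars:
    snap_vg: vg0                # hypothetical volume group
    snap_lv: root               # hypothetical logical volume to protect
    snap_name: root_prepatch    # hypothetical snapshot name

  tasks:
    # Step 1: snapshot every server; a failure on any host ends the play on all hosts
    - block:
        - name: Take an LVM snapshot before patching
          community.general.lvol:
            vg: "{{ snap_vg }}"
            lv: "{{ snap_lv }}"
            snapshot: "{{ snap_name }}"
            size: 5g            # assumed snapshot size
      any_errors_fatal: true

    # Step 2: only reached when step 1 succeeded on all hosts
    - name: Apply the changes (placeholder for the real patch command)
      ansible.builtin.command: /home/amirsamary/activity.sh
      register: cmd_result
      ignore_errors: true       # keep going so that step 3 can roll back every host
```

Step 2 deliberately ignores errors instead of relying on `any_errors_fatal`, because aborting the play there would also skip the rollback in step 3.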

The problem is that I am new to Ansible and I can't express this in a playbook. I have written this simple test playbook:

---
- hosts: all
  strategy: linear

  tasks:
  - block:
      - debug: msg='Testing on {{ inventory_hostname }}...'
      - command: /home/amirsamary/activity.sh
        changed_when: false
    rescue:
      - debug: msg='Rollback of {{ inventory_hostname }}...'
  - debug: msg='I continued running tasks on {{ inventory_hostname }}...'

I have two hosts on my inventory. On the first node, activity.sh returns true and on the second node, activity.sh returns false. So, node2 will always fail. The problem is that the rescue tasks will only run for the failed host and not for all of them (as one would expect anyway) and the playbook keeps running the other tasks.

I have heard a lot about how good Ansible is at orchestrating complex tasks on thousands of servers, but I can't seem to find a way to safely implement an "all or nothing" strategy with it. What am I missing?


1 Answer


There are probably many ways to implement this; here is one of them:

---
- hosts: all
  strategy: linear

  tasks:
    - debug: msg='Testing on {{ inventory_hostname }}...'
    - command: /home/amirsamary/activity.sh
      register: cmd_result
      ignore_errors: true
    - debug: msg='Rollback of {{ inventory_hostname }}...'
      when: ansible_play_batch | map('extract', hostvars, 'cmd_result') | select('failed') | list | count > 0

What's done here?

  • we register the result of the script execution into cmd_result and ignore errors, if any
  • with the linear strategy, the command task completes on all hosts before the next task is executed
  • so we have cmd_result registered for every host
  • to check whether we need to roll back, we extract cmd_result for all hosts in the current play batch, keep only the failed ones, convert them to a list and count them: if there is any, roll back. The failed test is used rather than checking whether failed is defined, because newer Ansible versions also include failed: false in successful results.

So the rollback task is executed on all hosts if cmd_result failed on any of them.
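To turn the placeholder rollback debug task into a real LVM rollback, tasks along these lines could follow the patch task. Treat it as a sketch: `snap_vg` and `snap_name` are hypothetical variables naming the snapshot taken before patching, and the rollback is assumed to be done with `lvconvert --merge`. When the origin volume is in use (for example a mounted root filesystem), LVM defers the merge until the volume is next activated, which is why the sketch reboots afterwards. Computing the decision once with `set_fact` keeps the long expression in a single place.

```yaml
    # Runs after the patch task; with the linear strategy every host has cmd_result by now
    - name: Decide once, from every host's result, whether to roll back
      ansible.builtin.set_fact:
        rollback_needed: "{{ ansible_play_batch | map('extract', hostvars, 'cmd_result') | select('failed') | list | length > 0 }}"

    # snap_vg / snap_name are hypothetical variables for the pre-patch snapshot
    - name: Merge the pre-patch snapshot back into its origin on ALL hosts
      ansible.builtin.command: "lvconvert --merge {{ snap_vg }}/{{ snap_name }}"
      when: rollback_needed | bool

    - name: Reboot so that a deferred merge of an in-use volume completes
      ansible.builtin.reboot:
      when: rollback_needed | bool
```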

You may want to add this task after the rollback task:

- fail: msg='Patch command failed!'
  when: cmd_result is failed

This way the rollback tasks run on every host, and the hosts whose patch command failed are still marked as failed.
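For the success path, the snapshot is no longer needed once every host has patched cleanly. A possible task, again assuming the hypothetical `snap_vg`/`snap_name` variables and the `community.general.lvol` module, is shown below. It should go before the fail task above: once hosts are marked as failed they drop out of `ansible_play_batch`, and the remaining hosts would then see no failures.

```yaml
    # Place before the fail task so that failed hosts are still counted
    - name: Remove the pre-patch snapshot when no host failed
      community.general.lvol:
        vg: "{{ snap_vg }}"
        lv: "{{ snap_name }}"
        state: absent
        force: true             # lvol requires force to remove a volume
      when: ansible_play_batch | map('extract', hostvars, 'cmd_result') | select('failed') | list | length == 0
```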