
I have the following code that connects to the logger service haproxies and drains the first logger VM.

Then, in a separate play, it connects to the logger host list, where the first host (the one just drained) gets a service reload.

- name: Haproxy Warmup
  hosts: role_max_logger_lb
  tasks:
    - name: analytics-backend 8300 range
      haproxy: 'state=disabled host=maxlog-rwva1-{{ env }}-1.example.com backend=analytics-backend socket=/var/run/admin.sock'
      become: true
      when: warmup is defined and buildnum is defined
    - name: logger-backend 8200
      haproxy: 'state=disabled host=maxlog-rwva1-prod-1.example.com:8200 backend=logger-backend socket=/var/run/admin.sock'
      become: true
      when: warmup is defined and buildnum is defined

- name: Warmup Deploy
  hosts: "role_max_logger"
  serial: 1
  tasks:
    - shell: pm2 gracefulReload max-logger
      when: warmup is defined and buildnum is defined
    - pause: prompt="First host has been deployed to. Please verify the logs before continuing. Ctrl-c to exit, Enter to continue deployment."
      when: warmup is defined and buildnum is defined

This code is pretty bad, and it breaks down when I try to expand it into a rolling restart for several services behind several HAProxies. I'd need to somehow drain 33% of all the app VMs from the HAProxy backends, then connect to a different host list and do the reboot process on that same 33%. Then I'd have to repeat the drain/reboot cycle for the 34-66% slice, and finally for the last third, of both lists.

- name: 33% at a time drain
  hosts: "role_max_logger_lb"
  serial: "33%"
  tasks:
    - name: analytics-backend 8300 range
      haproxy: 'state=disabled host=maxlog-rwva1-prod-1.example.com backend=analytics-backend socket=/var/run/admin.sock'
      become: true
      when: warmup is defined and buildnum is defined
    - name: logger-backend 8200
      haproxy: 'state=disabled host=maxlog-rwva1-prod-1.example.com:8200 backend=logger-backend socket=/var/run/admin.sock'
      become: true
      when: buildnum is defined and service is defined

- name: 33% at a time deploy
  hosts: "role_max_logger"
  serial: "33%"
  tasks:
    - shell: pm2 gracefulReload {{ service }}
      when: buildnum is defined and service is defined
    - pause: prompt="One third of machines in the pool have been deployed to. Enter to continue"

I could do this much more easily in Chef: just query the Chef server for all nodes registered in a given role and do all my logic in real Ruby. If it matters, the host lists I'm calling here are actually pulled from my Chef server and fed in as JSON.

I don't know the proper Ansible way of doing this without being able to drop into arbitrary scripting to do all the dirty work.

I was thinking maybe I could do something super hacky like this inside a shell command in Ansible under the deploy, which might work if there is a way of pulling the current host being processed out of the host list, like an Ansible equivalent of Chef's node['fqdn']:

ssh maxlog-lb-rwva1-food-1.example.com 'echo "disable server logger-backend/maxlog-rwva1-food-1.example.com:8200" | socat stdio /run/admin.sock'
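In task form that hack would look something like the sketch below, where current_host is just a made-up placeholder for whatever "host currently being processed" variable Ansible might have (that's question 1 below):

- name: drain the current host the hacky way
  # current_host is a placeholder, not a real Ansible variable
  shell: ssh maxlog-lb-rwva1-food-1.example.com 'echo "disable server logger-backend/{{ current_host }}:8200" | socat stdio /run/admin.sock'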

Or maybe there is a way I can wrap my entire thing in a serial 33% play and include sub-plays that do the work. Sort of like this, but again, I don't know how to properly pass a thirded list of my app servers around within the sub-plays (see my guess at drain.yml after this block):

- name: Deployer
  hosts: role_max_logger
  serial: "33%"
  tasks:
    - include: drain.yml
    - include: reboot.yml
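My hunch is that drain.yml wouldn't actually need the thirded list passed in, since included task files run once per host in the current serial batch; something like this, reusing the placeholder from above:

# drain.yml (guesswork): runs once for each host in the current 33% batch
- name: drain the current host on the load balancer
  haproxy: 'state=disabled host={{ current_host }}:8200 backend=logger-backend socket=/var/run/admin.sock'
  become: true
  delegate_to: maxlog-lb-rwva1-food-1.example.com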

Basically, I don't know what I'm doing. I can think of a bunch of ways to try to do this, but they all seem terrible and overly obtuse. If I were to go down these hacky roads I would probably be better off just writing a big shell script or actual Ruby to do this.

The official Ansible documentation I've read has overly simplified examples that don't really map to my situation, particularly this one, where the take-out-of-pool command is just delegated to the control machine rather than run on dedicated load balancer hosts:

- hosts: webservers
  serial: 5
  tasks:
  - name: take out of load balancer pool
    command: /usr/bin/take_out_of_pool {{ inventory_hostname }}
    delegate_to: 127.0.0.1

http://docs.ansible.com/ansible/playbooks_delegation.html

I guess my questions are:

  • Is there an Ansible equivalent of Chef's node['fqdn'], to use the host currently being processed as a variable?
  • Am I just completely off the rails for how I'm trying to do this?

2 Answers

2 votes

Is there an Ansible equivalent of Chef's node['fqdn'], to use the host currently being processed as a variable?

ansible_hostname or ansible_fqdn (both gathered as facts from the machine itself), or inventory_hostname (the name defined in the inventory file), depending on which you want to use.
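For example, a quick play to compare all three on each host (fact gathering must be enabled, which is the default):

- hosts: role_max_logger
  tasks:
    - name: show per-host name variables
      debug:
        msg: "inventory={{ inventory_hostname }} hostname={{ ansible_hostname }} fqdn={{ ansible_fqdn }}"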

1 vote

As you correctly noted, you need to use delegation for this task.

Here is some pseudocode for you to start with:

- name: 33% at a time deploy
  hosts: role_max_logger
  serial: "33%"
  tasks:
    - name: take out of lb
      # placeholder script: runs on each load balancer, passing the current backend host
      shell: take_out_host.sh --name={{ inventory_hostname }}
      delegate_to: "{{ item }}"
      with_items: "{{ groups['role_max_logger_lb'] }}"
    - name: reload backend
      # runs on the backend host itself
      shell: reload_service.sh
    - name: add back to lb
      # placeholder script: mirror of the first task
      shell: add_host.sh --name={{ inventory_hostname }}
      delegate_to: "{{ item }}"
      with_items: "{{ groups['role_max_logger_lb'] }}"

I assume that group role_max_logger defines servers with backend services to be reloaded and group role_max_logger_lb defines servers with load balancers.

This play takes all hosts from role_max_logger and splits them into 33% batches. For each host in the current batch, it executes take_out_host.sh on every load balancer, passing the current backend hostname as a parameter; once all hosts in the batch are disabled on the load balancers, the backend services are reloaded; after that, the hosts are added back to the load balancers just as in the first task. The whole sequence then repeats for the next batch.
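If you'd rather avoid shelling out, the same pattern should work with the haproxy module you already use in the question, roughly like this (backend and socket values copied from your play; if your HAProxy server names carry a port suffix such as :8200, append it to the host value):

- name: 33% at a time deploy
  hosts: role_max_logger
  serial: "33%"
  tasks:
    - name: drain this host on every load balancer
      haproxy: "state=disabled host={{ inventory_hostname }} backend=logger-backend socket=/var/run/admin.sock"
      become: true
      delegate_to: "{{ item }}"
      with_items: "{{ groups['role_max_logger_lb'] }}"
    - name: reload the service on the drained host
      shell: pm2 gracefulReload max-logger
    - name: put this host back into rotation on every load balancer
      haproxy: "state=enabled host={{ inventory_hostname }} backend=logger-backend socket=/var/run/admin.sock"
      become: true
      delegate_to: "{{ item }}"
      with_items: "{{ groups['role_max_logger_lb'] }}"

The haproxy module also accepts wait=yes to block until HAProxy actually reports the server down (or back up), which is worth adding for a real deploy.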