The problem:

I'm trying to build a Docker Swarm cluster on DigitalOcean, consisting of 3 "manager" nodes and however many worker nodes. The number of worker nodes isn't particularly relevant for this question. I'm trying to modularize the Docker Swarm provisioning, so it's not specifically coupled to the digitalocean provider, but can instead receive a list of IP addresses to provision the cluster against.

In order to provision the master nodes, the first node needs to be put into swarm mode, which generates a join token that the other master nodes will use to join the first one. "null_resource"s are being used to execute remote provisioners against the master nodes; however, I cannot figure out how to make sure the first master node finishes its work ("docker swarm init ...") before another "null_resource" provisioner executes against the other master nodes that need to join the first one. They all run in parallel and, predictably, it doesn't work.

Further, I'm trying to figure out how to collect the first node's generated join token and make it available to the other nodes. I've considered doing this with Consul, storing the join token as a key and reading that key on the other nodes, but this isn't ideal, as there are still issues with ensuring the Consul cluster is provisioned and ready (so kind of the same problem).

main.tf

variable "master_count" { default = 3 }

# master nodes
resource "digitalocean_droplet" "master_nodes" {
  count               = "${var.master_count}"
  ... etc, etc
}

module "docker_master" {
  source          = "./docker/master"
  private_ip      = "${digitalocean_droplet.master_nodes.*.ipv4_address_private}"
  public_ip       = "${digitalocean_droplet.master_nodes.*.ipv4_address}"
  instances       = "${var.master_count}"
}

docker/master/main.tf

variable "instances" {}
variable "private_ip" { type = "list" }
variable "public_ip" { type = "list" }


# Act only on the first item in the list of masters...
resource "null_resource" "swarm_master" {
  count = 1

  # Just to ensure this gets run every time
  triggers {
    version = "${timestamp()}"
  }

  connection {
    ...
    host = "${element(var.public_ip, 0)}"
  }

  provisioner "remote-exec" {
    inline = [<<EOF
      ... install docker, then ...

      docker swarm init --advertise-addr ${element(var.private_ip, 0)}

      MANAGER_JOIN_TOKEN=$(docker swarm join-token manager -q)
      # need to do something with the join token, like make it available
      # as an attribute for interpolation in the next "null_resource" block
    EOF
    ]
  }
}


# Act on the other 2 swarm master nodes (*not* the first one)
resource "null_resource" "other_swarm_masters" {
  count = "${var.instances - 1}"

  triggers {
    version = "${timestamp()}"
  }

  # Host key slices the 3-element IP list and excludes the first one
  connection {
    ...
    host = "${element(slice(var.public_ip, 1, length(var.public_ip)), count.index)}"
  }

  provisioner "remote-exec" {
    inline = [<<EOF
      SWARM_MASTER_JOIN_TOKEN=$(consul kv get docker/swarm/manager/join_token)
      docker swarm join --token ??? ${element(var.private_ip, 0)}:2377
    EOF
    ]
  }

  ##### THIS IS THE MEAT OF THE QUESTION ###
  # How do I make this "null_resource" block not run until the other one has
  # completed and generated the swarm token output? depends_on doesn't
  # seem to do it :(
}

From reading through GitHub issues, I get the feeling this isn't an uncommon problem... but it's kicking my ass. Any suggestions appreciated!

So, you could add your token to Consul in null_resource1 and then retrieve it in null_resource2. And you can make null_resource2 depend on null_resource1. – victor m

1 Answer

@victor-m's comment is correct. If each null_resource has a trigger that references a property of the previous one, Terraform treats that reference as a dependency and the resources execute in order.

resource "null_resource" "first" {
  provisioner "local-exec" {
    command = "echo 'first' > newfile"
  }
}

resource "null_resource" "second" {
  triggers = {
    order = null_resource.first.id
  }

  provisioner "local-exec" {
    command = "echo 'second' >> newfile"
  }
}

resource "null_resource" "third" {
  triggers = {
    order = null_resource.second.id
  }

  provisioner "local-exec" {
    command = "echo 'third' >> newfile"
  }
}

$ terraform apply

null_resource.first: Creating...
null_resource.first: Provisioning with 'local-exec'...
null_resource.first (local-exec): Executing: ["/bin/sh" "-c" "echo 'first' > newfile"]
null_resource.first: Creation complete after 0s [id=3107778766090269290]
null_resource.second: Creating...
null_resource.second: Provisioning with 'local-exec'...
null_resource.second (local-exec): Executing: ["/bin/sh" "-c" "echo 'second' >> newfile"]
null_resource.second: Creation complete after 0s [id=3159896803213063900]
null_resource.third: Creating...
null_resource.third: Provisioning with 'local-exec'...
null_resource.third (local-exec): Executing: ["/bin/sh" "-c" "echo 'third' >> newfile"]
null_resource.third: Creation complete after 0s [id=6959717123480445161]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

To confirm, cat the new file; the output is as expected:

$ cat newfile
first
second
third
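
Applied to the module in the question, the same trigger pattern might look roughly like the sketch below. This is not a tested configuration. It assumes Terraform 0.12+ syntax (as in the example above), and the connection settings elided in the question are still left out. The "external" data source used to fetch the join token is a swapped-in alternative to the Consul idea, not something from the question; it needs the hashicorp/external provider, and it assumes the machine running Terraform can SSH to the first master as root and has jq installed.

```hcl
# Sketch only: names mirror the question's docker/master module.
# Connection details (user, keys, etc.) are omitted and must be filled in.

variable "instances" { type = number }
variable "private_ip" { type = list(string) }
variable "public_ip" { type = list(string) }

# First master: initialise the swarm.
resource "null_resource" "swarm_master" {
  connection {
    host = var.public_ip[0]
  }

  provisioner "remote-exec" {
    inline = [
      "docker swarm init --advertise-addr ${var.private_ip[0]}",
    ]
  }
}

# Assumption: SSH access as root from the machine running Terraform, and jq
# installed there. The external data source expects the program to print a
# JSON object of strings on stdout; jq wraps the raw token line into one.
data "external" "manager_join_token" {
  program = [
    "sh", "-c",
    "ssh root@${var.public_ip[0]} docker swarm join-token manager -q | jq -R '{token: .}'",
  ]

  # Read the token only after the swarm has been initialised.
  depends_on = [null_resource.swarm_master]
}

# Remaining masters: join the first one.
resource "null_resource" "other_swarm_masters" {
  count = var.instances - 1

  # Referencing the first resource's id orders this resource after it.
  triggers = {
    first_master = null_resource.swarm_master.id
  }

  connection {
    # Skip the first IP, which belongs to the swarm_master resource.
    host = var.public_ip[count.index + 1]
  }

  provisioner "remote-exec" {
    inline = [
      "docker swarm join --token ${data.external.manager_join_token.result.token} ${var.private_ip[0]}:2377",
    ]
  }
}
```

One thing to keep in mind with this approach: the data source result, including the join token, ends up in the Terraform state, so the state file should be treated as sensitive.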