10
votes

Use Case

I'm trying to provision a (Docker Swarm or Consul) cluster where the cluster is first initialized on one node, which generates a token that the other nodes then need in order to join. The key thing is that nodes 1 and 2 shouldn't attempt to join the cluster until the join key has been generated by node 0.

E.g. on node 0, running docker swarm init ... will return a join token. Then on nodes 1 and 2, you'd pass that token to the join command, like docker swarm join --token ${JOIN_TOKEN} ${NODE_0_IP_ADDRESS}:${SOME_PORT}. And magic, you've got a neat little cluster...

Attempts So Far

  • Tried initializing all nodes with the AWS SDK installed, storing the join key from node 0 in S3, then fetching that join key on the other nodes. This is done via a null_resource with 'remote-exec' provisioners (roughly the pattern sketched after this list). Due to the way Terraform executes things in parallel, there are race conditions, and nodes 1 and 2 predictably attempt to fetch a key from S3 that isn't there yet (e.g. node 0 hasn't finished its work yet).

  • Tried using the 'local-exec' provisioner to SSH into node 0 and capture its join key output. This hasn't worked well or I sucked at doing it.
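For reference, that first attempt looks roughly like the sketch below; the resource names, bucket, SSH user and key variable are placeholders rather than a working configuration:

# Manager: initialise the swarm and push the worker join token to S3.
resource "null_resource" "swarm_init" {
  connection {
    host        = "${aws_instance.manager.public_ip}"
    user        = "ubuntu"
    private_key = "${file(var.ssh_key_path)}"
  }

  provisioner "remote-exec" {
    inline = [
      "docker swarm init --advertise-addr ${aws_instance.manager.private_ip}",
      "docker swarm join-token -q worker | aws s3 cp - s3://my-swarm-bucket/join-token"
    ]
  }
}

# Workers: pull the token back down and join.
resource "null_resource" "swarm_join" {
  count = 2

  connection {
    host        = "${element(aws_instance.worker.*.public_ip, count.index)}"
    user        = "ubuntu"
    private_key = "${file(var.ssh_key_path)}"
  }

  # Nothing here forces Terraform to wait for swarm_init, hence the race.
  provisioner "remote-exec" {
    inline = [
      "TOKEN=$(aws s3 cp s3://my-swarm-bucket/join-token -)",
      "docker swarm join --token $TOKEN ${aws_instance.manager.private_ip}:2377"
    ]
  }
}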


I've read the docs. And Stack Overflow. And GitHub issues, like this really long outstanding one. Thoroughly. If this has been solved elsewhere, though, links are appreciated!


PS - this is directly related to, and a smaller subset of, this question, but I wanted to re-ask it here to narrow the scope of the problem.


7 Answers

18
votes

You can redirect the outputs to a file:

resource "null_resource" "shell" {

  provisioner "local-exec" {
    command = "uptime 2>stderr >stdout; echo $? >exitstatus"
  }
}

and then read the stdout, stderr and exitstatus files with the local_file data source.
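For example (a minimal sketch, assuming terraform is run from the module directory so the files land next to the configuration):

data "local_file" "stdout" {
  filename = "${path.module}/stdout"
}

data "local_file" "exitstatus" {
  filename = "${path.module}/exitstatus"
}

output "uptime" {
  value = "${chomp(data.local_file.stdout.content)}"
}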

The problem is that if the files disappear, then terraform apply will fail.

In Terraform 0.11 I made a workaround by reading the file with an external data source and storing the results in a null_resource's triggers (!)

resource "null_resource" "contents" {
  triggers = {
    stdout     = "${data.external.read.result["stdout"]}"
    stderr     = "${data.external.read.result["stderr"]}"
    exitstatus = "${data.external.read.result["exitstatus"]}"
  }

  lifecycle {
    ignore_changes = [
      "triggers",
    ]
  }
}

But in 0.12 this can be replaced with file()

and then finally I can use / output those with:

output "stdout" {
  value = "${chomp(null_resource.contents.triggers["stdout"])}"
}
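With 0.12, for example, the outputs can read the files directly and the triggers indirection disappears (a sketch, assuming the same file names as above; the caveat about missing files still applies):

output "stdout" {
  value = chomp(file("${path.module}/stdout"))
}

output "exitstatus" {
  value = chomp(file("${path.module}/exitstatus"))
}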

See the module https://github.com/matti/terraform-shell-resource for the full implementation.

5
votes

When I asked myself the same question, "Can I use output from a provisioner to feed into another resource's variables?", I went to the source for answers.

At this moment in time, provisioner results are simply streamed to terraform's standard out and never captured.

Given that you are running remote provisioners on both nodes and are trying to access values from S3 (I agree with this approach, by the way; I would do the same), what you probably need to do is handle the race condition in your script, either with a sleep-and-retry loop or by scheduling a script to run later with at, cron, or a similar scheduler.
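For instance, the fetch on nodes 1 and 2 could poll S3 with a back-off instead of failing on the first miss. A minimal sketch, assuming the token was pushed to a bucket as in the question; the bucket, key, manager IP variable and connection details are placeholders:

resource "null_resource" "swarm_join" {
  # connection { ... } details omitted

  provisioner "remote-exec" {
    inline = [
      "for i in $(seq 1 30); do aws s3 cp s3://my-swarm-bucket/join-token /tmp/join-token && break; echo 'token not there yet, retry '$i; sleep 10; done",
      "test -s /tmp/join-token",
      "docker swarm join --token $(cat /tmp/join-token) ${var.manager_ip}:2377"
    ]
  }
}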

In general, Terraform wants to access all variables either up front, or as the result of a provider. Provisioners are not necessarily treated as first-class in Terraform. I'm not on the core team so I can't say why, but my speculation is that it reduces complexity to ignore provisioner results beyond success or failure, since provisioners are just scripts so their results are generally unstructured.

If you need more advanced capabilities for setting up your instances, I suggest a dedicated tool for that purpose, like Ansible, Chef, or Puppet. Terraform's focus is really on infrastructure, rather than software components.

4
votes

You can use an external data source:

data "external" "docker_token" {
  program = ["/bin/bash", "-c" "echo \"{\\\"token\\\":\\\"$(docker swarm init...)\\\"}\""]
}

Then the token will be available as data.external.docker_token.result.token. If you need to pass arguments in, you can use a script (e.g. relative to path.module). See https://www.terraform.io/docs/providers/external/data_source.html for details.
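For instance (a sketch; the script name and instance reference are made up), arguments go in via the query map and arrive as a JSON object on the script's stdin, and the script must print a JSON object of string values back:

data "external" "docker_token" {
  # script receives {"manager_ip":"..."} on stdin
  program = ["${path.module}/get-join-token.sh"]

  query = {
    manager_ip = "${aws_instance.manager.public_ip}"
  }
}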

2
votes

You could effectively run the docker swarm init step for node 0 as a Terraform External Data Source, and have it return JSON. Make the provisioning of the remaining nodes depend on this step and refer to the join token generated by the external data source.

https://www.terraform.io/docs/providers/external/data_source.html
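A sketch of that wiring (script and variable names are illustrative): because the join step interpolates the data source's result, Terraform orders it after the init step automatically. Note that external data sources also run during plan/refresh, so the init script should be idempotent.

data "external" "swarm_init" {
  # must print JSON such as {"token":"..."}
  program = ["${path.module}/swarm-init.sh"]
}

resource "null_resource" "swarm_join" {
  count = 2

  # connection { ... } omitted

  provisioner "remote-exec" {
    inline = [
      "docker swarm join --token ${data.external.swarm_init.result["token"]} ${var.manager_ip}:2377"
    ]
  }
}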

1
votes

A simpler solution would be to provide the token yourself.

When creating the ACL token, simply pass in the ID value and Consul will use that instead of generating one at random.
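For example, the ID can be generated inside Terraform itself and handed to every node, so nothing ever has to be read back from a provisioner (a sketch, assuming the random provider is available; resource names are illustrative):

resource "random_uuid" "consul_acl_token" {}

# Reference the same value wherever the token is needed (user_data,
# templates, provisioner commands), so all nodes agree on it up front.
output "consul_acl_token" {
  value = "${random_uuid.consul_acl_token.result}"
}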

0
votes

With resource dependencies you can ensure that one resource is created before another.

Here's an incomplete example of how I create my Consul cluster, just to give you an idea.

resource "aws_instance" "consul_1" {
    user_data = <<EOF
    #cloud-config
    runcmd:
    - 'docker pull consul:0.7.5'
    - 'docker run -d -v /etc/localtime:/etc/localtime:ro -v $(pwd)/consul-data:/consul/data --restart=unless-stopped --net=host consul:0.7.5 agent -server -advertise=${self.private_ip} -bootstrap-expect=2 -datacenter=wordpress -log-level=info -data-dir=/consul/data'
    EOF

}

resource "aws_instance" "consul_2" {

    depends_on = ["aws_instance.consul_1"]

    user_data = <<EOF
    #cloud-config
    runcmd:
    - 'docker pull consul:0.7.5'
    - 'docker run -d -v /etc/localtime:/etc/localtime:ro -v $(pwd)/consul-data:/consul/data --restart=unless-stopped --net=host consul:0.7.5 agent -server -advertise=${self.private_ip} -retry-join=${aws_instance.consul_1.private_ip} -datacenter=wordpress -log-level=info -data-dir=/consul/data'
    EOF

 }

For the Docker Swarm setup, I think it's out of Terraform's scope, and I think it should be, because the token isn't an attribute of the infrastructure you are creating. So I agree with nbering: you could try to achieve that setup with a tool like Ansible or Chef.

But anyway, if the example helps you set up your Consul cluster, I think you just need to configure Consul as your Docker Swarm backend.

0
votes

Sparrowform is a lightweight provisioner for Terraform-based infrastructure that can handle your case. Here is an example for AWS EC2 instances.

Assuming we have 3 EC2 instances for the Consul cluster: node0, node1 and node2. The first one (node0) is where we fetch the token and store it in an S3 bucket; the other two load the token from S3 later.

$ nano aws_instance.node0.sparrowfile 

#!/usr/bin/env perl6

# have not checked this command, but that's the idea ...
bash "docker swarm init | aws s3 cp - s3://alexey-bucket/stream.txt"

$ nano aws_instance.node1.sparrowfile

#!/usr/bin/env perl6

my $i=0;
my $token;

try {

  while True {
    my $s3-token = run 'aws', 's3', 'cp', 's3://alexey-bucket/stream.txt', '-', :out;
    $token = $s3-token.out.lines[0];
    $s3-token.out.close;
    last if $i++ > 8 or $token;
    say "retry num $i ...";
    sleep 2*$i;
  }

  CATCH { default { .resume } }

}

die "we have not succeed in fetching token" unless $token;

bash "docker swarm init $token";

$ nano aws_instance.node2.sparrowfile - the same setup as for node1


$ terraform apply # bootstrap infrastructure

$ sparrowform --ssh_private_key=~/.ssh/aws.pub --ssh_user=ec2-user # run provisioning on node0, node1, node2

PS - disclosure: I am the tool author.