When using Terraform to provision multiple machines and the Terraform Chef provisioner to configure them, I can only get it to work if a single "resource" is being cheffed in the Terraform run. Everything works perfectly when only one VM is targeted. When more than one resource is provisioned, the chef run hangs at the Creating configuration files... step.

I have tried using modules, provisioning inside each resource, and most recently using null_resources to provision the vm resources after they've been created. (The null_resource has proven very useful, as it allows me to iterate on the chef run quickly without having to re-spin the VM resource every time, as I did when the provisioner was inside the resource block.)

This happened on TF 0.11, and continues in v0.12:

Terraform v0.12.8
+ provider.null v2.1.2
+ provider.vra7 v0.4.1

Provisioner inside the resource:

resource "vra7_deployment" "vra-vm" {
 ...
  resource_configuration = {
    "vSphere_Machine_1.name" = ""
    "vSphere_Machine_1.ip_address" = ""
    "vSphere_Machine_1.description" = "Terraform ICE SQL"
  }
  ...

  provisioner "chef" {
    # This is for TF to talk to the new node
    connection {
      host = self.resource_configuration["vSphere_Machine_1.ip_address"]
      type = "winrm"
      user = var.KT_USER
      password = var.KT_PASS
      insecure = true
    }

    # This is for TF to talk to the chef_server
    # Note! the version constraint doesn't work
    server_url = var.chef_server_url
    node_name  = "ICE-SQL-${self.resource_configuration["vSphere_Machine_1.name"]}"
    run_list   = var.sql_run_list
    recreate_client = true
    environment = "_default"
    ssl_verify_mode = ":verify_none"
    version = "~> 12"
    user_name  = local.username
    user_key   = file("${local.user_key_path}")
  }
}

Provisioner using null_resource block:

resource "vra7_deployment" "ICE-SQL" {
  count = var.sql_count # will be 1/on or 0/off
  ...
  resource_configuration = {
    "vSphere_Machine_1.name" = ""
    "vSphere_Machine_1.ip_address" = ""
    "vSphere_Machine_1.description" = "Terraform ICE SQL"
  }
}

locals {
  sql_ip   = vra7_deployment.ICE-SQL[0].resource_configuration["vSphere_Machine_1.ip_address"]
  sql_name = vra7_deployment.ICE-SQL[0].resource_configuration["vSphere_Machine_1.name"]
}

resource "null_resource" "sql-chef" { 
  # we can use count to switch creating this on or off for testing
  count = 0

  provisioner "chef" {
    # This is for TF to talk to the new node
    connection {
      host = local.sql_ip
      type = "winrm"
      user = var.KT_USER
      password = var.KT_PASS
      insecure = true
    }

    # This is for TF to talk to the chef_server
    # Don't use the local var here, so TF knows to create the dependency
    server_url = var.chef_server_url
    node_name  = "ICE-SQL-${vra7_deployment.ICE-SQL[0].resource_configuration["vSphere_Machine_1.name"]}"
    run_list   = var.sql_run_list
    recreate_client = true
    environment = "_default"
    ssl_verify_mode = ":verify_none"
    version = "12"
    user_name  = local.username
    user_key   = file("${local.user_key_path}")
    client_options = var.chef_client_options
  }
}

Provisioner using modules:

### main.tf
module "SQL" {
  source   = "./modules/vra-chef"
  VRA_USER = var.VRA_USER
  VRA_PASS = var.VRA_PASS
  KT_USER  = var.KT_USER
  KT_PASS  = var.KT_PASS

  description = "ICE SQL"
  run_list    = var.sql_run_list
}

### modules/vra-chef/main.tf
resource "vra7_deployment" "vra-chef" {
  count = var.server_count
...
  resource_configuration = {
    "vSphere_Machine_1.name"       = var.resource_name
    "vSphere_Machine_1.ip_address"  = var.resource_ip
    "vSphere_Machine_1.description" = "${var.description}-${count.index}"
  }

  provisioner "chef" {
    # This is for TF to talk to the new node
    connection {
      host = self.resource_configuration["vSphere_Machine_1.ip_address"]
      type = "winrm"
      user = var.KT_USER
      password = var.KT_PASS
      insecure = true
    }

    # This is for TF to talk to the chef_server
    server_url = var.chef_server_url
    node_name  = self.resource_configuration["vSphere_Machine_1.name"]
    run_list   = var.run_list
    recreate_client = true
    environment = "_default"
    ssl_verify_mode = ":verify_none"
    version = "~> 12"
    user_name  = local.username
    user_key   = file(local.user_key_path)
    client_options = [ "chef_license  'accept'" ]

    # pass custom attributes to the new node
    attributes_json = var.input_json
  }
}

Expected Results:

Chef configures all resources that it is applied to.

Actual Results:

The Terraform Chef provisioner connects to all resources that it is applied to and installs chef on the clients. When it gets to the Creating configuration files... step, it stops sending updates, and the Terraform run just keeps printing Still creating... for each resource every 10s.

vra7_deployment.ICE-REMOTE[0]: Still creating... [9m30s elapsed]
vra7_deployment.ICE-SQL[0]: Still creating... [9m30s elapsed]
vra7_deployment.ICE-MASTER[0]: Still creating... [9m30s elapsed]
vra7_deployment.ICE-MASTER[0]: Creation complete after 9m39s [id=feecf983-48d5-425e-b713-65a1a05fa3ba]
vra7_deployment.ICE-REMOTE[0]: Still creating... [9m40s elapsed]
vra7_deployment.ICE-SQL[0]: Still creating... [9m40s elapsed]
...
vra7_deployment.ICE-SQL[0]: Still creating... [12m10s elapsed]
vra7_deployment.ICE-REMOTE[0]: Still creating... [12m10s elapsed]
vra7_deployment.ICE-REMOTE[0]: Creation complete after 12m11s [id=df64f5ab-af12-4493-8e7d-d7debd93780d]
vra7_deployment.ICE-SQL[0]: Still creating... [12m20s elapsed]
...
vra7_deployment.ICE-SQL[0]: Still creating... [13m10s elapsed]
vra7_deployment.ICE-SQL[0]: Creation complete after 13m11s [id=08ec31f4-124d-470e-b2ba-1833a6f22792]
null_resource.sql-chef[0]: Creating...
null_resource.master-chef[0]: Creating...
null_resource.remote-chef[0]: Creating...
null_resource.sql-chef[0]: Provisioning with 'chef'...
null_resource.master-chef[0]: Provisioning with 'chef'...
null_resource.remote-chef[0]: Provisioning with 'chef'...
null_resource.master-chef[0] (chef): Connecting to remote host via WinRM...
null_resource.master-chef[0] (chef):   Host: 10.12.235.61
null_resource.master-chef[0] (chef):   Port: 5985
null_resource.master-chef[0] (chef):   User: engineering
null_resource.master-chef[0] (chef):   Password: true
null_resource.master-chef[0] (chef):   HTTPS: false
null_resource.master-chef[0] (chef):   Insecure: true
null_resource.master-chef[0] (chef):   NTLM: false
null_resource.master-chef[0] (chef):   CACert: false
null_resource.sql-chef[0] (chef): Connecting to remote host via WinRM...
null_resource.sql-chef[0] (chef):   Host: 10.12.235.50
null_resource.sql-chef[0] (chef):   Port: 5985
null_resource.sql-chef[0] (chef):   User: engineering
null_resource.sql-chef[0] (chef):   Password: true
null_resource.sql-chef[0] (chef):   HTTPS: false
null_resource.sql-chef[0] (chef):   Insecure: true
null_resource.sql-chef[0] (chef):   NTLM: false
null_resource.sql-chef[0] (chef):   CACert: false
null_resource.remote-chef[0] (chef): Connecting to remote host via WinRM...
null_resource.remote-chef[0] (chef):   Host: 10.12.233.51
null_resource.remote-chef[0] (chef):   Port: 5985
null_resource.remote-chef[0] (chef):   User: engineering
null_resource.remote-chef[0] (chef):   Password: true
null_resource.remote-chef[0] (chef):   HTTPS: false
null_resource.remote-chef[0] (chef):   Insecure: true
null_resource.remote-chef[0] (chef):   NTLM: false
null_resource.remote-chef[0] (chef):   CACert: false
null_resource.sql-chef[0] (chef): Connected!
null_resource.remote-chef[0] (chef): Connected!
null_resource.master-chef[0] (chef): Connected!
null_resource.remote-chef[0] (chef): Downloading Chef Client...
null_resource.sql-chef[0] (chef): Downloading Chef Client...
null_resource.remote-chef[0] (chef): Installing Chef Client...
null_resource.sql-chef[0] (chef): Installing Chef Client...
null_resource.remote-chef[0]: Still creating... [10s elapsed]
null_resource.master-chef[0]: Still creating... [10s elapsed]
null_resource.sql-chef[0]: Still creating... [10s elapsed]
null_resource.sql-chef[0] (chef): Creating configuration files...
null_resource.remote-chef[0] (chef): Creating configuration files...
null_resource.master-chef[0] (chef): Downloading Chef Client...
null_resource.master-chef[0] (chef): Installing Chef Client...
null_resource.master-chef[0] (chef): Creating configuration files...
null_resource.remote-chef[0]: Still creating... [20s elapsed]
null_resource.master-chef[0]: Still creating... [20s elapsed]
null_resource.sql-chef[0]: Still creating... [20s elapsed]
null_resource.remote-chef[0]: Still creating... [30s elapsed]
null_resource.sql-chef[0]: Still creating... [30s elapsed]
null_resource.master-chef[0]: Still creating... [30s elapsed]
null_resource.remote-chef[0]: Still creating... [40s elapsed]
null_resource.sql-chef[0]: Still creating... [40s elapsed]
null_resource.master-chef[0]: Still creating... [40s elapsed]
null_resource.remote-chef[0]: Still creating... [50s elapsed]
null_resource.sql-chef[0]: Still creating... [50s elapsed]
null_resource.master-chef[0]: Still creating... [50s elapsed]
null_resource.remote-chef[0]: Still creating... [1m0s elapsed]
null_resource.sql-chef[0]: Still creating... [1m0s elapsed]
null_resource.master-chef[0]: Still creating... [1m0s elapsed]
...loops waiting forever...

Other context:

I've logged this on Terraform's GitHub, with no response. My comments from there:

What I've found is that it doesn't seem to like chef-provisioning more than one machine at a time. So far I've found cases where 1 out of 4 machines will provision perfectly, and the others just hang after they all print the Creating configuration files... status. Leaving the first one active, on the next run the other three all hang again at the same place. Finally, I tweaked the code to re-provision only one of the machines, and it worked perfectly. To be clear: the exact same code that hangs on a prior run will execute perfectly when run by itself. I think that's a critical clue to debugging this.

To reiterate: when it gets stuck, the chef provisioning always hangs at the Creating configuration files... step. If it gets past that, it always works.

Here is a gist of a chef run using null_resource on two resources, both of which hang: https://gist.github.com/mcascone/0b71948f50d52648389e661d00c8e31c

And this is one of a successful, 1-resource run: https://gist.github.com/mcascone/858855b5bd9d5d1cf655d5e10df67801

I keep thinking this is an issue with the same provisioner being called multiple times in the same main.tf file. I'm calling the chef provisioner 3+ times in one apply run. Could it be that the multiple instances of the provisioner are colliding with each other, or there isn't actually support for multiple runs of the same provisioner, and they're all getting instantiated in the same instance and corrupting each other?
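
One way the collision theory could be tested is to force the chef runs to execute one after another instead of in parallel. The following is only a sketch under assumptions: the names master-chef and remote-chef are taken from the log output above, their bodies are assumed to mirror the sql-chef block shown earlier, and the ordering chosen here is arbitrary. It has not been confirmed to avoid the hang.

```hcl
# Hypothetical sketch: chain the null_resources with depends_on so that
# Terraform starts each chef run only after the previous one has finished.

resource "null_resource" "master-chef" {
  count = 1

  # provisioner "chef" { ... }  # assumed: same block as sql-chef, pointed at the master node
}

resource "null_resource" "sql-chef" {
  count = 1

  # Do not start until the master's chef run has completed.
  depends_on = [null_resource.master-chef]

  # provisioner "chef" { ... }  # the block shown earlier in this question
}

resource "null_resource" "remote-chef" {
  count = 1

  # Do not start until the SQL node's chef run has completed.
  depends_on = [null_resource.sql-chef]

  # provisioner "chef" { ... }  # assumed: same block as sql-chef, pointed at the remote node
}
```

If the serialized runs all complete, that would point at concurrent provisioner instances as the trigger. Running terraform apply with -parallelism=1 limits Terraform to one concurrent operation for the whole run and tests the same theory without editing the configuration.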

1 Answer

It looks like, for now at least, we have to downgrade to v0.11 to get multiple provisioner runs to work. Please see this thread: "Terraform stucks when instance_count is more than 2 while using remote-exec provisioner"