I have a very frustrating Terraform issue, I made some changes to my terraform script which failed when I applied the plan. I've gone through a bunch of machinations and probably made the situation worse as I ended up manually deleting a bunch of AWS resources in trying to resolve this.
So now I am unable to use Terraform at all (refresh, plan, destroy) all get the same error.
The Situation
I have a list of Fargate services, and a set of maps which correlate different features of the fargate services such as the "Target Group" for the load balancer (I've provided some code below). The problem appears to be that Terraform is not picking up that these resources have been manually deleted or is somehow getting confused because they don't exist. At this point if I run a refresh, plan or destroy I get an error stating that a specific list is empty, even though it isn't (or should not be).
In the failed run I added a new service to the list below along with a new url (see code below)
Objective
At this point I would settle for destroying the entire environment (its my dev environment), however; ideally I want to just get the system working such that Terraform will detect the changes and work properly.
Terraform Script is Valid
I have reverted my Terraform scripts back to the last known good version. I have run the good version against our staging environment and it works fine.
Configuration Info
MacOS Mojave 10.14.6 (18G103)
Terraform v0.12.24.
- provider.archive v1.3.0
- provider.aws v2.57.0
- provider.random v2.2.1
- provider.template v2.1.2
The Terraform state file is being stored in a S3 bucket, and
terraform init --reconfigure
has been called.
What I've done
I was originally getting a similar error but it was in a different location, after many hours Googling and trying stuff (which I didn't write down) I decided to manually remove the AWS resources associated with the problematic code (the ALB, Target Groups, security groups)
Example Terraform Script
Unfortunately I can't post the actual script as it is private, but I've posted what I believe is the pertinent parts but have redacted some info. The reason I mention this is that any syntax type error you might see would be caused by this redaction, as I stated above the script works fine when run in our staging environment.
globalvars.tf
In the root directory. In the case of the failed Terraform run I added a new name to the service_names
(edd = "edd"
) list (I added as the first element). In the service_name_map_2_url
I added the new entry (edd = "edd"
) as the last entry. I'm not sure if the fact that I added these elements in different 'order' is the problem, although it really shouldn't since I access the map via the name and not by index
variable "service_names" {
type = list(string)
description = "This is a list/array of the images/services for the cluster"
default = [
"alert",
"alert-config"
]
}
variable service_name_map_2_url {
type = map(string)
description = "This map contains the base URL used for the service"
default = {
alert = "alert"
alert-config = "alert-config"
}
}
alb.tf
In modules/alb
. In this module we create an ALB and then a target group for each service, which looks like this. The items from globalvars.tf are passed into this script
locals {
numberOfServices = length(var.service_names)
}
resource "aws_alb" "orchestration_alb" {
name = "orchestration-alb"
subnets = var.public_subnet_ids
security_groups = [var.alb_sg_id]
tags = {
environment = var.environment
group = var.tag_group_name
app = var.tag_app_name
contact = var.tag_contact_email
}
}
resource "aws_alb_target_group" "orchestration_tg" {
count = local.numberOfServices
name = "${var.service_names[count.index]}-tg"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "ip"
deregistration_delay = 60
tags = {
environment = var.environment
group = var.tag_group_name
app = var.tag_app_name
contact = var.tag_contact_email
}
health_check {
path = "/${var.service_name_map_2_url[var.service_names[count.index]]}/health"
port = var.app_port
protocol = "HTTP"
healthy_threshold = 2
unhealthy_threshold = 5
interval = 30
timeout = 5
matcher = "200-308"
}
}
output.tf
This is the output of the alb.tf
, other things are outputted but this is the one that matters for this issue
output "target_group_arn_suffix" {
value = aws_alb_target_group.orchestration_tg.*.arn_suffix
}
cloudwatch.tf
In modules/cloudwatch
. I attempt to create a dashboard
data "template_file" "Dashboard" {
template = file("${path.module}/dashboard.json.template")
vars = {
...
alert-tg = var.target_group_arn_suffix[0]
alert-config-tg = var.target_group_arn_suffix[1]
edd-cluster-name = var.ecs_cluster_name
alb-arn-suffix = var.alb-arn-suffix
}
}
Error
When I run terraform refresh
(or plan or destroy) I get the following error (I get the same error for alert-config as well)
Error: Invalid index
on modules/cloudwatch/cloudwatch.tf line 146, in data "template_file" "Dashboard":
146: alert-tg = var.target_group_arn_suffix[0]
|----------------
| var.target_group_arn_suffix is empty list of string
The given key does not identify an element in this collection value.
AWS Environment
I have manually deleted the ALB
. Dashboard
and all Target Groups
. I would expect (and this has worked in the past) that Terraform would detect this and update its state file appropriately such that when running a plan it would know it has to create the ALB and target groups.
Thank you