Adding multiple S3 paths to glue crawler with terraform

Question

I'm building out some infrastructure in AWS with Terraform. I have several S3 buckets created and want a Glue crawler to crawl these buckets once per hour. My Terraform Glue catalog db, role, and policy all build fine but when I try to create the crawler resource by adding four S3 paths to the s3_target{} portion of the crawler, I get a failure:

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.glue.id}" 

  s3_target {
#    count = "${length(var.data_source_path)}"
    path = "${var.data_source_path}"#"${formatlist("%s", var.data_source_path)}"
  }
}

This causes an error:

Error: aws_glue_crawler.datalake_crawler: s3_target.0.path must be a single value, not a list

I have tried adding a count statement in the s3_target but this fails. I have also tried adding

"${formatlist("%s", var.data_source_path)}"

in the path argument but this too fails.

Can I add multiple s3 paths to a Glue Crawler with Terraform? I can make this happen through the AWS console but this needs to be done using infrastructure as code.

I've not used Glue but from a quick look at the docs it looks like you can just repeat the s3_target block for each path. On my phone right now so can't test it to make this a proper answer. — ydaetskcoR
Adding three more s3_target blocks to the glue crawler resource allowed me to add all four of my buckets to the crawler. I had reviewed the glue docs but saw nothing that led me to believe that I could repeat the s3_target blocks. Can you help me see what I missed? Also, can I add those blocks programmatically based upon a variable? Feel free to add as an answer when you're back at a box; happy to accept. — Steven

ydaetskcoR · Accepted Answer · 2019-02-19T08:28:09

To target additional S3 paths you can just repeat the s3_target block multiple times like this:

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.glue.id}" 

  s3_target {
    path = "${var.data_source_path_1}"
  }

  s3_target {
    path = "${var.data_source_path_2}"
  }
}

This is briefly alluded to in the aws_glue_crawler resource docs where it says:

s3_target (Optional) List nested Amazon S3 target arguments. See below.

You can also see this in the source code for the resource's schema:

        "s3_target": {
            Type:     schema.TypeList,
            Optional: true,
            MinItems: 1,

Unfortunately, before Terraform 0.12 you can't build these blocks programmatically by looping over a list of dynamic paths in Terraform itself; you need to specify each s3_target block statically.

Terraform 0.12 will introduce HCL2, which has better support for loops (beyond count), including dynamic blocks. Inside a dynamic block's content, the current element is available as s3_target.value, so you could then do something like this:

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = var.glue_db_name
  name          = var.crawler_name
  role          = aws_iam_role.glue.id 

  dynamic "s3_target" {
    for_each = var.data_source_paths

    content {
      path = s3_target.value
    }
  }
}
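
For the dynamic block to iterate, var.data_source_paths needs to be a list (or set) of strings. A minimal sketch of how that variable might be declared in Terraform 0.12 or later follows; the bucket paths are placeholders I've made up for illustration, not values from the question:

```hcl
# Hypothetical declaration of the list consumed by the dynamic "s3_target" block.
variable "data_source_paths" {
  type        = list(string)
  description = "S3 paths for the Glue crawler to scan"

  # Placeholder paths; substitute your own buckets.
  default = [
    "s3://example-bucket-1",
    "s3://example-bucket-2",
  ]
}
```

Each element of the list should produce one s3_target block, so adding a bucket would only mean adding an entry to the list.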
