2
votes

I am using Terraform to manage AWS environments for our application. The environments have S3 buckets for various things, and when setting up a new environment I just want to copy the buckets from a base source bucket, or from an existing environment.

But I can't find anything that will provision a copy. The AWS console lets you duplicate a bucket's settings when creating it (which I don't need), but not its objects, so it may not be something Terraform can do directly.

If so, how about indirectly?

Just to confirm, is the request to copy the contents of an S3 bucket to a new S3 bucket via Terraform? – Chris Williams
Yes, overall the goal is to create a new bucket that has the same contents as an existing bucket. If that means creating the new bucket first and then somehow copying the contents of a source bucket into the new bucket, then so be it. – Randell

2 Answers

4
votes

There is no Terraform resource that enables copying objects from one S3 bucket to another. If you want to include this in your Terraform setup, you would need to use a local-exec provisioner.

It would need to execute the command below, using the AWS CLI to run aws s3 cp:

resource "null_resource" "s3_objects" {
  provisioner "local-exec" {
    command = "aws s3 cp s3://bucket1 s3://bucket2 --recursive"
  }
}

For this to run, the machine running Terraform would need the AWS CLI installed and a role (or valid credentials) that allows the copy.
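
As a rough sketch of how this can be tied into the rest of the configuration (the bucket names and the triggers value here are illustrative, not taken from the question), the provisioner can reference the target bucket resource so the copy only runs after that bucket exists:

resource "aws_s3_bucket" "target" {
  bucket = "copy-example-target" # hypothetical target bucket name
}

resource "null_resource" "s3_objects" {
  # Re-run the copy if the target bucket name ever changes
  triggers = {
    target_bucket = aws_s3_bucket.target.bucket
  }

  provisioner "local-exec" {
    command = "aws s3 cp s3://copy-example-source s3://${aws_s3_bucket.target.bucket} --recursive"
  }
}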

3
votes

Generally speaking, Terraform providers reflect operations that are natively supported by the underlying APIs, but in some cases we can use various Terraform resource types together to achieve functionality that the underlying API lacks.

I believe there's no native S3 operation for bulk-copying objects from one bucket to another, so solving this with Terraform requires decomposing the problem into smaller steps, which I think in this case would be:

  • Declare a new bucket, the target.
  • List all of the objects in the source bucket.
  • Declare one object in the new bucket per object in the source bucket.

The AWS provider can in principle do all three of these operations: it has managed resource types for both buckets and bucket objects, and it has a data source, aws_s3_bucket_objects, which can enumerate some or all of the objects in a bucket.

We can combine those pieces together in a Terraform configuration like this:

resource "aws_s3_bucket" "target" {
  bucket = "copy-example-target"
}

data "aws_s3_bucket_objects" "source" {
  bucket = "copy-example-source"
}

data "aws_s3_bucket_object" "source" {
  for_each = toset(data.aws_s3_bucket_objects.source.keys)

  bucket = data.aws_s3_bucket_objects.source.bucket
  key    = each.key
}

resource "aws_s3_bucket_object" "target" {
  for_each = aws_s3_bucket_object.source

  bucket  = aws_s3_bucket.target.bucket
  key     = each.key
  content = each.value.body
}

With that said, Terraform is likely not the best tool for this situation, for the following reasons:

  • The above configuration will cause Terraform to read all of the objects in the bucket into memory, which would be time consuming and use lots of RAM for larger buckets, and then ultimately store all of them in the Terraform state, which would make the state itself very large.
  • Because the aws_s3_bucket_object data source is intended mainly for retrieving small text-based objects, the above will work only if everything in the bucket meets the limitations described in the aws_s3_bucket_object documentation: the objects must all have text-indicating MIME types and they must all contain UTF-8 encoded text.

In this case then, I would prefer to use a specialized tool for the job which is designed to exploit all of the features of the S3 API to make the copy as efficient as possible, such as streaming the list of objects and streaming the contents of each object in chunks to avoid the need to have all of the data in memory at once. One such tool is in the AWS CLI itself, in the form of the aws s3 cp command with the --recursive option.
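
For example, with the illustrative bucket names from the configuration above, the whole copy is a single command run outside Terraform:

aws s3 cp s3://copy-example-source s3://copy-example-target --recursive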