When starting from an AMI that has cloud-init installed (which is common in many official Linux distribution images), we can use cloud-init's write_files module to place arbitrary files into the filesystem, as long as they are small enough to fit within the constraints of the user_data argument along with all of the other cloud-init data.
As with all cloud-init modules, we configure write_files using cloud-init's YAML-based configuration format, which begins with the special marker string #cloud-config on a line of its own, followed by a YAML data structure. Because JSON is a subset of YAML, we can use Terraform's jsonencode to produce a valid value[1].
locals {
  cloud_config_config = <<-END
    #cloud-config
    ${jsonencode({
      write_files = [
        {
          path        = "/etc/example.txt"
          permissions = "0644"
          owner       = "root:root"
          encoding    = "b64"
          content     = filebase64("${path.module}/example.txt")
        },
      ]
    })}
  END
}
The write_files module can accept data in base64 format when we set encoding = "b64", so we use that in conjunction with Terraform's filebase64 function to include the contents of an external file. Other approaches are possible here, such as producing a string dynamically using Terraform templates and using base64encode to encode it as the file contents.
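As a rough sketch of that alternative, the following renders a template with templatefile and encodes the result with base64encode. The template filename example.txt.tftpl, the greeting variable, and the local value names are hypothetical, chosen only for illustration:

```hcl
locals {
  # Render a hypothetical template file. "greeting" is an illustrative
  # variable that the template would reference as an interpolation.
  example_content = templatefile("${path.module}/example.txt.tftpl", {
    greeting = "Hello from Terraform"
  })

  cloud_config_templated = <<-END
    #cloud-config
    ${jsonencode({
      write_files = [
        {
          path        = "/etc/example.txt"
          permissions = "0644"
          owner       = "root:root"
          encoding    = "b64"
          # Encode the rendered string rather than reading a file directly.
          content     = base64encode(local.example_content)
        },
      ]
    })}
  END
}
```

The only difference from the filebase64 version is where the content comes from; cloud-init still receives a base64 string and decodes it when writing the file.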
If you can express everything you want cloud-init to do in a single configuration file like the one above, you can assign local.cloud_config_config directly as your instance's user_data, and cloud-init should recognize and process it on system boot:
user_data = local.cloud_config_config
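For context, that argument would sit inside an aws_instance resource along these lines. This is a minimal sketch: the variable name ami_id and the instance type are placeholders, not values from the original question.

```hcl
variable "ami_id" {
  type        = string
  description = "ID of an AMI that has cloud-init installed (placeholder)."
}

resource "aws_instance" "example" {
  ami           = var.ami_id
  instance_type = "t3.micro" # illustrative choice only

  # The rendered #cloud-config document from the locals block above.
  user_data = local.cloud_config_config
}
```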
If you instead need to combine creating the file with some other actions, like running a shell script, you can use cloud-init's multipart archive format to encode multiple "files" for cloud-init to process. Terraform has a cloudinit provider that contains a data source for easily constructing a multipart archive for cloud-init:
data "cloudinit_config" "example" {
  gzip          = false
  base64_encode = false

  # First part: the cloud-config document that writes the file.
  part {
    content_type = "text/cloud-config"
    filename     = "cloud-config.yaml"
    content      = local.cloud_config_config
  }

  # Second part: a shell script for cloud-init to run.
  part {
    content_type = "text/x-shellscript"
    filename     = "example.sh"
    content      = <<-EOF
      #!/bin/bash
      echo "Hello World"
    EOF
  }
}
This data source will produce a single string at data.cloudinit_config.example.rendered, which is a multipart archive suitable for use as user_data for cloud-init:
user_data = data.cloudinit_config.example.rendered
EC2 limits user data to 16 KB, measured in raw form before it is base64-encoded for the API, so everything in the cloud-init configuration, including any base64-encoded file contents, must fit within that limit. If you need to place a file that comes close to or exceeds that limit, it would probably be best to transfer it through an intermediate system, such as having Terraform write the file into an Amazon S3 bucket and having the software in your instance retrieve it using instance profile credentials. That shouldn't be necessary for small data files used for system configuration, though.
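The S3 approach could look roughly like the following. This is only a sketch under several assumptions: the bucket already exists, the AWS CLI is installed in the AMI, and every name here (bucket, key, role, profile, destination path) is hypothetical.

```hcl
locals {
  artifact_bucket = "example-artifacts-bucket" # hypothetical, assumed to exist
  artifact_key    = "config/large-file.bin"    # hypothetical object key
}

# Upload the large file instead of embedding it in user_data.
resource "aws_s3_object" "large_file" {
  bucket = local.artifact_bucket
  key    = local.artifact_key
  source = "${path.module}/large-file.bin"
  etag   = filemd5("${path.module}/large-file.bin") # re-upload on change
}

# Role that EC2 instances may assume.
data "aws_iam_policy_document" "assume_ec2" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "instance" {
  name               = "example-instance-role"
  assume_role_policy = data.aws_iam_policy_document.assume_ec2.json
}

# Allow reading only the one uploaded object.
data "aws_iam_policy_document" "read_artifact" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::${local.artifact_bucket}/${local.artifact_key}"]
  }
}

resource "aws_iam_role_policy" "read_artifact" {
  name   = "read-artifact"
  role   = aws_iam_role.instance.id
  policy = data.aws_iam_policy_document.read_artifact.json
}

resource "aws_iam_instance_profile" "instance" {
  name = "example-instance-profile"
  role = aws_iam_role.instance.name
}

locals {
  # Small boot script that fetches the file using the instance profile
  # credentials. Referring to the aws_s3_object attributes makes Terraform
  # upload the object before anything that uses this script is created.
  fetch_script = <<-EOT
    #!/bin/bash
    aws s3 cp "s3://${aws_s3_object.large_file.bucket}/${aws_s3_object.large_file.key}" /opt/large-file.bin
  EOT
}
```

The instance would then set iam_instance_profile = aws_iam_instance_profile.instance.name and pass local.fetch_script as its user_data, either directly or as a text/x-shellscript part of a multipart archive.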
It's important to note that from the perspective of Terraform and EC2 the content of user_data is just an arbitrary string. Any issues in processing the string must be debugged within the target operating system itself, by reading the cloud-init logs to see how it interpreted the configuration and what happened when it tried to take those actions.
[1]: We could also potentially use yamlencode, but at the time I write this that function has a warning that its exact formatting may change in future Terraform versions, and that's undesirable for user_data because it would cause the instance to be replaced. If you are reading this in the future and that warning is no longer present in the yamlencode docs, consider using yamlencode instead.
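If that caveat no longer applies to your Terraform version, the yamlencode variant might look like the sketch below. The local value name cloud_config_yaml is hypothetical, and the marker line is joined with an explicit newline because yamlencode returns a multi-line string:

```hcl
locals {
  # "#cloud-config" must still be the very first line of the result.
  cloud_config_yaml = "#cloud-config\n${yamlencode({
    write_files = [
      {
        path        = "/etc/example.txt"
        permissions = "0644"
        owner       = "root:root"
        encoding    = "b64"
        content     = filebase64("${path.module}/example.txt")
      },
    ]
  })}"
}
```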