0
votes

I am currently building a serverless application on AWS Lambda that creates CloudFront distributions on behalf of users. Currently, when a user calls my 'delete' operation, my API Lambda function disables the CloudFront distribution. However, the distributions are never cleaned up and deleted, because I need to wait for the disable to complete first. Given Lambda's 15-minute limit, I can't just wait for the disable to finish deploying, and that would be cost-inefficient even if I could.

I realize I could have a Lambda function periodically poll my CloudFront distributions and clean them up, but I'm hoping to do this in an event-driven way so that it occurs as close to real-time as possible and I don't need to use any compute when there's nothing to delete.

I tried setting a CloudWatch Event to trigger on UpdateDistribution calls, but that triggers when the distribution begins to disable rather than when it finishes, so that doesn't really fix the issue where I need to wait for the deploy.

Any suggestions on how to accomplish this? Is it even possible?


2 Answers

0
votes

I recommend using an AWS Step Function to manage the final deletion of the CloudFront Distribution. Your lambda could disable the CloudFront Distribution and invoke the Step Function. The step function could manage the polling of the distribution and final deletion by calling Lambda functions.
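
As a rough illustration of that idea, here is a minimal sketch that builds an Amazon States Language definition as a Python dict. The state names, the two Lambda ARNs, the 5-minute default wait, and the assumption that the "check" Lambda returns the distribution's status string are all mine, not part of the answer above.

```python
import json


def build_cleanup_state_machine(check_fn_arn: str, delete_fn_arn: str,
                                wait_seconds: int = 300) -> dict:
    """Build a poll-then-delete state machine definition.

    Assumptions (hypothetical): check_fn_arn points at a Lambda that returns
    the distribution's Status string, and delete_fn_arn points at a Lambda
    that calls DeleteDistribution.
    """
    return {
        "Comment": "Wait for a disabled CloudFront distribution to deploy, then delete it",
        "StartAt": "CheckStatus",
        "States": {
            # Ask the check Lambda for the current status and store it at $.status.
            "CheckStatus": {
                "Type": "Task",
                "Resource": check_fn_arn,
                "ResultPath": "$.status",
                "Next": "IsDeployed",
            },
            # Branch on the status; anything other than Deployed waits and retries.
            "IsDeployed": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.status",
                        "StringEquals": "Deployed",
                        "Next": "DeleteDistribution",
                    }
                ],
                "Default": "WaitBeforeRetry",
            },
            # Wait states do not keep a Lambda running while the timer runs.
            "WaitBeforeRetry": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "CheckStatus",
            },
            "DeleteDistribution": {
                "Type": "Task",
                "Resource": delete_fn_arn,
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_cleanup_state_machine(
        "arn:aws:lambda:us-east-1:123456789012:function:check-distribution",
        "arn:aws:lambda:us-east-1:123456789012:function:delete-distribution",
    )
    print(json.dumps(definition, indent=2))
```

The JSON this prints could be used as the state machine definition, with the execution input carrying the distribution id.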

0
votes

AWS officially states that CloudFront takes a maximum of 15 minutes to enable or disable a distribution. Although we have found this to be generally true, we have encountered instances where disabling a distribution took longer than 15 minutes, so I would recommend that any solution account for the possibility that, for whatever reason, the disable takes longer than that.

Although it's not a perfectly event-driven solution, it's definitely possible to do this in a mostly event-driven manner. Our solution uses CloudWatch, Lambda, SQS, and CloudTrail. It's clear that you have already enabled CloudTrail, but for future readers who may face a similar challenge: if you haven't done so already, the first step is to enable CloudTrail logging in your account (note: using CloudTrail will incur charges as detailed here). Once CloudTrail is enabled, CloudWatch Events will be able to get the CloudFront events from CloudTrail.

Now that CloudTrail is enabled, we can get on with actually deleting the disabled distribution. In CloudWatch, create a CloudWatch Event that, on the UpdateDistribution event from CloudFront, triggers a Lambda function that publishes a message containing the distribution id to an SQS Delay Queue that has the maximum delay of 15 minutes (console based setup instructions here).
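
A minimal sketch of that first Lambda's logic might look like the following. The event pattern and the location of the distribution id (`detail.requestParameters.id`) are assumptions based on how CloudTrail API events are usually shaped, so check them against a real event from your account. The function name and message format are hypothetical. The SQS client is passed in (for example `boto3.client("sqs")`) so the logic can be exercised without AWS access.

```python
import json

# SQS caps the per-message delay at 900 seconds (15 minutes).
MAX_SQS_DELAY_SECONDS = 900

# Assumed CloudWatch Events pattern for CloudFront UpdateDistribution calls
# recorded by CloudTrail; verify the field values against your own events.
EVENT_PATTERN = {
    "source": ["aws.cloudfront"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["cloudfront.amazonaws.com"],
        "eventName": ["UpdateDistribution"],
    },
}


def enqueue_distribution_check(sqs_client, queue_url: str, event: dict) -> str:
    """Publish the distribution id from the event to the delay queue."""
    # Assumption: the id of the updated distribution is in requestParameters.
    distribution_id = event["detail"]["requestParameters"]["id"]
    sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"distributionId": distribution_id}),
        DelaySeconds=MAX_SQS_DELAY_SECONDS,
    )
    return distribution_id
```

Setting `DelaySeconds` on each message gives the same effect as a queue-level delay, and leaves the queue usable for shorter delays later if you want them.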

The consumer of messages in the SQS queue is a second Lambda function. When invoked, this Lambda function should call the CloudFront GetDistribution API for the distribution id provided and check whether Distribution.DistributionConfig.Enabled === true. If that's false, it should then check whether Distribution.Status === Deployed.

The logic can be adjusted to your particular needs or use case, but from here a potential workflow would be to simply check if Distribution.DistributionConfig.Enabled === true. If it's true, break out of the function: you were expecting the distribution to be disabled and it's not, so something is not right. Maybe the distribution was re-enabled manually, maybe there was an error disabling it somewhere, or maybe the UpdateDistribution API was called for another reason. Whatever the cause, the distribution is not in the state we're expecting and we shouldn't continue. Before breaking out of the function, you should make sure this doesn't fail silently, for example by sending an SNS message to notify an admin that the error happened.

Moving on, if Distribution.DistributionConfig.Enabled === true returns false, check that Distribution.Status === Deployed. If this returns false, the disable is still InProgress and you need to wait longer. To deal with this without keeping your Lambda running (and paying for it), simply publish a second message containing the distribution id to the Delay Queue and let the function end. The Lambda will trigger again in 15 minutes and repeat the above process on the next invocation.
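
The branching described so far could be sketched roughly like this. The function names, the three action labels, and the message format are hypothetical. The clients are passed in (for example `boto3.client("cloudfront")` and `boto3.client("sqs")`), and sending the SNS alert is left to the caller.

```python
import json

MAX_SQS_DELAY_SECONDS = 900  # 15 minutes, the SQS maximum


def next_action(distribution: dict) -> str:
    """Decide what to do with a distribution returned by GetDistribution."""
    if distribution["DistributionConfig"]["Enabled"]:
        # Expected a disabled distribution; something else updated it.
        return "alert"
    if distribution["Status"] != "Deployed":
        # The disable is still in progress, so check again later.
        return "requeue"
    return "delete"


def handle_message(cloudfront_client, sqs_client, queue_url: str, body: str) -> str:
    """Process one SQS message body and requeue it if the disable is not done."""
    distribution_id = json.loads(body)["distributionId"]
    response = cloudfront_client.get_distribution(Id=distribution_id)
    action = next_action(response["Distribution"])
    if action == "requeue":
        # Put the same id back on the delay queue and let the function end.
        sqs_client.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps({"distributionId": distribution_id}),
            DelaySeconds=MAX_SQS_DELAY_SECONDS,
        )
    return action
```

Keeping `next_action` free of AWS calls makes the three outcomes easy to unit test with plain dicts.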

If the check on Distribution.DistributionConfig.Enabled === true returns false and Distribution.Status === Deployed returns true, then the distribution is disabled and ready to be deleted. From here, simply call the DeleteDistribution API with the distribution id and the ETag from the GetDistribution call you made at the beginning. With the AWS CLI, the operation looks something like aws cloudfront delete-distribution --id {distributionId} --if-match {ETag}. If the distribution is successfully deleted you'll get a 204, and that's it, all done. If there's an error you'll get a 4xx that you'll need to handle.
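
For the final step, a minimal sketch using boto3-style calls could look like this. The function name is hypothetical, and the client is passed in (for example `boto3.client("cloudfront")`). The ETag is re-read just before deleting here, which is my own choice; reusing the ETag from the earlier GetDistribution call, as described above, works the same way.

```python
def delete_disabled_distribution(cloudfront_client, distribution_id: str) -> int:
    """Delete a disabled, deployed distribution and return the HTTP status code."""
    # GetDistribution returns the current ETag, which DeleteDistribution
    # requires as the IfMatch value.
    current = cloudfront_client.get_distribution(Id=distribution_id)
    response = cloudfront_client.delete_distribution(
        Id=distribution_id,
        IfMatch=current["ETag"],
    )
    # A successful delete is expected to report HTTP 204.
    return response["ResponseMetadata"]["HTTPStatusCode"]
```

With a real boto3 client, a 4xx response surfaces as a `botocore.exceptions.ClientError`, so the calling Lambda would catch that exception and decide whether to alert or requeue.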