As you specified that you are having issues with deleting VPC within stacks containing lambdas which themselves are in VPC, this most probably could be because of the network interfaces being generated by lambdas to connect to other resources in the VPC.
Technically these network interfaces should be auto-deleted when lambdas are undeployed from the stack but in my experience, I have observed orphaned ENI's which doesn't let the VPC be undeployed.
For this reason, I created a custom resource backed lambda which cleans up the ENI's after all lambdas within VPC's have been undeployed.
This is the cloud formation part where you setup the custom resource and pass the VPC ID
##############################################
# #
# Custom resource deleting net interfaces #
# #
##############################################
NetInterfacesCleanupFunction:
Type: AWS::Serverless::Function
Properties:
CodeUri: src
Handler: cleanup/network_interfaces.handler
Role: !GetAtt BasicLambdaRole.Arn
DeploymentPreference:
Type: AllAtOnce
Timeout: 900
PermissionForNewInterfacesCleanupLambda:
Type: AWS::Lambda::Permission
Properties:
Action: lambda:invokeFunction
FunctionName:
Fn::GetAtt: [ NetInterfacesCleanupFunction, Arn ]
Principal: lambda.amazonaws.com
InvokeLambdaFunctionToCleanupNetInterfaces:
DependsOn: [PermissionForNewInterfacesCleanupLambda]
Type: Custom::CleanupNetInterfacesLambda
Properties:
ServiceToken: !GetAtt NetInterfacesCleanupFunction.Arn
StackName: !Ref AWS::StackName
VPCID:
Fn::ImportValue: !Sub '${MasterStack}-Articles-VPC-Ref'
Tags:
'owner': !Ref StackOwner
'task': !Ref Task
And this is the corresponding lambda. This lambda tries 3 times to detach and delete orphaned network interfaces and if fails if it can't which means there's still a lambda which is generating new network interfaces and you need to debug for that.
import boto3
from botocore.exceptions import ClientError
from time import sleep
# Fix this wherever your custom resource handler code is
from common import cfn_custom_resources as csr
import sys
MAX_RETRIES = 3
client = boto3.client('ec2')
def handler(event, context):
vpc_id = event['ResourceProperties']['VPCID']
if not csr.__is_valid_event(event, context):
csr.send(event, context, FAILED, validate_response_data(result))
return
elif event['RequestType'] == 'Create' or event['RequestType'] == 'Update':
result = {'result': 'Don\'t trigger the rest of the code'}
csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
return
try:
# Get all network intefaces for given vpc which are attached to a lambda function
interfaces = client.describe_network_interfaces(
Filters=[
{
'Name': 'description',
'Values': ['AWS Lambda VPC ENI*']
},
{
'Name': 'vpc-id',
'Values': [vpc_id]
},
],
)
failed_detach = list()
failed_delete = list()
# Detach the above found network interfaces
for interface in interfaces['NetworkInterfaces']:
detach_interface(failed_detach, interface)
# Try detach a second time and delete each simultaneously
for interface in interfaces['NetworkInterfaces']:
detach_and_delete_interface(failed_detach, failed_delete, interface)
if not failed_detach or not failed_delete:
result = {'result': 'Network interfaces detached and deleted successfully'}
csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
else:
result = {'result': 'Network interfaces couldn\'t be deleted completely'}
csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
# print(response)
except Exception:
print("Unexpected error:", sys.exc_info())
result = {'result': 'Some error with the process of detaching and deleting the network interfaces'}
csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
def detach_interface(failed_detach, interface):
try:
if interface['Status'] == 'in-use':
detach_response = client.detach_network_interface(
AttachmentId=interface['Attachment']['AttachmentId'],
Force=True
)
# Sleep for 1 sec after every detachment
sleep(1)
print(f"Detach response for {interface['NetworkInterfaceId']}- {detach_response}")
if 'HTTPStatusCode' not in detach_response['ResponseMetadata'] or \
detach_response['ResponseMetadata']['HTTPStatusCode'] != 200:
failed_detach.append(detach_response)
except ClientError as e:
print(f"Exception details - {sys.exc_info()}")
def detach_and_delete_interface(failed_detach, failed_delete, interface, retries=0):
detach_interface(failed_detach, interface)
sleep(retries + 1)
try:
delete_response = client.delete_network_interface(
NetworkInterfaceId=interface['NetworkInterfaceId'])
print(f"Delete response for {interface['NetworkInterfaceId']}- {delete_response}")
if 'HTTPStatusCode' not in delete_response['ResponseMetadata'] or \
delete_response['ResponseMetadata']['HTTPStatusCode'] != 200:
failed_delete.append(delete_response)
except ClientError as e:
print(f"Exception while deleting - {str(e)}")
print()
if retries <= MAX_RETRIES:
if e.response['Error']['Code'] == 'InvalidNetworkInterface.InUse' or \
e.response['Error']['Code'] == 'InvalidParameterValue':
retries = retries + 1
print(f"Retry {retries} : Interface in use, deletion failed, retrying to detach and delete")
detach_and_delete_interface(failed_detach, failed_delete, interface, retries)
else:
raise RuntimeError("Code not found in error")
else:
raise RuntimeError("Max Number of retries exhausted to remove the interface")
The link to the lambda is https://gist.github.com/revolutionisme/8ec785f8202f47da5517c295a28c7cb5
More information about configuring lambdas in a VPC - https://docs.aws.amazon.com/lambda/latest/dg/vpc.html