1
votes

For over two days, I've been trying to deploy a CloudFormation stack using serverless framework. The thing is, as part of the stack, I have an RDS cluster as well as a custom resource which relies on a Lambda function (written in Python) for initializing some database tables.

The details of this custom resource in the serverless.yml file are the following:

rdsMigration:
  Type: Custom::DatabaseMigration
  DependsOn: rdsCluster
  Properties:
    ServiceToken: !GetAtt MigrateDatabaseLambdaFunction.Arn
    Version: 1.0

When deploying using sls deploy, the cluster and the lambda functions are created correctly, but the process is stuck on creating the rdsMigration resource.

In the Lambda code, I've been careful to generate the response in all possible scenarios, including exceptions. However, that does not seem to be the problem.

Apparently, the function is not being invoked... kind of, because even the charts look weird:

Monitoring of the Lambda function

You can see how there are no invocations, but there is a red dot in "Error count and success rate" about 5:15 PM, which is the time at which the resource creation started. Also, there are no green dots, and you can see the warning down in the legend, which claims that "One or more data-points have been dropped due to non-numeric values (NaN, -Infinite, +Infinite)". How is this possible? I assume it is no standard behavior, since other Lambda functions (which must be called using an API Gateway endpoint) do not show this strange chart.

Also, there are no log streams in CloudWatch. It is completely empty, as if the function was never invoked (which seems the case, except for the strange "red dot" at the moment of resource creation).

Finally, if I run a test case using the "AWS CloudFormation Create Request" template, the function runs properly, it creates the initial tables I expected for the DB (not always, but that is a different matter) and returns the response.

Do you have any idea of what is going on here? The worst about this is that I need to wait two hours between tests, since the CFN stack gets stuck during the creation and destruction steps until the timeout occurs.

Thanks!

2

2 Answers

1
votes

The issue is with your lambda function. You have to send back the SUCCESS or FAILURE signals back to the CFN. Since your lambda function is nots sending any signals, its waiting for Timeout (2 hours) and the Cloudformation gets failed

1.The custom resource provider processes the AWS CloudFormation request and 
  returns a response of SUCCESS or FAILED to the pre-signed URL. AWS 
  CloudFormation waits and listens for a response in the pre-signed URL location. 

2.After getting a SUCCESS response, AWS CloudFormation proceeds with the stack 
  operation. If a FAILURE or no response is returned, the operation fails.

Please use cfnresponse module in your lambda function to send the SUCCESS/FAILURE signals back to your Cloudformation

For more details: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-lambda-function-code-cfnresponsemodule.html

1
votes

I finally managed to find a solution to the issue, albeit it is not explaining the strange behavior with the charts that I explained in the question.

My problem was similar to what Abhinaya suggested in her response. The Lambda function was not sending the signal properly because of a programming error. Essentially, I took the code from the documentation (the one for Python 3, second fragment starting by the end) and apparently I mistakenly removed the line for retrieving the ResponseURL. Of course, that was failing.

A side-comment about this: be careful when using Python's cfnresponse library or even the code snippet I linked in the documentation. It relies on botocore.vendored which was deprecated and no longer exist in latest botocore releases. Therefore, it will fail if your code relies on new versions of this library (as in my case). A simple solution is to replace botocore.vendored.requests with the requests library.

Still, there is some strange behavior that I cannot understand. On creation, the Lambda function is not recording anything to CloudWatch and there is this strange behavior in the charts that I explained in my question. However, this only happens on creation. If the function is manually invoked, or is invoked as part of the delete process (when removing the CFN stack), then it does write to CloudWatch. Therefore, the problem only occurs in the first invokation, apparently.

Best.