
We have developed two Lambda functions in Python:

  1. Lambda function for RDS write - This function parses .csv files uploaded to S3 and writes them to an AWS Aurora DB. File-processing logs are written to CloudWatch.

  2. Lambda function subscribed to the CloudWatch log group of the first function - It is triggered every time new logs are added to the RDS-write Lambda's log group.

We are having an issue with the second Lambda function, the one subscribed to the CloudWatch log group. Most of the time it parses the CloudWatch logs correctly, but in some cases it is triggered before the first function has finished writing all of its logs to the log group. The second function is then triggered multiple times for a single execution of the first function, and each execution receives only part of the log data for parsing.

This behavior is inconsistent: most of the time the second function runs exactly once for every execution of the first function.

I have the following code for collecting the log streams:

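```python
import base64
import gzip
import json

def lambda_handler(event, context):
    print(f'Logging Event: {event}')
    print(f"Awslog: {event['awslogs']}")
    # CloudWatch Logs delivers each batch as base64-encoded, gzip-compressed JSON
    cw_data = event['awslogs']['data']
    print(f'data: {cw_data}')
    print(f'type: {type(cw_data)}')
    compressed_payload = base64.b64decode(cw_data)
    uncompressed_payload = gzip.decompress(compressed_payload)
    payload = json.loads(uncompressed_payload)
    messagelst = []
    # The individual log lines are in the payload's 'logEvents' list
    for log_event in payload['logEvents']:
        # Split each message into its tab-separated fields
        messagelst.append(log_event['message'].split('\t'))
    # messagelst is then passed to the parser function (not shown)
```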

`messagelst` collects the complete log, which is then sent to the parser function. We noticed that the parser function sometimes does not receive the complete log data.
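To reproduce a split delivery locally, the handler can be fed synthetic events in the shape CloudWatch Logs uses for subscription deliveries (base64-encoded, gzip-compressed JSON with a `logEvents` list). This is a sketch for testing only: `make_awslogs_event` and `decode_awslogs_event` are hypothetical helpers, and the account ID, log group, stream and filter names are placeholders.

```python
import base64
import gzip
import json


def make_awslogs_event(messages, log_stream="test-stream"):
    """Hypothetical test helper: wrap log lines in a fake subscription event."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",  # placeholder account ID
        "logGroup": "/aws/lambda/rds-write",  # placeholder log group name
        "logStream": log_stream,
        "subscriptionFilters": ["test-filter"],  # placeholder filter name
        "logEvents": [
            {"id": str(i), "timestamp": 0, "message": m}
            for i, m in enumerate(messages)
        ],
    }
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(compressed).decode("ascii")}}


def decode_awslogs_event(event):
    """Reverse the encoding: base64 -> gunzip -> JSON."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))
```

Calling the handler twice, once with `make_awslogs_event(lines[:2])` and once with `make_awslogs_event(lines[2:])`, mimics one invocation's output arriving in two batches.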


1 Answer


I believe the issue is that, from CloudWatch's perspective, each line of the output is a separate record (log event).
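To make the effect concrete: if every line of output becomes its own log event, a single multi-line `print` turns into several events, and nothing guarantees those events reach the subscriber in the same batch. The sketch below only models that one-event-per-line assumption with a hypothetical `to_log_events` helper and a made-up record; it does not reproduce the Lambda runtime itself.

```python
def to_log_events(stdout_text):
    """Model the assumption that each output line becomes a separate log event."""
    return [{"message": line + "\n"} for line in stdout_text.splitlines()]


# One print() call that logs a three-line record (made-up example content)
record = "file: data.csv\nrows: 120\nstatus: OK"
events = to_log_events(record)
print(len(events))  # 3 separate events, which could be delivered in different batches
```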

According to this question (How does Amazon CloudWatch batch logs when streaming to AWS Lambda?), the current behavior is that your "2nd" Lambda is triggered whenever PutLogEvents is called. This is not spelled out in the AWS documentation, so it might change or may already have changed.

Following the breadcrumbs, the question becomes how AWS handles the output of your "1st" Lambda, that is, when it calls PutLogEvents internally. I could not find a definitive answer. This question (Lambda log and CloudWatch PutLogEvents limit) suggests it might call it only once, at the end of the execution, but there is no confirmation of that claim. I suspect the behavior varies with the amount of output produced and the time it took to produce it.
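Whatever the batching rules turn out to be, the receiving function can at least detect a partial delivery by checking whether the batch contains the `REPORT RequestId:` line that Lambda writes at the end of each invocation. This is a sketch under assumptions: the first function's stream uses the default `START`/`REPORT` lines, `split_batch_by_request` is a hypothetical helper, and storing incomplete fragments between invocations (e.g. in a database) is not shown.

```python
import re

# Lines Lambda writes around each invocation, e.g.
# "START RequestId: <id> Version: $LATEST" and "REPORT RequestId: <id>\tDuration: ..."
START_RE = re.compile(r"^START RequestId: (\S+)")
REPORT_RE = re.compile(r"^REPORT RequestId: (\S+)")


def split_batch_by_request(log_events):
    """Split one subscription batch into complete and partial invocations.

    Messages are grouped under the request ID of the most recent START line.
    Messages that arrive before any START line are grouped under None, because
    their invocation began in an earlier batch. A request counts as complete
    only if both its START and REPORT lines are in this batch.
    """
    groups = {}
    finished = set()
    current = None
    for log_event in log_events:
        message = log_event["message"]
        start = START_RE.match(message)
        if start:
            current = start.group(1)
        groups.setdefault(current, []).append(message)
        report = REPORT_RE.match(message)
        if report:
            finished.add(report.group(1))
    complete = {rid: msgs for rid, msgs in groups.items() if rid in finished}
    partial = {rid: msgs for rid, msgs in groups.items() if rid not in finished}
    return complete, partial
```

In the handler, `complete` could go to the parser straight away, while `partial` would have to be kept (for example keyed by `payload['logStream']`) until the remaining lines arrive.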

A simple solution is to encode your logged data so that each record is always a single line. There are plenty of ways to do this, such as removing the newline characters, escaping them, or simply base64-encoding the whole thing.
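A minimal sketch of the escaping approach, assuming the first function controls what it logs: `json.dumps` escapes newlines inside strings as `\n`, so each record is printed as exactly one line (base64 would work the same way, with a matching decode on the reader side). The helper names and the `RECORD ` prefix are hypothetical; the prefix only lets the second function pick out its records among Lambda's own START/END/REPORT lines.

```python
import json

PREFIX = "RECORD "  # hypothetical marker so the subscriber can find these lines


def log_record(record):
    """Writer side (1st function): print one record as a single line of JSON."""
    # json.dumps escapes newlines inside strings, so the line has no raw line breaks
    line = PREFIX + json.dumps(record)
    print(line)
    return line


def parse_record(message):
    """Reader side (2nd function): decode one record, or return None for other lines."""
    if not message.startswith(PREFIX):
        return None
    # json.loads tolerates the trailing newline CloudWatch keeps on each message
    return json.loads(message[len(PREFIX):])
```

Under the one-event-per-line behavior described above, each record then maps to one log event, so it cannot be split across deliveries; the second function can skip any message for which `parse_record` returns `None`.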