
I've tried a few things:

  • SNS topic as a failure destination for the lambda (this doesn't work because the Lambdas are triggered via synchronous events like SQS, SNS, or CloudWatch scheduled events).
  • CloudWatch alarms (these seem to be really more for aggregating data, not triggering alerts for single errors)
  • Creating a Lambda that subscribes to a CloudWatch log group (it seems I'd have to create one of these for every single lambda function)

What's the preferred way of monitoring a slew of lambdas for failed invocations? Ideally, the first method above would work and I could pipe a handful of these to an SNS topic. If any errors were encountered, I'd be notified with a summary and could investigate further. I'm certain I'm missing something.

Wouldn't Step Functions be a good fit for this scenario? SF triggers your lambdas, and if an error is detected, SF sends an SNS notification? – Marcin
SF would work. I was looking for a solution that would plug into a bunch of existing lambdas – small one-off cron jobs as well as larger (or even SF-invoked) functions – the way I use LogDNA or LogEntries to send notifications for any errors/exceptions encountered in any logs. I realized that I could easily add all relevant log groups as triggers for a generic "CloudWatch processor lambda" for a given pattern, and it's just what I was looking for. I'll post a detailed answer. – Charlie Schliesser

1 Answer


My hangup was that I wanted to find a single method of capturing any ERROR pattern match in all lambda-related CloudWatch logs, and get immediate notification of it: the kind of thing you'd set up in LogDNA or LogEntries really simply through their UI. I didn't like the idea of setting up individual lambdas to subscribe to other lambdas' logs because I thought that'd be overkill. I was overthinking it.

The best way to do this is to create a lambda that is triggered by CloudWatch Logs and deliberately subscribes to the log groups and patterns that it cares about.

It's easy to do this in a CloudFormation template (the example below uses AWS SAM's AWS::Serverless::Function type; adapt it to whatever format you use), e.g.:

Resources:
  CloudWatchLogProcessorFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: CloudWatchLogProcessor
      ...
      Events:
        Foo:
          Type: CloudWatchLogs
          Properties:
            LogGroupName: /aws/lambda/Foo
            FilterPattern: ERROR
        Bar:
          Type: CloudWatchLogs
          Properties:
            LogGroupName: /aws/lambda/Bar
            FilterPattern: ERROR
        Baz:
          Type: CloudWatchLogs
          Properties:
            LogGroupName: /aws/lambda/Baz
            FilterPattern: ERROR
        Bof:
          Type: CloudWatchLogs
          Properties:
            LogGroupName: /aws/lambda/Bof
            FilterPattern: ERROR
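
Since every subscription entry above has the same shape, the Events map could also be generated from a list of function names instead of being written by hand. Here is a minimal sketch, assuming each function logs to the default /aws/lambda/<name> log group. `buildLogEvents` is a hypothetical helper, not part of SAM or the AWS SDK, and its output would still have to be merged into the template by whatever tooling you use.

```javascript
// Hypothetical helper: build a SAM-style `Events` map for a list of Lambda names.
// Assumes each function logs to the default /aws/lambda/<name> log group.
function buildLogEvents(functionNames, filterPattern = 'ERROR') {
    const events = {};
    for (const name of functionNames) {
        events[name] = {
            Type: 'CloudWatchLogs',
            Properties: {
                LogGroupName: `/aws/lambda/${name}`,
                FilterPattern: filterPattern
            }
        };
    }
    return events;
}

// The same four functions used in the template above.
const events = buildLogEvents(['Foo', 'Bar', 'Baz', 'Bof']);
console.log(JSON.stringify(events, null, 2));
```

The sketch uses the function name directly as the event key, so names containing characters that are not valid in a template key would need to be sanitized first.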

You can then have this lambda perform the tiny task of decoding the log and doing whatever you want with the results, like publishing a message containing the log group and log entries to an SNS topic:

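const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const zlib = require('zlib');

const sns = new AWS.SNS();

exports.handler = async (event) => {
    // CloudWatch Logs delivers subscription data under event.awslogs.data.
    if (!event.awslogs || !event.awslogs.data) {
        throw new Error("Unexpected event.");
    }

    // The payload is base64-encoded, compressed JSON.
    const payload = Buffer.from(event.awslogs.data, 'base64');
    const log = JSON.parse(zlib.unzipSync(payload).toString());

    // Build a plain-text summary: the source log group and stream,
    // followed by every log event that matched the filter pattern.
    let msg = '';
    msg += `Log Group: ${log.logGroup}\n`;
    msg += `Log Stream: ${log.logStream}\n\n`;

    log.logEvents.forEach(e => {
        msg += `${e.message}\n\n`;
    });

    // Publish the summary to the topic configured in SNS_TOPIC_ARN.
    await sns.publish({
        Message: msg,
        TopicArn: process.env.SNS_TOPIC_ARN
    }).promise();

    return null;
};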
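
To check the decoding step locally without deploying anything, here is a minimal sketch that builds a fake subscription event and runs it through the same decode-and-format steps as the handler, minus the SNS call. The sample log group, stream, and message are made up, and `makeAwsLogsEvent` and `formatLogMessage` are hypothetical helper names rather than anything provided by AWS.

```javascript
const zlib = require('zlib');

// Hypothetical helper: wrap a log payload the way the handler expects it
// (JSON -> gzip -> base64, placed under event.awslogs.data).
function makeAwsLogsEvent(log) {
    const gz = zlib.gzipSync(Buffer.from(JSON.stringify(log)));
    return { awslogs: { data: gz.toString('base64') } };
}

// Same decoding and formatting as the handler above, without publishing to SNS.
function formatLogMessage(event) {
    const payload = Buffer.from(event.awslogs.data, 'base64');
    const log = JSON.parse(zlib.unzipSync(payload).toString());
    let msg = `Log Group: ${log.logGroup}\nLog Stream: ${log.logStream}\n\n`;
    log.logEvents.forEach(e => {
        msg += `${e.message}\n\n`;
    });
    return msg;
}

// Made-up sample data, for illustration only.
const sampleEvent = makeAwsLogsEvent({
    logGroup: '/aws/lambda/Foo',
    logStream: 'sample-stream',
    logEvents: [{ id: '1', timestamp: 0, message: 'ERROR something failed' }]
});

console.log(formatLogMessage(sampleEvent));
```

Keeping the formatting in its own function like this also makes it easy to unit test the message layout separately from the publish call.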