
I am exploring streaming CloudWatch Logs to Firehose. As I understand it so far, a CloudWatch Subscription Filter is an event source that triggers a Lambda to digest the CloudWatch logs and send them to a different destination (Elasticsearch, Firehose, or another custom Lambda). Please correct me if I am wrong.

My concerns in the case of streaming CloudWatch Logs to Firehose are:

1/ In terms of performance + pricing, is there any difference between:

  • CloudWatch Subscription Filter -> Firehose
  • CloudWatch Subscription Filter -> Lambda -> Firehose

2/ Which data format does Firehose receive from CloudWatch?

  • CloudWatch Subscription Filter -> Firehose : I don't know.
  • CloudWatch Subscription Filter -> Lambda -> Firehose : I think the Lambda can transform the logs to JSON and then put them to Firehose.

Any suggestion is appreciated.

Out of curiosity, why do you want to send the logs to a firehose? – MyStackRunnethOver

@MyStackRunnethOver just to build log analytics – franco phong

I see. Make sure to take a look at docs.aws.amazon.com/AmazonCloudWatch/latest/logs/…, if you haven't already :) – MyStackRunnethOver

It really depends whether you plan to collect logs from multiple accounts: CloudWatch Subscription Filter -> Kinesis -> Firehose -> Lambda. Firehose supports a Lambda trigger, so you don't need a Lambda in front of the Firehose. You definitely need a Lambda if you want some classification or custom processing. – Traycho Ivanov

1 Answer

  1. In terms of performance + pricing, is there any difference between [having a lambda vs. going straight to firehose]?

Yes. In terms of performance, you would see slightly greater latency as your data has to pass through a lambda before making it to the firehose, but very likely the increase would not matter at all. You would gain the flexibility of having a customizable processing step fronting your firehose - an opportunity to perform additional transformation or smarter filtering.

NB, you would lose CloudWatch's automatic compression when sending straight to firehose - if you wanted compression, you'd have to set it up yourself (probably on the firehose). Also, you would be paying for lambda invocations to process the intermediate step. Check the pricing page to see whether this actually matters.

  2. Which data format does Firehose receive from CloudWatch?

When going straight to firehose, you'd get firehose's output format (a bunch of records appended together in a file), where each record is a CloudWatch Logs output object:

{
    "owner": "123456789012",
    "logGroup": "CloudTrail",
    "logStream": "123456789012_CloudTrail_us-east-1",
    "subscriptionFilters": [
        "Destination"
    ],
    "messageType": "DATA_MESSAGE",
    "logEvents": [
        {
            "id": "31953106606966983378809025079804211143289615424298221568",
            "timestamp": 1432826855000,
            "message": "{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"
        },
        ...
    ]
}

(Taken from the SubscriptionFilter docs)
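Because firehose appends records together with no delimiter, the delivered file is not one valid JSON document but a stream of JSON objects back to back. A minimal sketch of reading such a file back, assuming the objects arrive as plain JSON (if they were gzip-compressed you'd decompress first), could look like this (`parse_concatenated_json` is a hypothetical helper name):

```python
import json

def parse_concatenated_json(blob):
    # Firehose concatenates records with no delimiter, so the file is a
    # stream of JSON objects back to back; raw_decode walks through them.
    decoder = json.JSONDecoder()
    objects, idx = [], 0
    while idx < len(blob):
        obj, end = decoder.raw_decode(blob, idx)
        objects.append(obj)
        idx = end
        # skip any whitespace between objects
        while idx < len(blob) and blob[idx].isspace():
            idx += 1
    return objects
```

Adding a newline after each record on the producer side sidesteps this entirely, since most analytics tools can read newline-delimited JSON directly.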

When going to a lambda:

The actual payload that Lambda receives is in the following format { "awslogs": {"data": "BASE64ENCODED_GZIP_COMPRESSED_DATA"} }

(Also from the docs linked above)

Where data is the (encoded, compressed) CloudWatch output object above.
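Decoding that payload inside the lambda is just base64 + gzip + JSON, in that order. A minimal sketch (the function name is mine, not an AWS API):

```python
import base64
import gzip
import json

def decode_awslogs(event):
    # event["awslogs"]["data"] is base64-encoded, gzip-compressed JSON
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    # payload now has the CloudWatch structure shown above:
    # messageType, logGroup, logStream, subscriptionFilters, logEvents
    return payload
```

From there, `payload["logEvents"]` is the list of individual log lines to transform.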

Given that you write the lambda, you can output whatever you want to firehose. Keep in mind that firehose will do the same thing as above, appending multiple records into each output file.
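Sending the transformed events on to firehose from the lambda might look like the sketch below. The client would normally be `boto3.client("firehose")`; here it's passed in as a parameter, and the function names and stream name are my own, not an AWS API:

```python
import json

def build_firehose_records(log_events):
    # Firehose appends records verbatim, so add a trailing newline to each
    # so the delivered file comes out as newline-delimited JSON.
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in log_events]

def forward_to_firehose(firehose_client, log_events, stream_name):
    # In a real Lambda, firehose_client = boto3.client("firehose").
    # put_record_batch accepts at most 500 records / 4 MiB per call,
    # so larger batches would need to be chunked first.
    records = build_firehose_records(log_events)
    resp = firehose_client.put_record_batch(
        DeliveryStreamName=stream_name, Records=records
    )
    if resp["FailedPutCount"]:
        # a production version should retry just the failed records
        raise RuntimeError(f"{resp['FailedPutCount']} records failed")
    return resp
```

Note that `put_record_batch` can partially fail: a non-zero `FailedPutCount` means some records were dropped even though the call itself succeeded, which ties into the scaling caution below.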


A word of caution: make sure to scale your firehose appropriately - if it's under-scaled, firehose puts will start failing and you will start dropping log data. Make sure you're monitoring failures!