- In terms of performance and pricing, is there any difference between [having a lambda vs. going straight to firehose]?
Yes. In terms of performance, you would see slightly greater latency as your data has to pass through a lambda before making it to the firehose, but very likely the increase would not matter at all. You would gain the flexibility of having a customizable processing step fronting your firehose - an opportunity to perform additional transformation or smarter filtering.
NB, you would lose CloudWatch's automatic compression when sending straight to firehose - if you wanted compression, you'd have to set it up yourself (probably on the firehose). Also, you would be paying for lambda invocations to process the intermediate step. Check the pricing page to see whether this actually matters.
- Which data format does Firehose receive from Cloudwatch?
When going straight to firehose, you'd get firehose's output configuration (a bunch of records appended together in a file), where each record is a CloudWatch logs output:
    {
        "owner": "123456789012",
        "logGroup": "CloudTrail",
        "logStream": "123456789012_CloudTrail_us-east-1",
        "subscriptionFilters": [
            "Destination"
        ],
        "messageType": "DATA_MESSAGE",
        "logEvents": [
            {
                "id": "31953106606966983378809025079804211143289615424298221568",
                "timestamp": 1432826855000,
                "message": "{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"
            },
            ...
        ]
    }
(Taken from the SubscriptionFilter docs)
When going to a lambda:
The actual payload that Lambda receives is in the following format { "awslogs": {"data": "BASE64ENCODED_GZIP_COMPRESSED_DATA"} }
(Also from the docs linked above)
Where data is the (base64-encoded, gzip-compressed) CloudWatch output object above.
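To make the Lambda payload format concrete, here is a minimal Python sketch of unpacking it: base64-decode the `data` field, gunzip it, then parse the JSON. The function names `decode_awslogs_event` and `encode_awslogs_event` are made up for illustration; the second one only exists so you can build a fake event for local testing.

```python
import base64
import gzip
import json


def decode_awslogs_event(event):
    """Return the CloudWatch Logs object carried in a Lambda subscription event."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def encode_awslogs_event(payload):
    """Hypothetical test helper: wrap a dict the way the subscription does."""
    raw = json.dumps(payload).encode("utf-8")
    encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"awslogs": {"data": encoded}}
```

Inside a handler you would call `decode_awslogs_event(event)` and then loop over the resulting `logEvents` list.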
Given that you write the lambda, you can output whatever you want to firehose. Keep in mind that firehose will do the same thing as above, appending multiple records into each output file.
A word of caution: make sure to scale your firehose appropriately - if you're not careful, and you are under-scaled, firehose puts will start failing and you will start dropping log data. Make sure you're monitoring failures!
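As a rough sketch of what "monitoring failures" can look like inside the lambda: `PutRecordBatch` can succeed as a call while rejecting individual records, so it is worth checking `FailedPutCount` and resending only the rejected entries. The record shape in `to_records`, the retry count, and the backoff values below are assumptions for illustration, not recommendations; `client` is expected to behave like `boto3.client("firehose")`.

```python
import json
import time

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_records(log_data):
    """Turn each log event into one newline-terminated Firehose record."""
    records = []
    for log_event in log_data["logEvents"]:
        line = json.dumps({
            "logGroup": log_data["logGroup"],
            "logStream": log_data["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        })
        # The trailing newline keeps records separable once firehose
        # appends them together in one output file.
        records.append({"Data": (line + "\n").encode("utf-8")})
    return records


def put_with_retry(client, stream_name, records, max_attempts=3, backoff_seconds=0.5):
    """Send records in batches and resend only rejected ones.

    Returns the records that still failed after max_attempts.
    """
    failed = []
    for start in range(0, len(records), MAX_BATCH):
        pending = records[start:start + MAX_BATCH]
        for attempt in range(max_attempts):
            response = client.put_record_batch(
                DeliveryStreamName=stream_name, Records=pending
            )
            if response["FailedPutCount"] == 0:
                pending = []
                break
            # Responses come back in request order; failed entries carry ErrorCode.
            pending = [
                rec
                for rec, res in zip(pending, response["RequestResponses"])
                if "ErrorCode" in res
            ]
            if attempt < max_attempts - 1:
                time.sleep(backoff_seconds * (2 ** attempt))
        failed.extend(pending)
    return failed
```

Whatever `put_with_retry` returns is data you are about to drop, so log it or raise, so that the failure shows up in your monitoring.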
Cloudwatch Subscription Filter -> Kinesis -> Firehose -> Lambda: Firehose supports a Lambda trigger, so you don't need a Lambda in front of the firehose. You definitely need a Lambda if you want to do some classification or custom processing. – Traycho Ivanov
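
For the Firehose-invokes-Lambda arrangement mentioned in the comment above, a transformation handler could look roughly like the sketch below. It assumes each incoming record's `data` is the base64-encoded, gzip-compressed CloudWatch Logs object shown earlier, and it chooses (as an illustration only) to emit one line per log message. Verify the incoming encoding against your own stream before relying on this.

```python
import base64
import gzip
import json


def handler(event, context):
    """Firehose data-transformation handler: flatten CloudWatch Logs batches."""
    output = []
    for record in event["records"]:
        # Assumption: the record is base64 + gzip, as CloudWatch Logs sends it.
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        if payload.get("messageType") != "DATA_MESSAGE":
            # Control messages carry no log events, so drop them.
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue
        lines = "".join(e["message"] + "\n" for e in payload["logEvents"])
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(lines.encode("utf-8")).decode("ascii"),
        })
    return {"records": output}
```

Every incoming `recordId` has to appear in the response with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`; this is where the classification or custom processing would go.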