2 votes

I would like to log the complete requests and responses, including the body, received on API Gateway in an AWS Lambda proxy while passing the requests on to a different server for processing (i.e. reverse-proxying them). Because the standard logging from API Gateway to CloudWatch truncates requests/responses after 1024 bytes, I cannot use that option. So the processing would look like this:

Request -> API Gateway -> Lambda to log full request incl. body -> public API endpoint -> Response -> Lambda to log full response incl. body -> API Gateway -> Response

Is there a known solution for this scenario?
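To make the idea concrete, this is roughly the shape of the proxy Lambda I have in mind (just a sketch; the upstream URL is a placeholder and the logging calls are still TODO):

```python
import base64
import urllib.error
import urllib.request

# Placeholder for the real public endpoint the requests should be proxied to.
UPSTREAM_URL = "https://example.com/api"


def handler(event, context):
    # API Gateway Lambda proxy integration delivers the full request, body included.
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body)
    else:
        body = body.encode("utf-8")

    # TODO: log the complete request (method, path, headers, body) somewhere durable here.

    # Forward the request to the upstream endpoint. Host and Content-Length are dropped
    # so urllib can set them correctly; everything else is passed through as-is.
    headers = {
        k: v
        for k, v in (event.get("headers") or {}).items()
        if k.lower() not in ("host", "content-length")
    }
    req = urllib.request.Request(
        UPSTREAM_URL + event.get("path", "/"),
        data=body if event.get("httpMethod") not in ("GET", "HEAD") else None,
        headers=headers,
        method=event.get("httpMethod", "GET"),
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            status, resp_body, resp_headers = resp.status, resp.read(), dict(resp.headers)
    except urllib.error.HTTPError as e:
        status, resp_body, resp_headers = e.code, e.read(), dict(e.headers)

    # TODO: log the complete response (status, headers, body) here as well.

    return {
        "statusCode": status,
        "headers": resp_headers,
        "body": resp_body.decode("utf-8", errors="replace"),
        "isBase64Encoded": False,
    }
```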

Why do you have 3 API Gateways in your scenario? Are these Lambda@Edge functions? - Marcin
no, sorry, I meant to show the message flow. A request comes in on the API Gateway and I want to reverse-proxy it to a public endpoint. At the same time I need to log the full request/response incl. the body, which will exceed the length the API Gateway/CloudWatch logging can handle. That's why I need a Lambda to store it somewhere (e.g. in an S3 bucket) - matt478
@matt478 I also have a similar need. Did you solve this? If so, how did you do it? - Jerin A Mathews
Where can you find this logging limitation of 1024 bytes? - Noel Llevares

1 Answer

0 votes

You probably have a good reason for doing this, but make sure you are aware that logging full request/response bodies can have a lot of undesired implications.

For example, if your service requires GDPR compliance, this is a huge issue. It can also significantly affect performance and push you into quota limits. In short, it is usually not a good idea.

Storing these logs in CloudWatch would be the easiest option for requests where that 1K limit is not an issue. If only a small number of requests exceed it, you could consider treating those as the exception and offloading just them somewhere else.
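A sketch of that split (the names and the offload callback are placeholders, not a specific AWS API): small payloads are printed straight to CloudWatch, and only the oversized ones are handed off to wherever you decide to store them.

```python
import json

CLOUDWATCH_SAFE_LIMIT = 1024  # bytes; payloads above this get offloaded elsewhere


def log_payload(payload: bytes, offload):
    # Small payloads go straight to CloudWatch Logs via stdout; large ones are
    # the exception and are handed to an offload function (S3, Elasticsearch, ...).
    if len(payload) <= CLOUDWATCH_SAFE_LIMIT:
        print(json.dumps({"payload": payload.decode("utf-8", errors="replace")}))
    else:
        location = offload(payload)  # hypothetical callback, returns where it was stored
        print(json.dumps({"payload_location": location, "size": len(payload)}))
```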

You could use S3, DynamoDB or Elasticsearch, depending on what you want to do with the data, and each comes with trade-offs.

S3 - This would let you store very large requests/responses, but it can create a lot of fragmentation. You may end up with a lot of small files, and you also need some sort of index (probably storing the S3 key in the CloudWatch logs). Searching can be somewhat painful in this case (although you may be able to use Athena, depending on how you store the objects).
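A minimal sketch of that approach (the bucket name and key layout are made up): the date-partitioned prefix is what would make Athena queries practical later, and printing the key gives you the CloudWatch-based index.

```python
import datetime
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-api-payload-logs"  # placeholder bucket name


def offload_to_s3(request_body: bytes, response_body: bytes, request_id: str) -> str:
    # Date-partitioned key keeps the bucket browsable and Athena-friendly.
    now = datetime.datetime.utcnow()
    key = f"payloads/{now:%Y/%m/%d}/{request_id or uuid.uuid4()}.json"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps({
            "request": request_body.decode("utf-8", errors="replace"),
            "response": response_body.decode("utf-8", errors="replace"),
        }).encode("utf-8"),
        ContentType="application/json",
    )
    # Print the key so CloudWatch Logs acts as the index into S3.
    print(json.dumps({"payload_s3_key": key}))
    return key
```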

DynamoDB - Easy to write to, but you can run into throughput limits if your API traffic is high, and you may need to raise provisioned capacity (and costs) considerably to avoid them. Also, each item has a 400 KB limit. I personally don't recommend this approach.
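If you did go this way regardless, a sketch would look like the following (the table name and key schema are assumptions; the size guard reflects the 400 KB item limit and is only approximate, since the limit also counts attribute names):

```python
import datetime

import boto3

table = boto3.resource("dynamodb").Table("api-payload-logs")  # placeholder table
DYNAMODB_ITEM_LIMIT = 400 * 1024  # 400 KB hard limit per item


def store_in_dynamodb(request_id: str, payload: str):
    if len(payload.encode("utf-8")) > DYNAMODB_ITEM_LIMIT:
        raise ValueError("payload exceeds the DynamoDB 400 KB item limit")
    table.put_item(Item={
        "requestId": request_id,  # assumed partition key
        "loggedAt": datetime.datetime.utcnow().isoformat(),
        "payload": payload,
    })
```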

Elasticsearch - The default record size limit is 100 MB, but it can be increased. It would make it easy to query this data later on.
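A sketch of indexing the payloads over the Elasticsearch REST API (the endpoint and index name are placeholders; against a real Amazon Elasticsearch domain you would also need to sign the request or use an Elasticsearch client library):

```python
import json
import urllib.request

ES_ENDPOINT = "https://my-es-domain.example.com"  # placeholder domain
INDEX = "api-payload-logs"                        # placeholder index name


def index_payload(doc: dict) -> dict:
    # POST <index>/_doc indexes a document and lets Elasticsearch assign the id.
    req = urllib.request.Request(
        f"{ES_ENDPOINT}/{INDEX}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```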

I'd say Elasticsearch is probably the most appropriate option here, given the limited information in this thread. Also, depending on the volume, some of these solutions would eventually need a publish-subscribe mechanism (e.g. Kinesis) in between to handle burst limits and to group messages (with S3, for example, you would want to group multiple entries into a single file).
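For that buffering part, a sketch using Kinesis Data Firehose (the delivery stream name is a placeholder): Firehose batches the records by size or time and delivers them to S3 as larger files, which avoids the small-file problem mentioned above.

```python
import json

import boto3

firehose = boto3.client("firehose")
STREAM = "api-payload-log-stream"  # placeholder delivery stream name


def publish_payload(doc: dict):
    # Firehose buffers records and delivers them to S3 in batches, so each API
    # call does not produce its own tiny object.
    firehose.put_record(
        DeliveryStreamName=STREAM,
        Record={"Data": (json.dumps(doc) + "\n").encode("utf-8")},
    )
```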