2
votes

I am going to mention my needs and what I have currently in place so bear with me. Firstly, a lambda function say F1 which when invoked will get 100 links from a site. Most of these links say about 95 are the same as when F1 was invoked the previous time, so further processing must be done with only those 5 "new" links. One solution was to write to a Dynamodb database the links that are processed already and each time the F1 is invoked, query the database and skip those links. But I found that the "database read" although in milliseconds is doubling up lambda runtime and this can add up especially if F1 is called frequently and if there are say a million processed links. So I decided to use Elasticache with Redis.

I quickly found that Redis can be accessed only when F1 runs on the same VPC and because F1 needs access to the internet you need NAT. (I don't know much about networking) So I followed the guidelines and set up VPC and NAT and got everything to work. I was delighted with performance improvements, almost reduced the expected lambda cost in half to 30$ per month. But then I found that NAT is not included in the free tier and I have to pay almost 30$ per month just for NAT. This is not ideal for me as this project can be in development for months and I feel like I am paying the same amount as compute just for internet access.

I would like to know if I am making any fundamental mistakes. Am I using the Elasticache in the right way? Is there a better way to access both Redis and the internet? Is there any way to structure my stack differently so that I retain the performance without essentially paying twice the amount after free tier ends. Maybe add another lambda function? I don't have any ideas. Any minute improvements are much appreciated. Thank you.

1
maybe build another pre-F1 to filter out duplicate links first? So you can avoid to add more layers in this solution? - BMW
I am using GoLang scraper called colly which has built-in support for Redis, hence optimal if it's in the same function. I don't mind having another layer as long it's better than the performance I got with DynamoDB. Will try this. - codeyashu
I am also thinking of another potential solution to run a Redis server on EC2 and access it from lambda. - codeyashu

1 Answers

0
votes

There are many ways to accomplish this, and all of them have some trade-offs. A few other ideas for you to consider:

  1. Run F1 without a VPC. It will have connectivity directly to DynamoDB without need for a NAT, saving you the cost of the NAT gateway.

  2. Run your function on a micro EC2 instance rather than in Lambda, and persist your link lookups to some file on local disk, or even a local Redis. With all the Serverless hype, I think people sometimes overestimate the difficulty (and stability) of simply running an OS. It's not that hard to manage, it's easy to set up backups, and may be an option depending upon your availability requirements and other needs.

  3. Save your link data to S3 and set up a VPC endpoint to S3 gateway endpoint. Not sure if it will be fast enough for your needs.