I've been tasked to look at a service built on AWS Lambda that performs a long-running task of turning VMs on and off. Mind you, I come from the Azure team so I am not familair with the styling or best practices of AWS services.
The approach the original developer has taken is to send the entire workload to one Lambda function and then have that function take a section of the workload and then recursively call itself with the remaining workload until all items are gone (workload = 0).
Pseudo-ish Code:
// Assume this gets sent to a HTTP Lambda endpoint as a whole
let workload = [1, 2, 3, 4, 5, 6, 7, 8]
// The Lambda HTTP endpoint
function Lambda(workload) {
if (!workload.length) {
return "No more work!"
}
const toDo = workload.splice(0, 2) // get first two items
doWork(toDo)
// Then... except it builds a new HTTP request with aws sdk
Lambda(workload) // 3, 4, 5, 6, 7, 8, etc.
}
This seems highly inefficient and unreliable (correct me if I am wrong). There is a lot of state being stored in this process and in my opinion that creates a lot of failure points.
My plan is to suggest we re-engineer the entire service to use a Queue/Worker type framework instead, where ideally the endpoint would handle one workload at a time, and be stateless.
The queue would be populated by a service (Jenkins? Lambda? Manually?), then a second service would read from the queue (and ideally scale-out as well, as needed).