I have created a Lambda function which retrieves some data from DynamoDB and outputs some JSON. What I'm trying to do is run this function in Lambda@Edge and generate a response which I can cache using CloudFront.
The problem I'm facing is that my data in DynamoDB is replicated in (currently) two regions (us-east-2 and eu-west-1) using Global Tables, whereas Lambda@Edge runs in many regions.
This stops me from using AWS_REGION from the Lambda environment. For example, if a request ran in us-west-1, the environment variable would reflect that and the function would try to retrieve the data from us-west-1, when it should actually go to us-east-2.
Admittedly I've not tried this (yet), but I wondered whether I could set up my own latency-based routing in Route 53 to point, say, ddb.mydomain.com at the DynamoDB endpoints in the regions I use. Assuming SAN certs are set up, would that work?
I thought perhaps I could map regions in the code, per the example below:
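A minimal sketch of one way the Route 53 idea might be used, assuming ddb.mydomain.com is a latency-routed CNAME whose targets are the regional DynamoDB hostnames (dynamodb.<region>.amazonaws.com). Rather than sending requests to the alias itself, the code only resolves it and reads the region out of the CNAME target, so no certificate for the custom hostname is involved. The alias and the function names `regionFromEndpoint` and `resolveRegion` are hypothetical, and I haven't verified that the resolver used inside Lambda@Edge would give a useful latency-based answer.

```javascript
const dns = require('dns').promises;

// Extract the region from a regional DynamoDB hostname,
// e.g. "dynamodb.us-east-2.amazonaws.com" -> "us-east-2".
function regionFromEndpoint(hostname) {
  const match = /^dynamodb\.([a-z0-9-]+)\.amazonaws\.com\.?$/.exec(hostname);
  return match ? match[1] : null;
}

// Resolve the (hypothetical) latency-routed alias and fall back to a default
// region if the lookup fails or returns something unexpected.
async function resolveRegion(alias, fallback) {
  try {
    const [target] = await dns.resolveCname(alias);
    return regionFromEndpoint(target) || fallback;
  } catch (err) {
    return fallback;
  }
}
```

The resolved region would then be passed to the DynamoDB client's `region` option. The DNS lookup adds latency of its own, so the result would probably need caching outside the handler.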
    const process = { env: { AWS_REGION: 'us-east-1' } };
    const regions = {
      'eu-west-1': ['eu-west-1', 'eu-central-1', '...'],
      'us-east-2': ['us-west-1', 'us-east-1', '...'],
    };
    const activeRegions = Object.keys(regions);
    const region = activeRegions.find(
      key => regions[key].includes(process.env.AWS_REGION)
    ) || activeRegions[0];
    console.log(region); // us-east-2
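The same lookup could be wrapped in a small function so the mapping is inverted once and then read directly on each invocation. This is only a sketch: `buildRegionLookup` and `pickRegion` are names made up for illustration, and the region lists are partial placeholders rather than a complete mapping.

```javascript
// Replica regions (keys) and the Lambda@Edge execution regions assumed to be
// closest to each. The lists are illustrative and incomplete.
const replicas = {
  'eu-west-1': ['eu-west-1', 'eu-central-1'],
  'us-east-2': ['us-west-1', 'us-east-1', 'us-east-2'],
};

// Invert the mapping once: execution region -> replica region.
function buildRegionLookup(mapping) {
  const lookup = new Map();
  for (const [replica, nearby] of Object.entries(mapping)) {
    for (const region of nearby) lookup.set(region, replica);
  }
  return lookup;
}

const lookup = buildRegionLookup(replicas);

// Fall back to the first replica when the execution region is not listed.
function pickRegion(executionRegion) {
  return lookup.get(executionRegion) || Object.keys(replicas)[0];
}

console.log(pickRegion('us-east-1')); // us-east-2
```

In the handler this would be called as `pickRegion(process.env.AWS_REGION)`. It doesn't remove the maintenance burden, it only keeps it in one place.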
This feels like more maintenance than it's worth, and it relies on me making assumptions about the best region to pick. I'd also have to keep my list of regions up to date.
I could use just the first two letters of the region, which would slightly reduce the need to update it when new data centres open, but it's still not ideal:
    const process = { env: { AWS_REGION: 'ca-central-1' } };
    const regions = {
      // Each prefix should appear under only one region; find() returns the first match
      'eu-west-1': ['eu', 'ap', '...'],
      'us-east-2': ['us', 'ca', 'sa', '...'],
    };
    const activeRegions = Object.keys(regions);
    const key = activeRegions.find(
      key => regions[key].includes(
        process.env.AWS_REGION.substring(0, 2) // Just the first 2 letters
      )
    ) || activeRegions[0];
    console.log(key); // us-east-2
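A variant of the prefix idea, sketched under the assumption that the part of the region name before the first hyphen is a good enough proxy for geography. Keying the map by prefix means each prefix can only point at one replica. The groupings, `DEFAULT_REPLICA` and `pickRegionByPrefix` are illustrative assumptions, not a recommendation of which replica is actually closest.

```javascript
// Geographic prefix (text before the first "-") -> replica region.
// Assumed groupings for illustration only.
const prefixToReplica = new Map([
  ['eu', 'eu-west-1'],
  ['ap', 'eu-west-1'],
  ['us', 'us-east-2'],
  ['ca', 'us-east-2'],
  ['sa', 'us-east-2'],
]);

const DEFAULT_REPLICA = 'us-east-2';

// Returns the replica for the given execution region, or the default when
// the region is missing or its prefix is not in the map.
function pickRegionByPrefix(executionRegion) {
  const prefix = (executionRegion || '').split('-')[0];
  return prefixToReplica.get(prefix) || DEFAULT_REPLICA;
}

console.log(pickRegionByPrefix('ca-central-1')); // us-east-2
```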
I suspect I'm missing something obvious which might allow me to sensibly pick a region in which my data exists from Lambda@Edge.
Edit
I've since found this, an AWS Lambda@Edge workshop (since removed) which suggests a similar approach to the above. Why it was removed, I don't know.
    function updateDynamoDbClientRegion(request) {
      let region;
      // Check if viewer country header is available
      if (request.headers['cloudfront-viewer-country']) {
        const countryCode = request.headers['cloudfront-viewer-country'][0].value;
        region = countryToRegionMapping[countryCode];
      }
      // Update DynamoDB client with nearer region
      if (region) {
        ddb = ddbUS;
      }
    }
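As quoted, the snippet assigns `ddbUS` whenever any region is found. A version that actually selects between per-region clients might look like the sketch below. The entries in `countryToRegionMapping`, the `clients` object, `DEFAULT_REGION` and `clientForRequest` are all hypothetical; in a real function the values in `clients` would be DynamoDB clients constructed with the matching region.

```javascript
// Hypothetical mapping of viewer country codes to replica regions.
const countryToRegionMapping = {
  GB: 'eu-west-1',
  DE: 'eu-west-1',
  US: 'us-east-2',
  CA: 'us-east-2',
};

// Stand-ins for real DynamoDB clients, one per replica region.
const clients = {
  'eu-west-1': { region: 'eu-west-1' },
  'us-east-2': { region: 'us-east-2' },
};
const DEFAULT_REGION = 'us-east-2';

// Pick the client for the viewer's country, or the default client when the
// header is absent or the country is not mapped.
function clientForRequest(request) {
  const header = request.headers['cloudfront-viewer-country'];
  const countryCode = header && header[0] ? header[0].value : undefined;
  const region = countryToRegionMapping[countryCode] || DEFAULT_REGION;
  return clients[region];
}

const request = {
  headers: {
    'cloudfront-viewer-country': [{ key: 'CloudFront-Viewer-Country', value: 'GB' }],
  },
};
console.log(clientForRequest(request).region); // eu-west-1
```

Note that this picks a region near the viewer rather than near the edge location running the function, which is a different assumption from the AWS_REGION mappings above.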
The readme for said workshop now simply discusses the option of using Global Tables to reduce latency, but offers no insight into how to pick the closest region which has data.
Edit 2
I've grabbed a copy of the latency data from cloudping and pieced together the following gist which works for now.
https://gist.github.com/benswinburne/06a00fab330dca93ea6df2552f73850a
The downside is obviously that the data is stale. cloudping's API unfortunately isn't nearly quick enough for this purpose, and as soon as I go to a remote resource to grab up-to-date data I may as well have just gone to a DynamoDB table in any region ¯\_(ツ)_/¯
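For reference, the general shape of a lookup against a static latency snapshot is sketched below. It is not a copy of the gist: the numbers are made-up placeholders rather than cloudping measurements, and `latencyMs`, `replicaRegions` and `nearestReplica` are names chosen for illustration.

```javascript
// Placeholder latencies in milliseconds (NOT real measurements):
// latencyMs[executionRegion][replicaRegion].
const latencyMs = {
  'us-west-1': { 'us-east-2': 50, 'eu-west-1': 140 },
  'eu-central-1': { 'us-east-2': 100, 'eu-west-1': 25 },
};
const replicaRegions = ['us-east-2', 'eu-west-1'];

// Return the replica with the lowest recorded latency from the execution
// region, or the first replica if the region is not in the snapshot.
function nearestReplica(executionRegion) {
  const row = latencyMs[executionRegion];
  if (!row) return replicaRegions[0];
  return replicaRegions.reduce((best, candidate) =>
    (row[candidate] ?? Infinity) < (row[best] ?? Infinity) ? candidate : best
  );
}

console.log(nearestReplica('eu-central-1')); // eu-west-1
```

Because the table is bundled with the function, refreshing it means redeploying, which is the staleness trade-off described above.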