
I have created a Lambda function which retrieves some data from DynamoDB and outputs some JSON. What I'm trying to do is run this function in Lambda@Edge and generate a response which I can cache using CloudFront.
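
A minimal sketch of that setup (table name, key and the hard-coded region are placeholders; the hard-coded region is exactly the problem described below):

'use strict';

const AWS = require('aws-sdk');

// Placeholder region - hard-coding it is what I'm trying to avoid
const ddb = new AWS.DynamoDB.DocumentClient({ region: 'us-east-2' });

// Attached as an origin-request trigger so the generated response can be cached
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;

  // Placeholder table/key names
  const { Item } = await ddb
    .get({ TableName: 'my-table', Key: { pk: request.uri } })
    .promise();

  // Returning a response from the trigger means CloudFront never hits the
  // origin, and the generated JSON is cached like any other response
  return {
    status: '200',
    statusDescription: 'OK',
    headers: {
      'content-type': [{ key: 'Content-Type', value: 'application/json' }],
      'cache-control': [{ key: 'Cache-Control', value: 'max-age=300' }],
    },
    body: JSON.stringify(Item || {}),
  };
};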

The problem I'm facing is that my data in DynamoDB is replicated in (currently) two regions (us-east-2 and eu-west-1) using Global Tables, whereas Lambda@Edge obviously runs in many regions.

This stops me from being able to use AWS_REGION from the Lambda environment. For example, if a request ran in us-west-1, the environment variable would reflect that, and the function would try to retrieve the data from us-west-1 when it should actually go to us-east-2.

Whilst admittedly I've not tried this (yet), I wondered whether I could set up my own latency-based routing in Route 53 to point, say, ddb.mydomain.com at the DynamoDB endpoints in the regions I use; assuming SAN certs are set up, would that work?
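
An entirely untested sketch of what the Route 53 side of that might look like (placeholder hosted zone ID; one latency record per replica region):

const AWS = require('aws-sdk');
const route53 = new AWS.Route53();

// One latency-based CNAME per region which holds a Global Table replica
const changes = ['us-east-2', 'eu-west-1'].map(region => ({
  Action: 'UPSERT',
  ResourceRecordSet: {
    Name: 'ddb.mydomain.com',
    Type: 'CNAME',
    TTL: 60,
    SetIdentifier: region,
    Region: region,
    ResourceRecords: [{ Value: `dynamodb.${region}.amazonaws.com` }],
  },
}));

route53
  .changeResourceRecordSets({
    HostedZoneId: 'ZXXXXXXXXXXXXXX', // placeholder hosted zone ID
    ChangeBatch: { Changes: changes },
  })
  .promise();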

I thought perhaps I could map regions in the code, per the example below:

// Stand-in for the real Lambda environment, just for this example
const process = { env: { AWS_REGION: 'us-east-1' } };

// Each replica region, mapped to the edge regions it should serve
const regions = {
  'eu-west-1': ['eu-west-1', 'eu-central-1', '...'],
  'us-east-2': ['us-west-1', 'us-east-1', '...'],
};

const activeRegions = Object.keys(regions);

// Pick the replica whose list contains the current execution region,
// falling back to the first replica if there's no match
const region = activeRegions.find(
  key => regions[key].includes(process.env.AWS_REGION)
) || activeRegions[0];

console.log(region); // us-east-2
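
Whichever way the region is picked, it would then be passed explicitly to the DynamoDB client rather than letting the SDK fall back to the execution region; a minimal sketch (placeholder table and key names):

const AWS = require('aws-sdk');

// `region` is whatever the lookup above produced, e.g. 'us-east-2'
const ddb = new AWS.DynamoDB.DocumentClient({ region });

async function getItem(id) {
  // Placeholder table/key names
  const { Item } = await ddb
    .get({ TableName: 'my-table', Key: { pk: id } })
    .promise();
  return Item;
}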

This feels like it'd be more maintenance than it's worth and relies on me making assumptions about the best region to pick. I'd also have to keep my list of regions up to date.

I could use just the first two letters of the region to slightly reduce the need to update the list when new data centres open, but it's still not ideal:

// Stand-in for the real Lambda environment, just for this example
const process = { env: { AWS_REGION: 'ca-central-1' } };

// Replica regions mapped to the two-letter region prefixes they should serve
const regions = {
  'eu-west-1': ['eu', 'sa', 'ap', '...'],
  'us-east-2': ['us', 'ca', 'sa', '...'],
};

const activeRegions = Object.keys(regions);

// Match on the prefix of the execution region rather than the full name,
// falling back to the first replica if there's no match
const key = activeRegions.find(
  key => regions[key].includes(
    process.env.AWS_REGION.substring(0, 2) // Just the first two letters
  )
) || activeRegions[0];

console.log(key); // us-east-2

I suspect I'm missing something obvious which might allow me to sensibly pick a region in which my data exists from Lambda@Edge.

Edit

I've since found this: an AWS Lambda@Edge workshop (which has since been removed) that suggests a similar approach to the above. Why it was removed I don't know.

function updateDynamoDbClientRegion(request) {
    let region;

    // Check if viewer country header is available
    if (request.headers['cloudfront-viewer-country']) {
        const countryCode = request.headers['cloudfront-viewer-country'][0].value;
        region = countryToRegionMapping[countryCode];
    }

    // Update DynamoDB client with nearer region
    if (region) {
        ddb = ddbUS;
    }
}
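
That snippet is only an excerpt (countryToRegionMapping, ddb and ddbUS are defined elsewhere in the workshop), but fleshed out, the same idea looks roughly like this. Note that CloudFront only forwards CloudFront-Viewer-Country if the header is whitelisted on the cache behaviour, and the mapping values below are my own guesses, not the workshop's:

const AWS = require('aws-sdk');

// Map viewer countries to the regions which actually hold a replica
const countryToRegionMapping = {
  US: 'us-east-2',
  CA: 'us-east-2',
  GB: 'eu-west-1',
  DE: 'eu-west-1',
};

const defaultRegion = 'us-east-2';

function dynamoClientForRequest(request) {
  let region = defaultRegion;

  const countryHeader = request.headers['cloudfront-viewer-country'];
  if (countryHeader) {
    region = countryToRegionMapping[countryHeader[0].value] || defaultRegion;
  }

  return new AWS.DynamoDB.DocumentClient({ region });
}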

The readme for said workshop now simply discusses the option of using global tables to reduce latency but offers no insights as to how to pick the closest one which has data.

Edit 2

I've grabbed a copy of the latency data from cloudping and pieced together the following gist which works for now.

https://gist.github.com/benswinburne/06a00fab330dca93ea6df2552f73850a

The downside of this is obviously that the data is stale. Cloudping's API isn't nearly quick enough for this purpose, unfortunately, and as soon as I go to a remote resource to grab up-to-date data I may as well have just gone to a DynamoDB table in any region ¯\_(ツ)_/¯
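
In essence the gist bakes the latency data into the code and picks the replica with the lowest latency from the execution region; a rough sketch of that idea (placeholder latency values, not the real cloudping figures):

// Regions which hold a Global Table replica
const replicaRegions = ['us-east-2', 'eu-west-1'];

// Baked-in inter-region latencies, keyed by the Lambda@Edge execution region
// (placeholder values - the real data comes from cloudping)
const latenciesFrom = {
  'us-west-1': { 'us-east-2': 52, 'eu-west-1': 140 },
  'ca-central-1': { 'us-east-2': 25, 'eu-west-1': 80 },
  // ... one entry per region the function might execute in
};

function closestReplica(executionRegion) {
  const latencies = latenciesFrom[executionRegion];
  if (!latencies) return replicaRegions[0]; // fall back if the region is unknown

  return replicaRegions.reduce((best, region) =>
    latencies[region] < latencies[best] ? region : best
  );
}

console.log(closestReplica(process.env.AWS_REGION));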

Did you ever find a good solution for this? - Alex Ward
I used the solution in the gists above but I made a library out of it: npmjs.com/package/@benswinburne/closest-aws-region. It works fine for my use case. - Ben Swinburne

1 Answer


Regarding your last comment about Global Tables: there is currently no way to reconfigure a table in a single region into a global table. There are currently two options, depending on whether your tables are replicated (i.e. already contain the same data) or not. If they contain the same data (see the sketch after this list):

  1. Back up the table using DynamoDB backup
  2. Create a new global table
  3. Restore the table dump into the new global table
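
Sticking with the JavaScript SDK used elsewhere in the question, steps 1 and 3 map onto calls roughly like these; the table names are placeholders and step 2 (creating the global table) is done separately:

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB({ region: 'us-east-2' });

async function backupAndRestore() {
  // 1. Take an on-demand backup of the source table
  const { BackupDetails } = await dynamodb
    .createBackup({ TableName: 'MyTable', BackupName: 'MyTable-snapshot' })
    .promise();

  // (in practice, wait for the backup to become AVAILABLE before restoring)

  // 3. Restore the backup; the restore itself creates the target table
  await dynamodb
    .restoreTableFromBackup({
      TargetTableName: 'MyGlobalTable',
      BackupArn: BackupDetails.BackupArn,
    })
    .promise();
}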

If the tables are not replicated, the process would be slightly different:

  1. Export the data from the tables using Data Pipeline
  2. Create a new global table
  3. Import the dumps into the global table using data pipeline

Note that Data Pipeline does not support the new on-demand DynamoDB provisioning. If you were going down this route, you would need to reconfigure the tables to use the old provisioned-capacity mode whilst you do the export.
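
Switching a table back to provisioned capacity for the duration of the export would look something like this (placeholder table name and throughput values):

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB({ region: 'us-east-2' });

// Placeholder throughput - set it high enough for the export, then switch back
dynamodb
  .updateTable({
    TableName: 'MyTable',
    BillingMode: 'PROVISIONED',
    ProvisionedThroughput: { ReadCapacityUnits: 10, WriteCapacityUnits: 10 },
  })
  .promise();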

I hope this helps. I think your question, by the end, was about moving to a global table, at which point your Lambda@Edge function will just use the nearest table. But I'm not sure whether that's what you needed help with?

EDIT: Just had a look and I now realise this doesn't really solve your problem. You still need to specify a region even with global tables (i.e. which region to read from, even though the data will be auto-replicated). So your question is still: which region to use for the read/write?

EDIT: Just to confirm, are you worried about hitting the wrong DB and missing your data, or about getting the closest DB to reduce latency? If the former, the global tables approach will work fine for you, as the data will be automatically replicated across regions when you write it to the local DB.