We are looking for a solution that consumes as few of a DynamoDB table's read/write capacity units as possible when performing full backups, incremental backups and restores. Backups should be stored in AWS S3 (we are open to alternatives). We have considered a few options:

1) Using Python multiprocessing and the boto module, we were able to perform full backup and restore operations. It performs well, but it consumes a lot of DynamoDB read/write units.

2) Using the AWS Data Pipeline service, we were able to perform full backup and restore operations.

3) Using DynamoDB Streams with the Kinesis Adapter, or DynamoDB Streams with a Lambda function, we were able to perform incremental backups.
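For reference, the Lambda variant of option 3 can be sketched as a handler that writes each batch of stream records to S3 as one JSON Lines object. This is a minimal sketch assuming the stream's view type includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES) and that the function's role may write to the bucket; `BUCKET`, `PREFIX` and `records_to_jsonl` are hypothetical names, and the optional `s3` parameter only exists so the handler can be exercised without AWS.

```python
import json

# Hypothetical bucket and key prefix; replace with your own.
BUCKET = "my-dynamodb-backups"
PREFIX = "incremental/my-table/"


def records_to_jsonl(records):
    """Serialise DynamoDB stream records (as delivered to Lambda) as JSON Lines."""
    lines = []
    for rec in records:
        lines.append(json.dumps({
            "eventName": rec["eventName"],                # INSERT | MODIFY | REMOVE
            "keys": rec["dynamodb"]["Keys"],
            "newImage": rec["dynamodb"].get("NewImage"),  # absent for REMOVE
            "sequenceNumber": rec["dynamodb"]["SequenceNumber"],
        }, sort_keys=True))
    return "\n".join(lines) + "\n"


def handler(event, context, s3=None):
    """Lambda entry point: store each stream batch as one S3 object."""
    records = event.get("Records", [])
    if not records:
        return {"written": 0}
    if s3 is None:
        import boto3  # bundled with the Lambda Python runtimes
        s3 = boto3.client("s3")
    # Name the object after the batch's first sequence number.
    first_seq = records[0]["dynamodb"]["SequenceNumber"]
    key = f"{PREFIX}{first_seq}.jsonl"
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=records_to_jsonl(records).encode("utf-8"))
    return {"written": len(records), "key": key}
```

Reading the stream does not consume the table's read capacity, which is why this approach is cheap; the cost moves to the restore side.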

Are there other alternatives for full backup, incremental backup and restore? The main requirement is a scalable solution that uses as few of the DynamoDB table's read/write units as possible.


2 Answers

Answer 1:

Options #1 and #2 are almost the same: both run a Scan on the DynamoDB table, which reads every item and therefore consumes RCUs in proportion to the table's size on every full backup.
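To make the cost concrete: one RCU covers a 4 KB strongly consistent read and eventually consistent reads cost half, so a full Scan of 1 GiB of item data costs roughly 1 GiB / 4 KB × 0.5 = 131,072 RCUs each time it runs. Splitting the Scan into parallel segments (presumably what option #1 does with multiprocessing) makes it faster but reads the same bytes, so the total does not drop. The sketch below assumes boto3's low-level DynamoDB client and uses threads instead of multiprocessing; `estimate_scan_rcus`, `scan_segment` and `parallel_scan` are hypothetical helper names.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def estimate_scan_rcus(table_bytes, consistent_read=False):
    """Rough RCU cost of one full Scan over table_bytes of item data.

    One RCU covers 4 KB read with strong consistency; eventually consistent
    reads cost half. Per-page rounding is ignored, so treat it as a lower bound.
    """
    units = math.ceil(table_bytes / 4096)
    return units if consistent_read else units / 2


def scan_segment(client, table_name, segment, total_segments):
    """Read one segment of a parallel Scan, following pagination.

    Returns (items, consumed_rcus) as reported by DynamoDB itself.
    """
    items, consumed = [], 0.0
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
        "ConsistentRead": False,          # eventually consistent: half price
        "ReturnConsumedCapacity": "TOTAL",
    }
    while True:
        resp = client.scan(**kwargs)
        items.extend(resp.get("Items", []))
        consumed += resp.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0)
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            return items, consumed
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=4):
    """Scan all segments concurrently; returns (all_items, total_rcus)."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        results = list(pool.map(
            lambda seg: scan_segment(client, table_name, seg, total_segments),
            range(total_segments)))
    all_items = [item for items, _ in results for item in items]
    total_rcus = sum(rcus for _, rcus in results)
    return all_items, total_rcus
```

Usage would be along the lines of `items, rcus = parallel_scan(boto3.client("dynamodb"), "my-table")`; comparing `rcus` with `estimate_scan_rcus` for the table's size shows that the parallelism changes the wall-clock time, not the bill.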

Option #3 saves RCUs, but restoring becomes a challenge. If an item is updated more than once, you'll have multiple copies of it in the S3 backup, because each update appears as a separate record in the DynamoDB stream. So when restoring, you need to pick the latest version of each item. You also need to handle deleted items correctly: a REMOVE record means the item must not be restored at all.
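The "pick the latest record, honour deletes" step amounts to replaying the changes in order with last-write-wins. A minimal sketch, assuming each backed-up change is a dict with `eventName`, `keys` and `newImage` (a simplified shape chosen here, not a DynamoDB format) and that the records are read back in stream order; DynamoDB guarantees that the stream records for any single item appear in the order the modifications happened. `item_key` and `compact` are hypothetical names.

```python
import json


def item_key(keys):
    """Canonical string for a DynamoDB key map, usable as a dict key."""
    return json.dumps(keys, sort_keys=True)


def compact(change_records):
    """Replay changes oldest-first and return {key: latest item image}.

    Assumes records are in stream order for each item and that the stream
    view type includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES).
    """
    state = {}
    for rec in change_records:
        k = item_key(rec["keys"])
        if rec["eventName"] == "REMOVE":
            state.pop(k, None)          # deleted items must not be restored
        else:                           # INSERT or MODIFY: last write wins
            state[k] = rec["newImage"]
    return state
```

The resulting images can then be written back with BatchWriteItem; an EMR job does the same thing at scale by grouping on the key and keeping the last change per item.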

Choose option #3 if you restore infrequently; you can then run an EMR job over the incremental backups when a restore is needed. Otherwise, choose #1 or #2.

Answer 2:

On-demand backup is a feature built into the DynamoDB service (accessible via the API, the AWS Management Console and the CLI as usual) that lets you take a full backup of a table at a point in time.
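A minimal sketch of starting an on-demand backup from Python, assuming boto3's DynamoDB client: `create_backup` takes `TableName` and `BackupName` and returns the new backup's ARN under `BackupDetails`. The timestamped naming scheme and the helper names `backup_name` and `take_backup` are hypothetical; passing the client in makes the function easy to call from a scheduled Lambda or cron job.

```python
from datetime import datetime, timezone


def backup_name(table_name, now):
    """Timestamped name; BackupName allows letters, digits, '_', '-' and '.'."""
    return f"{table_name}-{now:%Y%m%d-%H%M%S}"


def take_backup(client, table_name, now=None):
    """Start an on-demand backup of table_name and return its ARN."""
    now = now or datetime.now(timezone.utc)
    resp = client.create_backup(TableName=table_name,
                                BackupName=backup_name(table_name, now))
    return resp["BackupDetails"]["BackupArn"]
```

For example, `take_backup(boto3.client("dynamodb"), "my-table")` run once a day gives a daily full backup without any Scan.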

Taking a backup has no impact on the performance or availability of your tables and does not consume the table's provisioned read capacity. All backups are automatically encrypted, cataloged, easily discoverable, and retained until you explicitly delete them.
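Because backups are kept until you delete them, a scheduled backup job usually needs a retention step as well. This sketch assumes boto3's `list_backups` (filtered to user-created backups created before a cutoff, paginated via `LastEvaluatedBackupArn`) and `delete_backup`; `prune_backups` and the 30-day default are hypothetical choices. It collects every ARN before deleting anything so that deletions cannot disturb the pagination.

```python
from datetime import datetime, timedelta, timezone


def prune_backups(client, table_name, keep_days=30, now=None):
    """Delete user-created on-demand backups of table_name older than keep_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    kwargs = {"TableName": table_name, "BackupType": "USER",
              "TimeRangeUpperBound": cutoff}   # exclusive upper bound
    arns = []
    while True:
        resp = client.list_backups(**kwargs)
        arns.extend(s["BackupArn"] for s in resp.get("BackupSummaries", []))
        last = resp.get("LastEvaluatedBackupArn")
        if not last:
            break
        kwargs["ExclusiveStartBackupArn"] = last
    for arn in arns:
        client.delete_backup(BackupArn=arn)
    return arns
```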

You can also restore a backup to a new table at any time.
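Restoring is a single call, sketched here with boto3's `restore_table_from_backup`, which creates the target table (it must not already exist). The optional wait uses the client's `table_exists` waiter, which polls until the table is ACTIVE; the roughly one-hour polling budget and the helper name `restore_to_new_table` are assumptions, chosen because large restores can take a long time.

```python
def restore_to_new_table(client, backup_arn, target_table, wait=True):
    """Restore an on-demand backup into a new table; return the new table's ARN."""
    resp = client.restore_table_from_backup(TargetTableName=target_table,
                                            BackupArn=backup_arn)
    if wait:
        # Poll DescribeTable every 30 s, up to 120 times, until ACTIVE.
        client.get_waiter("table_exists").wait(
            TableName=target_table,
            WaiterConfig={"Delay": 30, "MaxAttempts": 120})
    return resp["TableDescription"]["TableArn"]
```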

Along with the data, the following are included in backups:

Global secondary indexes (GSIs)
Local secondary indexes (LSIs)
Streams
Provisioned read and write capacity

The following is NOT included in the backups:

Auto scaling policies
AWS Identity and Access Management (IAM) policies
Amazon CloudWatch metrics and alarms
Tags
Stream settings
Time To Live (TTL) settings

I've written up more information and a walkthrough on my blog: https://www.abhayachauhan.com/2017/12/dynamodb-scheduling-on-demand-backups/