26
votes

I would like to copy all the DynamoDB tables to another AWS account without using S3 to store the data. I have seen solutions that copy a table with Data Pipeline, but all of them use S3 as intermediate storage. I would like to skip the S3 step because the tables contain a large amount of data, so the S3 write and S3 read steps may take time. I need to copy a table directly from one account to another.

6 Answers

50
votes

If you don't mind using Python and adding the boto3 library (sudo python -m pip install boto3), then I'd do it like this (I assume you know how to fill in the keys, regions and table names in the code):

import boto3

# Client for the source account/region
dynamoclient = boto3.client('dynamodb', region_name='eu-west-1',
    aws_access_key_id='ACCESS_KEY_SOURCE',
    aws_secret_access_key='SECRET_KEY_SOURCE')

# Client for the target account/region
dynamotargetclient = boto3.client('dynamodb', region_name='us-west-1',
    aws_access_key_id='ACCESS_KEY_TARGET',
    aws_secret_access_key='SECRET_KEY_TARGET')

# Paginate through the scan so tables larger than the 1 MB
# single-response limit are copied completely
dynamopaginator = dynamoclient.get_paginator('scan')
tabname = 'SOURCE_TABLE_NAME'
targettabname = 'TARGET_TABLE_NAME'
dynamoresponse = dynamopaginator.paginate(
    TableName=tabname,
    Select='ALL_ATTRIBUTES',
    ReturnConsumedCapacity='NONE',
    ConsistentRead=True
)

# Write every scanned item into the target table
for page in dynamoresponse:
    for item in page['Items']:
        dynamotargetclient.put_item(
            TableName=targettabname,
            Item=item
        )
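
Issuing one PutItem per item is the slow part for big tables. As a sketch (the helper and function names here are mine, not part of the answer above), the same copy can go through BatchWriteItem, which accepts up to 25 put requests per call and reports throttled items back in UnprocessedItems:

```python
def chunk(seq, size):
    """Split a list into lists of at most `size` elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def batch_copy(source_client, target_client, src_name, dst_name):
    """Copy a table with BatchWriteItem instead of one PutItem per item.

    The clients are plain boto3 DynamoDB clients, created exactly as in
    the answer above; all names here are illustrative.
    """
    paginator = source_client.get_paginator('scan')
    for page in paginator.paginate(TableName=src_name, ConsistentRead=True):
        for batch in chunk(page['Items'], 25):  # 25 is the BatchWriteItem cap
            request = {dst_name: [{'PutRequest': {'Item': it}} for it in batch]}
            response = target_client.batch_write_item(RequestItems=request)
            # DynamoDB may throttle part of a batch; retry what was left over
            # (a production version would add exponential backoff here)
            while response.get('UnprocessedItems'):
                response = target_client.batch_write_item(
                    RequestItems=response['UnprocessedItems'])
```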
20
votes

Try this Node.js module:

npm i copy-dynamodb-table
8
votes

Simple backup and restore for Amazon DynamoDB using boto

https://github.com/bchew/dynamodump

which can do the following:

  • Single table backup/restore
  • Multiple table backup/restore
  • Multiple table backup/restore but between different environments (e.g. production-* tables to development-* tables)
  • Backup all tables and restore only data (will not delete and recreate schema)
  • Dump all table schemas and create the schemas (e.g. creating blank tables in a different AWS account)
  • Backup all tables based on AWS tag key=value
  • Backup all tables based on AWS tag, compress and store in specified S3 bucket.
  • Restore from S3 bucket to specified destination table
3
votes

Reading and writing S3 is not going to be your bottleneck.

While scanning from Dynamo is going to be very fast, writing the items to the destination table is going to be slow. You can only write up to 1000 items per second per partition. So, I wouldn't worry about the intermediate S3 storage.

However, Data Pipeline is not the most efficient way of copying a table to another table either.

If you need speedy transfers then your best bet is to implement your own solution. Provision the destination tables based on your desired transfer throughput (but be careful about undesired partition splits) and then write a parallel scan using multiple threads, which also writes to the destination table.
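
As a rough sketch of that idea (the function names are mine, not from the linked library): DynamoDB's Scan supports Segment/TotalSegments parameters, so each thread can scan a disjoint slice of the table and write it to the destination:

```python
from concurrent.futures import ThreadPoolExecutor

def copy_segment(source, target, src_name, dst_name, segment, total_segments):
    """Scan one segment of the source table and put its items into the target."""
    paginator = source.get_paginator('scan')
    pages = paginator.paginate(TableName=src_name,
                               Segment=segment,
                               TotalSegments=total_segments)
    copied = 0
    for page in pages:
        for item in page['Items']:
            target.put_item(TableName=dst_name, Item=item)
            copied += 1
    return copied

def parallel_copy(source, target, src_name, dst_name, total_segments=8):
    """One worker per segment; DynamoDB hands each worker a disjoint key range."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(copy_segment, source, target,
                               src_name, dst_name, seg, total_segments)
                   for seg in range(total_segments)]
        return sum(f.result() for f in futures)
```

Here `source` and `target` are boto3 DynamoDB clients for the two accounts; tune total_segments to match the write throughput you provisioned on the destination.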

There is an open-source implementation in Java in the AWS Labs repository that you can use as a starting point:

https://github.com/awslabs/dynamodb-cross-region-library

1
votes

You can use deepcopy and dynamodb_json:

import boto3
import json
from dynamodb_json import json_util as djson
from copy import deepcopy

REGION = 'eu-west-1'

# init your sessions to the different accounts (session_a and session_b)

# client for the account we write to, resource for the account we read from
dynamo_a = session_a.client('dynamodb', region_name=REGION)
dynamo_b = session_b.resource('dynamodb', region_name=REGION)

# scan the whole source table, following LastEvaluatedKey pagination
table = dynamo_b.Table('abc')
result_data = table.scan()
result_item = []
result_item.extend(result_data['Items'])
while 'LastEvaluatedKey' in result_data:
    result_data = table.scan(
        ExclusiveStartKey=result_data['LastEvaluatedKey']
    )
    result_item.extend(result_data['Items'])

# copy the items so the scan results stay untouched
translated_items = [deepcopy(r) for r in result_item]

# convert each item to the low-level DynamoDB JSON format and write it
for r in translated_items:
    item = json.loads(djson.dumps(r))
    dynamo_a.put_item(TableName='def', Item=item)
0
votes

S3 is definitely not a bottleneck. I would almost argue that for 99% of use cases you should do it with Data Pipeline + S3, which is the best practice recommended by AWS. I have provided a more detailed answer on this here: https://stackoverflow.com/a/57465721/126382

The real question is whether you can organize the other systems and clients that read and write data live so that the migration causes no downtime. If that is your biggest concern about the timing of the task, then you want to engineer a custom solution which ensures that all writes go to the DDB tables in both accounts, and which switches the clients that read data over to the destination DDB table before you finally switch the clients that write data. A couple of other flavors of this migration plan are also possible.
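
A minimal sketch of the dual-write step (the wrapper name and shape are hypothetical, not a known library API): route every write through a small facade that puts the item into both tables, flip the readers to the new table, then flip the writers:

```python
class DualWriter:
    """Write-through facade for a zero-downtime migration sketch.

    `old_client` and `new_client` are boto3 DynamoDB clients for the two
    accounts; every name here is an illustrative assumption, not a real API.
    """
    def __init__(self, old_client, new_client, old_table, new_table):
        self.old = old_client
        self.new = new_client
        self.old_table = old_table
        self.new_table = new_table

    def put_item(self, item):
        # Write to the current source-of-truth table first, then mirror
        # the same write into the destination account's table.
        self.old.put_item(TableName=self.old_table, Item=item)
        self.new.put_item(TableName=self.new_table, Item=item)
```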