AWS Glue job to pull the data from DynamoDB in another account

Question

I'm using AWS Glue to copy data from DynamoDB to S3. I have written the code below to copy a DynamoDB table to S3 in the same account. It works fine and copies my table of 600 million records without any issues. It takes about 20 minutes.

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from datetime import datetime

# inputs
dataset_date = datetime.now().strftime('%Y%m%d')
table_name = "table-name"
read_percentage = "0.5"
output_location = 's3://' + dataset_date
fmt = "json"

# glue setup
sc = SparkContext()
glueContext = GlueContext(sc)

# scan the DDB table
table = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": read_percentage,
        "dynamodb.splits": "100",
    },
)

# write to S3
glueContext.write_dynamic_frame.from_options(
    frame=table,
    connection_type="s3",
    connection_options={"path": output_location},
    format=fmt,
    transformation_ctx="datasink",
)

Now I want to do a cross-account S3 dump using the above script. The DynamoDB tables are in Account A (prod account), while the Glue job that reads the tables and the S3 bucket that receives the data are in Account B (DW account). Is it possible to keep my script and give the Glue job cross-account access so that it can read the DynamoDB tables in Account A?

Oluwafemi Sule · Accepted Answer · 2020-03-24T06:11:04

Create an IAM role in Account A (the DynamoDB table owner account) that Glue can assume as a principal in order to read the tables.

Attach a permissions policy to that IAM role in Account A (the DynamoDB table owner account) that allows reading data from the tables. A sample you can build from is provided as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListAndDescribe",
            "Effect": "Allow",
            "Action": [
                "dynamodb:List*",
                "dynamodb:DescribeReservedCapacity*",
                "dynamodb:DescribeLimits",
                "dynamodb:DescribeTimeToLive"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllTables",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGet*",
                "dynamodb:DescribeStream",
                "dynamodb:DescribeTable",
                "dynamodb:Get*",
                "dynamodb:Query",
                "dynamodb:Scan"
            ],
            "Resource": [
                "arn:aws:dynamodb:*:*:table/table-1",
                "arn:aws:dynamodb:*:*:table/table-2"
            ]
        }
    ]
}
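The sample above leaves the Region and account ID in the table ARNs as wildcards. As a minimal sketch of tightening that, the second statement can be generated with explicit values. The helper name `table_read_statement`, the Region `us-east-1`, and the account ID `111122223333` are placeholders for illustration, not values from the question.

```python
import json


def table_read_statement(table_names, region="*", account_id="*"):
    """Build the "AllTables" statement from the sample policy for the given tables.

    region and account_id default to "*" to match the sample; pass real
    values to scope the statement to one Region and one account.
    """
    return {
        "Sid": "AllTables",
        "Effect": "Allow",
        "Action": [
            "dynamodb:BatchGet*",
            "dynamodb:DescribeStream",
            "dynamodb:DescribeTable",
            "dynamodb:Get*",
            "dynamodb:Query",
            "dynamodb:Scan",
        ],
        "Resource": [
            f"arn:aws:dynamodb:{region}:{account_id}:table/{name}"
            for name in table_names
        ],
    }


if __name__ == "__main__":
    # Placeholder Region and account ID; replace with Account A's values.
    statement = table_read_statement(
        ["table-1", "table-2"], region="us-east-1", account_id="111122223333"
    )
    print(json.dumps(statement, indent=4))
```

Called with only the table names, the helper reproduces the wildcard ARNs shown in the sample policy.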

Configure the trust policy of the above IAM role in Account A (the DynamoDB table owner account) to permit Glue to assume it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

In the IAM role configured for the Glue job in Account B (which doesn't own the tables), include a permissions policy statement that lets it assume the IAM role in Account A (the DynamoDB table owner account).

    {
        "Sid": "DelegateDynamoDBTablesOwnerRoleArn",
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::dynamo-db-table-owner-role-arn:role/*"
    }
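With the roles in place, the script from the question still has to tell the DynamoDB reader which role to assume. A minimal sketch, assuming the Glue version in use supports the `dynamodb.sts.roleArn` and `dynamodb.sts.roleSessionName` connection options: the helper name `cross_account_connection_options`, the session name, and the role ARN in the usage comment are placeholders, not values from the question.

```python
def cross_account_connection_options(
    table_name,
    role_arn,
    read_percent="0.5",
    splits="100",
    session_name="glue-cross-account-read",  # hypothetical session name
):
    """Return DynamoDB reader options that assume a role in the table owner account.

    The first three keys are the ones used in the original script; the two
    "dynamodb.sts.*" keys add the cross-account role to assume.
    """
    return {
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": read_percent,
        "dynamodb.splits": splits,
        "dynamodb.sts.roleArn": role_arn,
        "dynamodb.sts.roleSessionName": session_name,
    }


# In the Glue job in Account B, pass the result to the reader in place of the
# original connection_options (the role ARN below is a placeholder):
#
# table = glueContext.create_dynamic_frame.from_options(
#     connection_type="dynamodb",
#     connection_options=cross_account_connection_options(
#         "table-name",
#         "arn:aws:iam::111122223333:role/DynamoDBCrossAccessRole",
#     ),
# )
```

The write to S3 is unchanged, since the bucket lives in Account B and the job's own role covers it.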

References

https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html#cross-account-calling-etl
