I'm using AWS Glue to copy data from DynamoDB to S3. I have written the code below to copy a DynamoDB table to S3 within the same account. It works fine and copies my table of 600 million records without any issues, taking about 20 minutes.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from datetime import datetime

# inputs
dataset_date = datetime.strftime(datetime.now(), '%Y%m%d')
table_name = "table-name"
read_percentage = "0.5"
output_location = 's3://' + dataset_date
fmt = "json"

# glue setup
sc = SparkContext()
glueContext = GlueContext(sc)

# scan the DDB table
table = glueContext.create_dynamic_frame.from_options(
    "dynamodb",
    connection_options={
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": read_percentage,
        "dynamodb.splits": "100"
    }
)

# write to S3
glueContext.write_dynamic_frame.from_options(
    frame=table,
    connection_type="s3",
    connection_options={"path": output_location},
    format=fmt,
    transformation_ctx="datasink"
)
But now I want to do a cross-account S3 dump using the above script. The DynamoDB tables are in Account A (the prod account), while the Glue job that reads the DynamoDB tables and the S3 bucket the data is dumped into are both in Account B (the DW account). I don't know if it is possible to keep using my script but give the Glue job cross-account access so it can read the DynamoDB tables from Account A.
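
For reference, this is roughly what I was imagining, based on the dynamodb.sts.roleArn connection option described in the Glue documentation for cross-account DynamoDB reads. It assumes Account A provides an IAM role (the ARN below is a placeholder) with read access to the table and a trust policy that lets the Glue job role in Account B assume it; I haven't verified that this actually works:

# scan the DDB table in Account A by assuming a role that lives in Account A
# (placeholder ARN; the role needs DynamoDB read permissions and a trust
# policy allowing the Glue job role in Account B to assume it)
table = glueContext.create_dynamic_frame.from_options(
    "dynamodb",
    connection_options={
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": read_percentage,
        "dynamodb.splits": "100",
        "dynamodb.sts.roleArn": "arn:aws:iam::<ACCOUNT_A_ID>:role/cross-account-glue-ddb-read"
    }
)

The write to S3 would presumably stay unchanged, since the bucket and the Glue job are both in Account B.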