
I have a DynamoDB table with some attributes which are Document Types: list and map, to be specific.

I'm trying to export these tables into Redshift. According to the documentation, the Redshift copy command will fail if you try to copy data from a DynamoDB table which includes either list or map attributes:

Only Amazon DynamoDB attributes with scalar STRING and NUMBER data types are supported. The Amazon DynamoDB BINARY and SET data types are not supported. If a COPY command tries to load an attribute with an unsupported data type, the command will fail.

I've also investigated using a Data Pipeline and Hive, but the documentation suggests that Hive doesn't support list or map either.

What are the options for moving data - including list and map attributes - out of a DynamoDB table and into Redshift?
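For concreteness, a hypothetical item with list and map attributes (shown in DynamoDB JSON notation; the attribute names are made up) would look like this:

```json
{
  "id": { "S": "item-1" },
  "tags": { "L": [ { "S": "red" }, { "S": "blue" } ] },
  "dimensions": {
    "M": {
      "height": { "N": "10" },
      "width": { "N": "20" }
    }
  }
}
```

It's the "L" and "M" typed attributes here that the COPY command rejects.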


1 Answer


The approach I've taken is to start with the Export DynamoDB Table to S3 template.

The org.apache.hadoop.dynamodb.tools.DynamoDbExport step, which runs in the EMRActivity, appears to output a manifest file specifically meant for use with the RedshiftCopyActivity: http://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html

So in addition to the S3BackupLocation data node which comes with the "Export DynamoDB Table to S3" template, I've added another S3DataNode which dependsOn the EMR activity. It specifies a manifestFilePath, which is the same as the directoryPath specified in the S3BackupLocation data node - except with /manifest added to the end.
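As a rough sketch of that extra data node (the id values and the bucket path here are placeholders, not taken from the original pipeline definition), the pipeline object might look like:

```json
{
  "id": "ManifestDataNode",
  "type": "S3DataNode",
  "dependsOn": { "ref": "EmrActivityId" },
  "manifestFilePath": "s3://my-bucket/dynamodb-export/manifest"
}
```

The key point is that manifestFilePath points at the manifest file the EMR step wrote into the same directory as the export output.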

That new data node - with the manifestFilePath - is then used as input for a RedshiftCopyActivity. To flatten attributes which are maps, the RedshiftCopyActivity specifies some commandOptions which provides JSONPath options:

"commandOptions": "JSON 's3://richard.jp.leguen/data-pipeline/misc/jsonpaths.txt'"
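For reference, a jsonpaths file for this kind of flattening could look like the following - a sketch only, assuming the export writes DynamoDB-style JSON with type descriptors (s, n, m) and using hypothetical attribute names. Each expression maps one Redshift target column, in order, to a path in the exported JSON:

```json
{
  "jsonpaths": [
    "$.id.s",
    "$.dimensions.m.height.n",
    "$.dimensions.m.width.n"
  ]
}
```

Nested map members are flattened simply by pointing each column at a deeper path; the number of expressions must match the number of columns in the Redshift table being loaded.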