There are records that map data type on DynamoDB, I want to export these records to S3 with JSON data format using HiveQL on EMR. How do you do this one? Is it possible?
I read the following documentaion, but that I wanted information was nothing.
- DynamoDB DataFormat Documentation: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataFormat.html
- Hive Command Examples for Exporting... Documentation: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html
I tried the following steps:
Create a table on DynamoDB
TableName: DynamoDBTable1 HashKey: user_id
Insert two records to DynamoDB
# record1 user_id: "0001" json: {"key1": "value1", "key2": "value2"} # record2 user_id: "0001" json: {"key1": "value1", "key2": "value2"}
Create a table on EMR from DynamoDB
CREATE EXTERNAL TABLE test (user_id string, json map<string, string>) STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES ("dynamodb.table.name" = "DynamoDBTable", "dynamodb.column.mapping" = "user_id:user_id,json:json");
Export records to S3
INSERT OVERWRITE DIRECTORY 's3://some-bucket/exports/' select json from test where user_id = '0001';
Confirm the S3 bucket, but the exported data is not JSON format...
# Expected [ {"key1": "value1", "key2": "value2"}, {"key1": "value1", "key2": "value2"} ] # Actual key1^C{"s":"value1"}^Bkey2^C{"s":"value2"} key1^C{"s":"value1"}^Bkey2^C{"s":"value2"}