0
votes

I am transferring data from DynamoDB to S3 using a hive script in AWS Data Pipeline. I am using a script like this :

CREATE EXTERNAL TABLE dynamodb_table ( PROPERTIES STRING, EMAIL STRING, ............. ) STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES ("dynamodb.table.name" = "${DYNAMODB_INPUT_TABLE}", "dynamodb.column.mapping" = "PROPERTIES:Properties,EMAIL:EmailId...."); CREATE EXTERNAL TABLE s3_table ( PROPERTIES STRING, EMAIL STRING, ...... )

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY'\n'

LOCATION '${S3_OUTPUT_BUCKET}';

INSERT OVERWRITE TABLE s3_table SELECT * FROM dynamodb_table;

The Properties column in DyanmoDB table is like this

Properties : String :{\"deal\":null,\"MinType\":null,\"discount\":null}

that is it contains multiple attributes in it. I want each attribute in Properties to come as a separate column (not just a string in a single column). I want the output in this schema

deal MinType discount EMAIL

How can I do this?

1

1 Answers

0
votes