How to access individul elements of a blob in dynamoDb using a hive script?

Question

I am transferring data from DynamoDB to S3 using a hive script in AWS Data Pipeline. I am using a script like this :

CREATE EXTERNAL TABLE dynamodb_table ( PROPERTIES STRING, EMAIL STRING, ............. ) STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES ("dynamodb.table.name" = "${DYNAMODB_INPUT_TABLE}", "dynamodb.column.mapping" = "PROPERTIES:Properties,EMAIL:EmailId...."); CREATE EXTERNAL TABLE s3_table ( PROPERTIES STRING, EMAIL STRING, ...... )

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY'\n'

LOCATION '${S3_OUTPUT_BUCKET}';

INSERT OVERWRITE TABLE s3_table SELECT * FROM dynamodb_table;

The Properties column in DyanmoDB table is like this

Properties : String :{\"deal\":null,\"MinType\":null,\"discount\":null}

that is it contains multiple attributes in it. I want each attribute in Properties to come as a separate column (not just a string in a single column). I want the output in this schema

deal MinType discount EMAIL

How can I do this?

Austin Lee Austin Lee · Accepted Answer · 2016-04-01T23:54:16

Is your Properties column in proper JSON format? If so, it looks like you can - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object

How to access individul elements of a blob in dynamoDb using a hive script?

1 Answers