3
votes

I have records containing a Map data type in DynamoDB, and I want to export them to S3 in JSON format using HiveQL on EMR. Is this possible, and how can I do it?

I read the documentation, but it did not contain the information I was looking for.

I tried the following steps:

  1. Create a table on DynamoDB

    TableName: DynamoDBTable1
    HashKey: user_id
    
  2. Insert two records into DynamoDB

    # record1
    user_id: "0001"
    json: {"key1": "value1", "key2": "value2"}
    
    # record2
    user_id: "0001"
    json: {"key1": "value1", "key2": "value2"}
    
  3. Create a table on EMR from DynamoDB

    CREATE EXTERNAL TABLE test (user_id string, json map<string, string>)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
    TBLPROPERTIES ("dynamodb.table.name" = "DynamoDBTable1", 
    "dynamodb.column.mapping" = "user_id:user_id,json:json"); 
    
  4. Export records to S3

    INSERT OVERWRITE DIRECTORY 's3://some-bucket/exports/' select json from test where user_id = '0001';
    
  5. Check the S3 bucket: the exported data is not in JSON format.

    # Expected
    [
      {"key1": "value1", "key2": "value2"},
      {"key1": "value1", "key2": "value2"}
    ]
    
    # Actual
    key1^C{"s":"value1"}^Bkey2^C{"s":"value2"}
    key1^C{"s":"value1"}^Bkey2^C{"s":"value2"}
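As a post-processing workaround (my own sketch, not part of the original question), the exported lines can be parsed back into plain JSON. The `^B` and `^C` bytes are Hive's default collection-item and map-key delimiters (`\x02` and `\x03`), and each value is a DynamoDB attribute-value object such as `{"s": "value1"}` whose single type tag can be unwrapped:

```python
import json

# Hive's default delimiters for a map column:
COLLECTION_DELIM = "\x02"  # ^B separates map entries
MAPKEY_DELIM = "\x03"      # ^C separates each key from its value

def hive_map_line_to_json(line):
    """Convert one exported line like
    key1^C{"s":"value1"}^Bkey2^C{"s":"value2"}
    into a plain dict: {"key1": "value1", "key2": "value2"}."""
    result = {}
    for entry in line.rstrip("\n").split(COLLECTION_DELIM):
        key, raw_value = entry.split(MAPKEY_DELIM, 1)
        # Each value is a DynamoDB attribute-value object, e.g. {"s": "value1"};
        # unwrap the single type tag ("s" for string) to get the bare value.
        attr = json.loads(raw_value)
        result[key] = next(iter(attr.values()))
    return result

line = 'key1\x03{"s":"value1"}\x02key2\x03{"s":"value2"}'
print(json.dumps(hive_map_line_to_json(line), sort_keys=True))
```

Running this over each line of the S3 output would yield the expected plain-JSON objects, though it only handles scalar string values as shown in the sample output above.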
    
Did you ever figure this out, or find a workaround? - hiroprotagonist
Still no workarounds for this issue? - Claudio Bertozzi

1 Answer

0
votes

The following DynamoDB data types are not supported by the DynamoDBStorageHandler class, so they cannot be used with dynamodb.column.mapping:

Map, List, Boolean, Null
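One workaround consistent with this limitation (a sketch under my own assumptions, not from the answer): serialize the map to a JSON string and store it in a String attribute, which DynamoDBStorageHandler does support, so the Hive column can be declared as `string` and exported verbatim:

```python
import json

# Instead of writing a Map attribute (unsupported by DynamoDBStorageHandler),
# serialize the map to a JSON string before storing it. The record below is a
# hypothetical illustration of what would be written to DynamoDB.
record = {
    "user_id": "0001",
    "json": json.dumps({"key1": "value1", "key2": "value2"}),
}

# The "json" attribute is now a plain string; Hive can map it with
# "json:json" as a `string` column, and it round-trips cleanly:
restored = json.loads(record["json"])
print(restored)
```

With this layout, the `INSERT OVERWRITE DIRECTORY` export would write the JSON text as-is, with no `^B`/`^C` delimiters or type wrappers to clean up.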