3 votes

I have a JSON file on S3 that I want to transfer to Redshift. One catch is that the file contains entries in the following format:

{
  "user_id": 1,
  "metadata": {
    "connection_type": "WIFI",
    "device_id": "1234"
  }
}

Before saving it to Redshift, I want to flatten the file so it contains the columns:

user_id | connection_type | device_id
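
For reference, the target table in Redshift would be something like this (the table name and column types are my guesses):

create table user_connections (
    user_id         integer,
    connection_type varchar(16),
    device_id       varchar(32)
);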

How can I do this using AWS Data Pipeline? Is there an activity that can transform JSON into the desired form? I do not think that transform SQL will support JSON fields.


1 Answer

3 votes

You do not need to flatten it. You can load it with the COPY command after defining a jsonpaths configuration file, which extracts the column values from each JSON object.

With your structure, you'd create a file in S3 (s3://bucket/your_jsonpaths.json) like so:

{
    "jsonpaths": [
        "$.user_id",
        "$.metadata.connection_type",
        "$.metadata.device_id"
    ]
}

Then you'd run something like this in Redshift:

copy your_table
from 's3://bucket/data_objects.json'
credentials '<aws-auth-args>'
json 's3://bucket/your_jsonpaths.json';
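
For example, if you authenticate with an IAM role, the credentials clause would look something like this (the role ARN here is a placeholder; substitute your own):

copy your_table
from 's3://bucket/data_objects.json'
credentials 'aws_iam_role=arn:aws:iam::123456789012:role/MyRedshiftRole'
json 's3://bucket/your_jsonpaths.json';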

If you have issues, check what is in the stl_load_errors table.
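
A quick way to look at the most recent load errors (a sketch; the column list here is abbreviated):

select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 10;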

Check out the Redshift COPY command documentation and examples.