I have an existing table defined in Redshift with certain columns. It is already being populated successfully from one S3 bucket using a jsonpaths definition and the COPY FROM JSON command. This all works fine.
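For context, the existing load is run with a command along these lines (the bucket names, prefixes, and IAM role here are placeholders, not my real values):

```sql
COPY events
FROM 's3://existing-bucket/events/'                               -- placeholder bucket/prefix
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'        -- placeholder role
FORMAT AS JSON 's3://existing-bucket/jsonpaths/events_paths.json' -- jsonpaths file shown below
;
```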
But I now have differently formatted data stored in a different S3 bucket, and I would like to load that data into the SAME Redshift table. However, this new data doesn't have values for all of the columns in the table; I would simply like those columns to default to NULL for rows loaded from the new bucket. My problem is that I don't know how best to set up the jsonpaths definition for the NEW S3 bucket so that I can also easily run COPY FROM JSON against it.
I am trying to avoid pre-processing the new bucket's data, which would amount to adding extra keys (mapped to the unwanted columns) with empty values.
I have tried simply putting empty strings in the jsonpaths array to line up with the Redshift columns I don't care about populating. However, the AWS Redshift documentation indicates that empty strings cause an error during the COPY FROM JSON command: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json-jsonpaths
Existing Redshift table

CREATE TABLE events (
    platform_name VARCHAR(10),
    app_id VARCHAR(10),
    app_version VARCHAR(10)
);
Existing jsonpaths file for the existing bucket

{
    "jsonpaths": [
        "$['device']['platform']['name']",
        "$['application']['app_id']",
        "$['application']['version_code']"
    ]
}
Jsonpaths file for the new bucket, where the data objects only have a value corresponding to name, and NOT to app_id or app_version. But I want to put them into the same Redshift table.
{
    "jsonpaths": [
        "$['platform']",
        ???,
        ???
    ]
}
Is one option just to put in paths which I know for certain won't exist in the new data's format? That would seem odd...
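To make that option concrete, the jsonpaths file would look something like this, where $['does_not_exist'] is a deliberately chosen placeholder path that I know never appears in the new data (the actual key name is arbitrary):

```
{
    "jsonpaths": [
        "$['platform']",
        "$['does_not_exist']",
        "$['does_not_exist']"
    ]
}
```

The hope would be that COPY treats the non-matching paths as missing values rather than raising an error, but I haven't confirmed that behavior.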