I am trying to send records of API calls to S3 then only specific fields to redshift. For example, my program sends the following JSON to S3 successfully:
{
"user": "user_name",
"type": "GET",
"date": "2016-03-22 16:14:13",
"data": [{
"EmpID": 1,
"LastName": "Smith",
"FirstName": "Bob",
"DOB": "1910-01-01",
"SSN": "123456789",
"HireDate": "1910-01-01",
"City": "Town",
"Address": "123 Abc"
}]
}
The "data" field may contain one record like the example above or many records if the query returns multiple results. I want to copy only the information from data into the "Employee" table on redshift with the same column names. Copy with json 'auto' does not work because it parses only top level fields. In an attempt to solve this, I created the following JSON paths file:
{
"jsonpaths": [
"$.data[0].EmpID",
"$.data[0].LastName",
"$.data[0].FirstName",
"$.data[0].DOB",
"$.data[0].SSN",
"$.data[0].HireDate",
"$.data[0].City",
"$.data[0].Address",
]
}
Here is an example run of my program:
- Run my AWS Java web project on Tomcat server
- Enter http://localhost:000/projectname/rest/employee/1 (which successfully returns a json object for employee id 1 from my MS SQL server database)
- Creates the json object at the top of this post with metadata about the call and the result of the query
- Writes this object successfully to 's3://bucket/prefix/'
- Tries to run the copy command and fails. When I go to the load tab of my redshift cluster for more information about why the load failed the status of the command is terminated and the SQL field says "COPY ANALYZE employee"
When I try to run the manual version of the copy command below, I get the following error: "ERROR: XX000: Manifest file is not in correct json format"
COPY employee FROM 's3://bucket/prefix/filename' CREDENTIALS 'aws_access_key_id=<>;aws_secret_access_key=<>' JSON 's3://bucket/jsonpaths.json';
In an effort to address the manifest problem, I created the following file:
{
"entries": [
{"url":"s3://bucket/prefix/file", "mandatory":true}
]
}
then ran the copy command:
COPY employee
FROM 's3://bucket/manifest.json'
CREDENTIALS'aws_access_key_id=<>;aws_secret_access_key=<>'
MANIFEST
This yields the stl_load_error: "Delimiter not found" with the raw field value as the full json file saved to s3. If I add the jsonPaths file line from the previous copy command, I get the same manifest error as before.
Questions:
-Is my JSONPaths file in the correct format?
-Is this the best way to extract from lower levels of a JSONArray?
-Is the format of my manifest file correct? Why is a manifest file necessary?
Note: before I added the metadata, I tested the s3 to redshift connection with just the fields I wanted to load and it loaded successfully, so it is unlikely my redshift destination is improperly configured.
Thank you for your help!