I am trying to insert a data set in Redshift with values as :
"2015-04-12T00:00:00.000+05:30"
"2015-04-18T00:00:00.000+05:30"
"2015-05-09T00:00:00.000+05:30"
"2015-05-24T00:00:00.000+05:30"
"2015-07-19T00:00:00.000+05:30"
"2015-08-02T00:00:00.000+05:30"
"2015-09-05T00:00:00.000+05:30"
The crawler which I ran over S3 data is unable to identify the columns or datatype of the values. I have been tweaking the table settings to get the job to push the data into Redshift but no avail. Here is what I have tried so far :
- Manually added the column in the table definition in Glue Catalog. There is only 1 column which is mentioned above.
- Changed the Serde serialization lib from LazySimpleSerde to org.apache.hadoop.hive.serde2.lazy.OpenCSVSerDe
- Added the following Serde parameters - quoteChar ", line.delim \n, field.delim \n
- I have already tried different combinations of line.delim and field.delim properties. Including one, omitting another and taking both at the same time as well.
- Changed the classification from UNKONWN to text in table properties.
- Changed the recordCount property to 469 to match the raw data row counts.
The job runs are always successful. After the job runs, when I go to select * from table_name, I always get correct count of rows in the redshift table as per the raw data but all the rows are NULL. How do I populate the rows in Redshift ?
The table properties have been uploaded in image album here : Imgur Album