
I am trying to insert a data set into Redshift with values such as:

"2015-04-12T00:00:00.000+05:30"
"2015-04-18T00:00:00.000+05:30"
"2015-05-09T00:00:00.000+05:30"
"2015-05-24T00:00:00.000+05:30"
"2015-07-19T00:00:00.000+05:30"
"2015-08-02T00:00:00.000+05:30"
"2015-09-05T00:00:00.000+05:30"

The crawler that I ran over the S3 data is unable to identify the columns or the datatype of the values. I have been tweaking the table settings to get the job to push the data into Redshift, but to no avail. Here is what I have tried so far:

  1. Manually added the column in the table definition in the Glue Catalog. There is only one column, which holds the values shown above.
  2. Changed the SerDe serialization library from LazySimpleSerDe to org.apache.hadoop.hive.serde2.lazy.OpenCSVSerDe.
  3. Added the following SerDe parameters: quoteChar ", line.delim \n, field.delim \n (a DDL sketch of this configuration follows the list).
  4. Tried different combinations of the line.delim and field.delim properties: including one and omitting the other, as well as setting both at the same time.
  5. Changed the classification from UNKNOWN to text in the table properties.
  6. Changed the recordCount property to 469 to match the row count of the raw data.
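
To make steps 2 and 3 concrete, here is roughly what that table definition would look like expressed as Hive-style DDL (database, table, and column names are hypothetical; note that the canonical class name for the CSV SerDe is org.apache.hadoop.hive.serde2.OpenCSVSerde, its documented properties are separatorChar, quoteChar, and escapeChar rather than line.delim/field.delim, and it reads every column as a string):

CREATE EXTERNAL TABLE my_db.my_table (
  event_ts string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar' = '"',
  'separatorChar' = ','
)
LOCATION 's3://Path/To/S3/Data';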

The job runs are always successful. After a run, when I do select * from table_name, I get the correct row count in the Redshift table as per the raw data, but all the rows are NULL. How do I populate the rows in Redshift?

The table properties have been uploaded in an image album here: Imgur Album

I did a workaround with the COPY command, but I am still looking for the settings that need to be configured to achieve the same result using AWS Glue. – Rishabh Dixit

1 Answer


I was unable to push the data into Redshift using Glue, so I turned to Redshift's COPY command. Here is the command that I executed, in case anyone else faces the same situation:

copy schema_Name.Table_Name
from 's3://Path/To/S3/Data'
iam_role 'arn:aws:iam::Redshift_Role'
FIXEDWIDTH 'Column_Name:31'
region 'us-east-1';
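
One caveat: since each raw value is wrapped in double quotes, the width of 31 covers the 29-character timestamp plus the two quote characters, so the quotes are loaded into the column as well. If the quotes are unwanted, a variant using COPY's CSV option (which strips the default double-quote character) should work; the names below are the same placeholders as above, and timeformat 'auto' is an optional addition that may help if the target column is TIMESTAMPTZ:

copy schema_Name.Table_Name
from 's3://Path/To/S3/Data'
iam_role 'arn:aws:iam::Redshift_Role'
CSV
timeformat 'auto'
region 'us-east-1';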