Here are some bullet points on how I have things set up:

- I have CSV files uploaded to S3 and a Glue crawler set up to create the table and schema.
- I have a Glue job set up that writes the data from the Glue table to our Amazon Redshift database over a JDBC connection. The job is also responsible for mapping the columns and creating the Redshift table.
- When I re-run the job, I get duplicate rows in Redshift (as expected).
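For reference, the write step of my Glue job script looks roughly like this (the database, table, connection, and column names below are placeholders, not my real ones):

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the table that the crawler created from the CSV files in S3
datasource = glueContext.create_dynamic_frame.from_catalog(
    database="my_glue_database",      # placeholder
    table_name="my_csv_table",        # placeholder
    transformation_ctx="datasource",
)

# Map the columns to the Redshift schema
mapped = ApplyMapping.apply(
    frame=datasource,
    mappings=[("col_a", "string", "col_a", "string"),
              ("col_b", "string", "col_b", "int")],  # placeholder mapping
    transformation_ctx="mapped",
)

# Write to Redshift over the JDBC connection; this also creates the table
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="my-redshift-connection",      # placeholder
    connection_options={"dbtable": "public.my_table",
                        "database": "my_redshift_db"},
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="write_to_redshift",
)

job.commit()
```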
However, is there a way to replace or delete the existing rows before inserting the new data?
Job bookmarks are enabled, but they are not preventing the duplicates.
How can I connect to Redshift and delete all of the existing data as part of the job, before pushing the new data to Redshift, in Python?
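What I have in mind is something along these lines, assuming the Redshift connection_options accept a "preactions" SQL string that runs before the load (sketch only; the table name and DELETE statement are placeholders, and I have not confirmed this works):

```python
# Same write step as above, but with a "preactions" statement that is
# supposed to clear the target table before the new data is loaded.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="my-redshift-connection",       # placeholder
    connection_options={
        "dbtable": "public.my_table",                   # placeholder
        "database": "my_redshift_db",                   # placeholder
        "preactions": "DELETE FROM public.my_table;",   # or TRUNCATE
    },
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="write_to_redshift",
)
```

Or would I instead need to open a separate connection from the job script (e.g. with psycopg2 or pg8000) and run the DELETE myself before calling the write?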