I'm developing an ETL pipeline using AWS Glue. I have a CSV file that is transformed in several ways using PySpark, such as duplicating a column, changing data types, adding new columns, etc. I ran a crawler against the S3 location holding the data, and it created a Glue table based on that CSV file. So when I add a new column to the CSV file, the Glue table is updated accordingly the next time the crawler runs.
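For context, here is a minimal sketch of the kind of transformations the Glue job applies. The bucket paths and column names (`amount`, `load_date`, etc.) are placeholders, not the real ones:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import functions as F

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read the raw CSV from S3 (placeholder path)
df = spark.read.option("header", "true").csv("s3://my-bucket/input/data.csv")

df = (
    df.withColumn("amount", F.col("amount").cast("double"))  # change a data type
      .withColumn("amount_copy", F.col("amount"))            # duplicate a column
      .withColumn("load_date", F.current_date())             # add a new column
)

# Write the transformed data back to S3, where the crawler picks it up
df.write.mode("overwrite").option("header", "true").csv("s3://my-bucket/output/")
```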
Now I want to do the same with Amazon Redshift: create a table in Redshift that mirrors the Glue table mentioned above (the one created from the CSV). A lot of answers explain how to create Redshift schemas manually. I did that (roughly what the snippet below shows), but whenever a data type changes I have to update the table by hand. When the CSV file changes, the Redshift table must be updated accordingly.
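This is roughly how I'm creating the Redshift table manually at the moment. It's only a sketch: the connection details, table name, and column definitions are placeholders, and I'm assuming psycopg2 for the connection:

```python
import psycopg2  # assuming psycopg2 is used to connect to Redshift

# Placeholder connection details -- replace with your cluster endpoint
conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)

# The DDL is written by hand, so every time a column is added, removed,
# or its type changes in the CSV, this statement has to be edited manually.
create_sql = """
CREATE TABLE IF NOT EXISTS public.my_table (
    id          INTEGER,
    amount      DOUBLE PRECISION,
    amount_copy DOUBLE PRECISION,
    load_date   DATE
);
"""

with conn, conn.cursor() as cur:
    cur.execute(create_sql)
```

Keeping that DDL in sync with the CSV by hand is exactly the step I want to automate.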
Can I do the same thing using crawlers? That is, create a Redshift table that mirrors the Glue Catalog table, so that when a data type changes or a column is added or removed in the CSV file, I can simply run a crawler again. Is this possible with a crawler, or is there another method that fulfills this need? The goal is a fully automated ETL pipeline.
Any help would be greatly appreciated!