My current use case is, in an ETL based service (NOTE
: The ETL service is not using the Glue ETL, it is an independent service), I am getting some data from AWS Redshift clusters into the S3. The data in S3 is then fed into the T and L jobs. I want to populate the metadata into the Glue Catalog. The most basic solution for this is to use the Glue Crawler, but the crawler runs for approximately 1 hour and 20 mins(lot of s3 partitions). The other solution that I came across is to use Glue API's. However, I am facing the issue of data type definition in the same.
Is there any way, I can create/update the Glue Catalog Tables where I have data in S3 and the data types are known only during the extraction process.
But also, when the T and L jobs are being run, the data types should be readily available in the catalog.