I am trying to load data from two different buckets on S3 to Redshift table. In each bucket, there are directories with dates in their names and each of directories contains many files, but there are not manifest.
Example S3 structure:
# Bucket 1
s3://bucket1/20170201/part-01
s3://bucket1/20170201/part-02
s3://bucket1/20170202/part-01
s3://bucket1/20170203/part-00
s3://bucket1/20170203/part-01
# Bucket 2
s3://bucket2/20170201/part-00
s3://bucket2/20170202/part-00
s3://bucket2/20170202/part-01
s3://bucket2/20170203/part-00
Let's say that data from both buckets for 20170201 and 20170202 should be loaded. One of the solutions can be running 4 times COPY command - ones per each bucket-date pair. But I'm curious if it could be done within single COPY call. I've seen that manifest file allows specifying few different files (including from different buckets). However:
- is there option to use prefix instead full path in the manifest,
- and can I use somehow manifest in SQL passing it as a string instead file location - I want to avoid creating temporary files on S3?