I subscribed to the free synthetic dataset.
Now I have a "Revision ID", a "Revision ARN", a "Data set ID" and 28 CSV files, which I cannot download as a single bundle. I have to either download them manually one by one or export them all to AWS S3 (which I don't want to do).
Is there a way to download everything as a single archive, or to automate the process with the AWS S3 CLI?
I've tried:
./venv/bin/awscliv2 s3 cp s3://arn:aws:dataexchange:us-east-1::data-sets/b0b14e86c092855166507c15e045b844/revisions/6011536d595840f7bd4412fca59e0f6b/assets/7cd4a5cbedb0c5c83e37c20f668b3708 ./
fatal error: Parameter validation failed:
Invalid bucket name "arn:aws:dataexchange:us-east-1::data-sets": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
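The error suggests the CLI treats everything between s3:// and the first / as the bucket name, so the Data Exchange ARN prefix gets validated as if it were an S3 bucket. A small check using the two patterns quoted in the error shows why it is rejected. bucket_from_s3_uri is a hypothetical, simplified helper that mimics the split implied by the error message; it is not the CLI's actual parser.

```python
import re

# Both patterns are copied from the AWS CLI error message above.
BUCKET_NAME_RE = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")
S3_ARN_RE = re.compile(
    r"^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
    r"|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:]"
    r"[a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
)


def bucket_from_s3_uri(uri):
    """Hypothetical helper: take everything between 's3://' and the first '/'."""
    path = uri[len("s3://"):] if uri.startswith("s3://") else uri
    return path.split("/", 1)[0]


def is_acceptable_bucket(bucket):
    """True if the bucket check quoted in the error would accept this value."""
    return bool(BUCKET_NAME_RE.match(bucket) or S3_ARN_RE.match(bucket))


uri = (
    "s3://arn:aws:dataexchange:us-east-1::data-sets/b0b14e86c092855166507c15e045b844"
    "/revisions/6011536d595840f7bd4412fca59e0f6b/assets/7cd4a5cbedb0c5c83e37c20f668b3708"
)
bucket = bucket_from_s3_uri(uri)
print(bucket)                        # arn:aws:dataexchange:us-east-1::data-sets
print(is_acceptable_bucket(bucket))  # False
```

The Data Exchange ARN contains neither a valid bucket name (colons are not allowed) nor an s3 or s3-outposts access point, so s3 cp can never accept it.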
UPD: I've found a Python snippet that uses boto3 and creates a temporary bucket in the process.
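A minimal sketch of that temporary-bucket idea with boto3 (not the snippet I found, just the same approach): list the revision's assets, run an EXPORT_ASSETS_TO_S3 job into a freshly created bucket, download the objects, then empty and delete the bucket. It assumes credentials that may create buckets and run Data Exchange jobs, that the data set lives in us-east-1, and that the IDs from my CLI attempt are the right ones. The bucket name and all function names are made up for illustration.

```python
import time
import uuid
from pathlib import Path

# IDs taken from the ARN in my CLI attempt; adjust for your own subscription.
DATA_SET_ID = "b0b14e86c092855166507c15e045b844"
REVISION_ID = "6011536d595840f7bd4412fca59e0f6b"
REGION = "us-east-1"


def list_assets(dx):
    """Collect every asset in the revision, following NextToken pagination."""
    assets, token = [], None
    while True:
        kwargs = {"DataSetId": DATA_SET_ID, "RevisionId": REVISION_ID}
        if token:
            kwargs["NextToken"] = token
        resp = dx.list_revision_assets(**kwargs)
        assets.extend(resp["Assets"])
        token = resp.get("NextToken")
        if not token:
            return assets


def export_details(bucket, assets):
    """Details payload for an EXPORT_ASSETS_TO_S3 job: each asset keyed by its name."""
    return {
        "ExportAssetsToS3": {
            "DataSetId": DATA_SET_ID,
            "RevisionId": REVISION_ID,
            "AssetDestinations": [
                {"AssetId": a["Id"], "Bucket": bucket, "Key": a["Name"]} for a in assets
            ],
        }
    }


def wait_for_job(dx, job_id, poll_seconds=5):
    """Poll GetJob until the job completes; raise if it ends in a failure state."""
    while True:
        state = dx.get_job(JobId=job_id)["State"]
        if state == "COMPLETED":
            return
        if state in ("ERROR", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"job {job_id} ended in state {state}")
        time.sleep(poll_seconds)


def download_via_temp_bucket(out_dir="dataset"):
    import boto3  # imported here so the helpers above work without boto3

    dx = boto3.client("dataexchange", region_name=REGION)
    s3 = boto3.client("s3", region_name=REGION)
    bucket = f"dx-export-{uuid.uuid4().hex[:16]}"  # hypothetical throwaway bucket name
    s3.create_bucket(Bucket=bucket)  # no CreateBucketConfiguration needed in us-east-1
    try:
        assets = list_assets(dx)
        job = dx.create_job(Type="EXPORT_ASSETS_TO_S3", Details=export_details(bucket, assets))
        dx.start_job(JobId=job["Id"])
        wait_for_job(dx, job["Id"])
        for a in assets:
            target = Path(out_dir) / a["Name"]
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, a["Name"], str(target))
    finally:
        # Empty and delete the temporary bucket, even if the export failed.
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
            for obj in page.get("Contents", []):
                s3.delete_object(Bucket=bucket, Key=obj["Key"])
        s3.delete_bucket(Bucket=bucket)
```

Calling download_via_temp_bucket() should leave the files under ./dataset; the try/finally is there so the bucket doesn't linger if the job fails.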
UPD 2: According to https://docs.aws.amazon.com/data-exchange/latest/userguide/jobs.html#exporting-assets
There are two ways you can export assets from a published revision of a product:
- To an Amazon S3 bucket that you have permissions to access.
- By using a signed URL.
Therefore, I can't do this with the AWS CLI's s3 commands without a bucket, but I can use an EXPORT_ASSET_TO_SIGNED_URL job instead.
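A minimal boto3 sketch of the signed-URL route, assuming the same IDs as in my CLI attempt: one EXPORT_ASSET_TO_SIGNED_URL job per asset, poll GetJob until it completes, then fetch the returned SignedUrl with urllib. No bucket is involved. The function names and the output layout (everything flattened into one folder) are my own assumptions.

```python
import time
import urllib.request
from pathlib import Path

# IDs taken from the ARN in my CLI attempt; adjust for your own subscription.
DATA_SET_ID = "b0b14e86c092855166507c15e045b844"
REVISION_ID = "6011536d595840f7bd4412fca59e0f6b"
REGION = "us-east-1"
FAILED_STATES = ("ERROR", "CANCELLED", "TIMED_OUT")


def signed_url_for(dx, asset_id, poll_seconds=2):
    """Run one EXPORT_ASSET_TO_SIGNED_URL job and return the URL it produces."""
    job = dx.create_job(
        Type="EXPORT_ASSET_TO_SIGNED_URL",
        Details={
            "ExportAssetToSignedUrl": {
                "DataSetId": DATA_SET_ID,
                "RevisionId": REVISION_ID,
                "AssetId": asset_id,
            }
        },
    )
    dx.start_job(JobId=job["Id"])
    while True:
        resp = dx.get_job(JobId=job["Id"])
        if resp["State"] == "COMPLETED":
            return resp["Details"]["ExportAssetToSignedUrl"]["SignedUrl"]
        if resp["State"] in FAILED_STATES:
            raise RuntimeError(f"job {job['Id']} ended in state {resp['State']}")
        time.sleep(poll_seconds)


def download_all(out_dir="dataset"):
    import boto3  # imported here so signed_url_for can be exercised without boto3

    dx = boto3.client("dataexchange", region_name=REGION)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pages = dx.get_paginator("list_revision_assets").paginate(
        DataSetId=DATA_SET_ID, RevisionId=REVISION_ID
    )
    for page in pages:
        for asset in page["Assets"]:
            url = signed_url_for(dx, asset["Id"])
            # The URL expires (see SignedUrlExpiresAt in the job details), so fetch it right away.
            urllib.request.urlretrieve(url, str(out / Path(asset["Name"]).name))
```

Jobs run one after another here, which keeps the sketch simple; with 28 files that seems acceptable, but the loop could be parallelized if it turns out slow.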
UPD 3: I've created a gist for downloading a dataset from AWS Data Exchange via signed URLs.
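Once the CSV files are on disk (for example after downloading them via signed URLs), the single archive I originally asked for can be built locally with the standard library. This is a minimal sketch; the pack_directory name and the dataset / dataset.zip paths are placeholders.

```python
import zipfile
from pathlib import Path


def pack_directory(src_dir, archive_path):
    """Write every file under src_dir into one ZIP archive; return the file count."""
    src = Path(src_dir)
    archive = Path(archive_path).resolve()
    # List files before opening the archive, and skip the archive itself if it lives in src_dir.
    files = [p for p in sorted(src.rglob("*")) if p.is_file() and p.resolve() != archive]
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to src_dir so the archive unpacks cleanly.
            zf.write(path, arcname=path.relative_to(src).as_posix())
    return len(files)
```

Usage would be something like pack_directory("dataset", "dataset.zip") after the download step has finished.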