I subscribed to the free synthetic dataset.

Now I have a "Revision ID", a "Revision ARN", a "Data set ID", and 28 CSV files that I cannot download as a single package. I must either download them manually one by one or export them all to Amazon S3 (which I do not want to do).

Is there a way to download them all in a single archive, or to automate the process somehow via the AWS CLI?

I've tried

./venv/bin/awscliv2 s3 cp s3://arn:aws:dataexchange:us-east-1::data-sets/b0b14e86c092855166507c15e045b844/revisions/6011536d595840f7bd4412fca59e0f6b/assets/7cd4a5cbedb0c5c83e37c20f668b3708 ./


fatal error: Parameter validation failed:
Invalid bucket name "arn:aws:dataexchange:us-east-1::data-sets": Bucket
name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN
matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:]
[a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:
[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]
{1,63}$"

UPD: I've found a Python snippet that uses boto3 and creates a temporary bucket in the process.

UPD 2: From https://docs.aws.amazon.com/data-exchange/latest/userguide/jobs.html#exporting-assets:

There are two ways you can export assets from a published revision of a product:

  • To an Amazon S3 bucket that you have permissions to access.
  • By using a signed URL.

So I cannot export the assets through the AWS CLI without a bucket, but I can use an EXPORT_ASSET_TO_SIGNED_URL job instead.

UPD 3: I've created a gist for downloading a dataset from AWS Data Exchange via signed URLs.
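
For reference, the signed-URL route from UPD 2 could be sketched with boto3 roughly as follows. This is not the gist itself, only a minimal sketch under some assumptions: the helper names `export_signed_url` and `download_revision` and the `fetch` parameter are made up for illustration, the caller passes in a `dataexchange` client, and asset names are flattened to their last path component (which would collide if two assets share a file name).

```python
import time
import urllib.request
from pathlib import Path


def export_signed_url(client, data_set_id, revision_id, asset_id, poll_seconds=1.0):
    """Run one EXPORT_ASSET_TO_SIGNED_URL job and return the signed URL."""
    job_id = client.create_job(
        Type="EXPORT_ASSET_TO_SIGNED_URL",
        Details={
            "ExportAssetToSignedUrl": {
                "DataSetId": data_set_id,
                "RevisionId": revision_id,
                "AssetId": asset_id,
            }
        },
    )["Id"]
    client.start_job(JobId=job_id)

    # Poll until the job reaches a terminal state.
    while True:
        job = client.get_job(JobId=job_id)
        state = job["State"]
        if state == "COMPLETED":
            return job["Details"]["ExportAssetToSignedUrl"]["SignedUrl"]
        if state in ("ERROR", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"export job {job_id} ended in state {state}")
        time.sleep(poll_seconds)


def download_revision(client, data_set_id, revision_id, dest_dir,
                      fetch=urllib.request.urlretrieve, poll_seconds=1.0):
    """Download every asset of a revision into dest_dir; return the saved paths.

    `fetch(url, target)` is a hypothetical hook so the download step can be
    swapped out; by default it uses urllib from the standard library.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []

    paginator = client.get_paginator("list_revision_assets")
    for page in paginator.paginate(DataSetId=data_set_id, RevisionId=revision_id):
        for asset in page["Assets"]:
            url = export_signed_url(
                client, data_set_id, revision_id, asset["Id"], poll_seconds
            )
            # Assumption: keep only the file name, dropping any key prefix.
            target = dest / Path(asset["Name"]).name
            fetch(url, target)  # fetch right away, since signed URLs expire
            saved.append(target)
    return saved
```

With the IDs from the ARN above, a call might look like `download_revision(boto3.client("dataexchange", region_name="us-east-1"), "b0b14e86c092855166507c15e045b844", "6011536d595840f7bd4412fca59e0f6b", "./csvs")`. No S3 bucket is involved; each asset gets its own export job.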

1 Answer

I don't know why you are using such a complicated aws s3 cp command. A two-line script could look something like this:

# Sync all files in the folder to the local directory ~/csvs/
aws s3 sync s3://bucketname/<folder>/ ~/csvs/
# Then zip (or tar) the whole folder if you want a single archive
zip -r ~/csvs.zip ~/csvs
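
If the zip command is not available, the archiving step could also be done from Python's standard library. This is only a sketch of that one step: the function name `archive_directory` is made up for illustration, and it assumes the CSV files are already in a local directory.

```python
import shutil
from pathlib import Path


def archive_directory(src_dir, archive_base=None):
    """Zip src_dir into <archive_base>.zip and return the archive path.

    By default the archive is written next to the directory, e.g. ~/csvs
    becomes ~/csvs.zip, so the archive never ends up inside itself.
    """
    src = Path(src_dir).expanduser().resolve()
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    base = Path(archive_base).expanduser() if archive_base else src.parent / src.name
    # root_dir/base_dir keep the top-level folder name inside the archive.
    archive = shutil.make_archive(
        str(base), "zip", root_dir=str(src.parent), base_dir=src.name
    )
    return Path(archive)
```

For example, `archive_directory("~/csvs")` would write `~/csvs.zip` with the files stored under a `csvs/` folder inside the archive.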