Survey
Is it possible to perform a batch upload to Amazon S3?
Yes*.
Does the S3 API support uploading multiple objects in a single HTTP call?
No.
Explanation
The Amazon S3 API doesn't support bulk upload, but the AWS CLI supports concurrent (parallel) uploads. From the client's perspective and in terms of bandwidth efficiency, these options should perform roughly the same.
────────────────────── time ────────────────────►
1. Serial
------------------
POST /resource
────────────────►
payload_1         POST /resource
                  ────────────────►
                  payload_2         POST /resource
                                    ────────────────►
                                    payload_3
2. Bulk
------------------
POST /bulk
┌────────────┐
│resources:  │
│- payload_1 │
│- payload_2 ├──►
│- payload_3 │
└────────────┘
3. Concurrent
------------------
POST /resource
────────────────►
payload_1
POST /resource
────────────────►
payload_2
POST /resource
────────────────►
payload_3
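The concurrent pattern above is what the AWS CLI implements internally, but it can also be approximated in plain shell with xargs -P. The sketch below uses echo as a stand-in for the actual upload command, so it can be run anywhere; in practice the echo would be replaced by something like aws s3 cp:

```shell
# Issue up to 4 "requests" at a time, one per input line.
# Stand-in for: xargs -P 4 -I {} aws s3 cp {} s3://bucket/prefix/
printf '%s\n' payload_1 payload_2 payload_3 |
  xargs -P 4 -I {} echo "POST /resource {}"
```

Because the invocations run in parallel, the order of the output lines is not guaranteed.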
AWS Command Line Interface
The documentation page "How can I improve the transfer performance of the sync command for Amazon S3?" suggests increasing concurrency in two ways. One of them is this:
To potentially improve performance, you can modify the value of max_concurrent_requests. This value sets the number of requests that can be sent to Amazon S3 at a time. The default value is 10, and you can increase it to a higher value. However, note the following:
- Running more threads consumes more resources on your machine. You must be sure that your machine has enough resources to support the maximum number of concurrent requests that you want.
- Too many concurrent requests can overwhelm a system, which might cause connection timeouts or slow the responsiveness of the system. To avoid timeout issues from the AWS CLI, you can try setting the --cli-read-timeout value or the --cli-connect-timeout value to 0.
A script setting max_concurrent_requests and uploading a directory can look like this:

# Raise parallelism from the default of 10 to 64
aws configure set s3.max_concurrent_requests 64
# Upload the directory recursively using the configured concurrency
aws s3 cp local_path_from s3://remote_path_to --recursive
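If the higher concurrency triggers timeouts, the suggestion from the quoted documentation can be applied to the same command. A sketch (the paths are the same placeholders as in the script above):

```shell
# Disable AWS CLI client-side timeouts (0 means wait indefinitely),
# as suggested in the documentation quoted above
aws s3 cp local_path_from s3://remote_path_to --recursive \
  --cli-read-timeout 0 --cli-connect-timeout 0
```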
To give a clue about how running more threads consumes more resources, I did a small measurement in a container running aws-cli (using procpath) by uploading a directory with ~550 HTML files (~40 MiB in total, average file size ~72 KiB) to S3. The following chart shows the CPU usage, RSS and thread count of the uploading aws process.

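For reproducibility, the measurement can be collected roughly like this. This is a sketch based on my reading of procpath's documentation; the JSONPath query, flag names and file names are assumptions and may differ across procpath versions:

```shell
# Sample matching processes every second into an SQLite file
# (run this while the upload is in progress)
procpath record -i 1 -d s3-upload.sqlite '$..children[?("aws" in @.cmdline)]'

# Render an SVG chart of CPU usage and RSS from the recorded samples
procpath plot -d s3-upload.sqlite -q cpu -q rss -f s3-upload.svg
```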