0 votes

I'm doing some research and I was reading this page https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

It says

Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. It is simple to increase your read or write performance exponentially. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelise reads, you could scale your read performance to 55,000 read requests per second.

I'm not sure what the last bit means. My understanding is that for the filename 'Australia/NSW/Sydney', the prefix is 'Australia/NSW'. Correct?

How does creating 10 of these improve your read performance? Do you create, for example, Australia/NSW1/, Australia/NSW2/, Australia/NSW3/, and then map them to a load balancer somehow?

Possible duplicate of stackoverflow.com/questions/52443839/… - ingomueller.net

2 Answers

2 votes

S3 is designed like a Hashtable/HashMap in Java. The prefix forms the hash for the hash bucket, and the actual files are stored in groups in these buckets.

To find a particular file you have to compare it against all the files in its hash bucket, whereas getting to the hash bucket itself is instant (constant-time).

Thus the more descriptive the keys, the more hash buckets, and hence the fewer items in each bucket, which makes the lookup faster.

E.g. a bucket with tourist-attraction details for all countries in the world:

Bucket1: placeName.jpg (all files in the bucket, no prefix)
Bucket2: countryName/state/placeName.jpg

Now if you are looking for Sydney.info under Australia/NSW, the lookup will be faster in the second bucket.
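
As a rough illustration (the hash-bucket behaviour here is an analogy for how prefixes narrow a lookup, not S3's actual internals), a minimal Python sketch with made-up keys:

    from collections import defaultdict

    # Toy model of prefix-based grouping: keys are bucketed by their
    # prefix, and a lookup then only scans the matching group.
    def prefix_of(key: str) -> str:
        # Everything before the last '/' counts as the prefix.
        return key.rsplit("/", 1)[0] if "/" in key else ""

    keys = [
        "Sydney.jpg",                    # Bucket1 style: no prefix
        "Australia/NSW/Sydney.info",     # Bucket2 style: descriptive prefix
        "Australia/NSW/Newcastle.info",
        "France/IDF/Paris.info",
    ]

    partitions = defaultdict(list)
    for key in keys:
        partitions[prefix_of(key)].append(key)

    # Looking up 'Australia/NSW/Sydney.info' only scans the 2 keys under
    # 'Australia/NSW' instead of all 4 keys in the bucket.
    target = "Australia/NSW/Sydney.info"
    print(partitions[prefix_of(target)])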

1 vote

No, S3 never connects to a load balancer. This article covers the topic, but here are the important highlights:

(...) keys in S3 are partitioned by prefix

(...)

Partitions are split either due to sustained high request rates, or because they contain a large number of keys (which would slow down lookups within the partition). There is overhead in moving keys into newly created partitions, but with request rates low and no special tricks, we can keep performance reasonably high even during partition split operations. This split operation happens dozens of times a day all over S3 and simply goes unnoticed from a user performance perspective. However, when request rates significantly increase on a single partition, partition splits become detrimental to request performance. How, then, do these heavier workloads work over time? Smart naming of the keys themselves!

So Australia/NSW/ could be read from one partition, while Australia/NSW1/ and Australia/NSW2/ might be read from two others. It doesn't have to be that way, but prefixes still give you some control over how the data is partitioned, because you have a better understanding of what kind of reads you will be doing on it. You should aim to have reads spread evenly over the prefixes.
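
To make that concrete, here is a minimal sketch (assuming boto3, a hypothetical bucket name, and a hypothetical Australia/NSW0/ through Australia/NSW9/ key layout) of spreading reads across several prefixes so that each prefix's per-second request budget is used in parallel:

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    s3 = boto3.client("s3")
    BUCKET = "my-example-bucket"  # hypothetical bucket name

    # Hypothetical layout: data sharded across 10 prefixes, each of which
    # gets its own per-prefix request-rate budget from S3.
    keys = [f"Australia/NSW{i}/Sydney.info" for i in range(10)]

    def fetch(key):
        return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

    # Issue the GETs in parallel; because each key lives under a different
    # prefix, the requests can land on different partitions.
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(fetch, keys))

Note that the client does nothing special to pick partitions here; S3 decides where to split, and the key naming simply gives it the opportunity to spread the load.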