0 votes

I'm doing some research and I was reading this page https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

It says

Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. It is simple to increase your read or write performance exponentially. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelise reads, you could scale your read performance to 55,000 read requests per second.

I'm not sure what the last bit means. My understanding is that for the filename 'Australia/NSW/Sydney', the prefix is 'Australia/NSW'. Correct?

How does creating 10 of these improve your read performance? Do you create, for example, Australia/NSW1/, Australia/NSW2/, Australia/NSW3/, and then map them to a load balancer somehow?

Possible duplicate of stackoverflow.com/questions/52443839/… - ingomueller.net

2 Answers

2 votes

S3 is designed like a Hashtable/HashMap in Java. The prefix forms the hash for the hash bucket, and the actual files are stored in groups in these buckets.

To find a particular file you have to compare it against all the files in its hash bucket, whereas getting to the hash bucket itself is instant (constant-time).

Thus the more descriptive the keys, the more hash buckets, and hence the fewer items in each bucket, which makes the lookup faster.

E.g. a bucket with tourist-attraction details for all countries in the world:

Bucket1: placeName.jpg (all files in the bucket, no prefix)
Bucket2: countryName/state/placeName.jpg

Now if you are looking for Sydney.info under Australia/NSW, the lookup will be faster in the second bucket.
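
As a rough illustration (the hash-bucket behaviour here is an analogy for how prefixes narrow a lookup, not S3's actual internals), a minimal Python sketch with made-up keys:

    from collections import defaultdict

    # Toy model of prefix-based grouping: keys are bucketed by their
    # prefix, and a lookup then only scans the matching group.
    def prefix_of(key: str) -> str:
        # Everything before the last '/' counts as the prefix.
        return key.rsplit("/", 1)[0] if "/" in key else ""

    keys = [
        "Sydney.jpg",                    # Bucket1 style: no prefix
        "Australia/NSW/Sydney.info",     # Bucket2 style: descriptive prefix
        "Australia/NSW/Newcastle.info",
        "France/IDF/Paris.info",
    ]

    partitions = defaultdict(list)
    for key in keys:
        partitions[prefix_of(key)].append(key)

    # Looking up 'Australia/NSW/Sydney.info' only scans the 2 keys under
    # 'Australia/NSW' instead of all 4 keys in the bucket.
    target = "Australia/NSW/Sydney.info"
    print(partitions[prefix_of(target)])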

1 vote

No, S3 never connects to a load balancer. This article covers the topic, but here are the important highlights:

(...) keys in S3 are partitioned by prefix

(...)

Partitions are split either due to sustained high request rates, or because they contain a large number of keys (which would slow down lookups within the partition). There is overhead in moving keys into newly created partitions, but with request rates low and no special tricks, we can keep performance reasonably high even during partition split operations. This split operation happens dozens of times a day all over S3 and simply goes unnoticed from a user performance perspective. However, when request rates significantly increase on a single partition, partition splits become detrimental to request performance. How, then, do these heavier workloads work over time? Smart naming of the keys themselves!

So Australia/NSW/ could be read from one partition, while Australia/NSW1/ and Australia/NSW2/ might be read from two others. It doesn't have to be that way, but prefixes still give you some control over how the data is partitioned, because you have a better understanding of what kind of reads you will be doing on it. You should aim to have reads spread evenly over the prefixes.
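
To make that concrete, here is a minimal sketch (assuming boto3, a hypothetical bucket name, and a hypothetical Australia/NSW0/ through Australia/NSW9/ key layout) of spreading reads across several prefixes so that each prefix's per-second request budget is used in parallel:

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    s3 = boto3.client("s3")
    BUCKET = "my-example-bucket"  # hypothetical bucket name

    # Hypothetical layout: data sharded across 10 prefixes, each of which
    # gets its own per-prefix request-rate budget from S3.
    keys = [f"Australia/NSW{i}/Sydney.info" for i in range(10)]

    def fetch(key):
        return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

    # Issue the GETs in parallel; because each key lives under a different
    # prefix, the requests can land on different partitions.
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(fetch, keys))

Note that the client does nothing special to pick partitions here; S3 decides where to split, and the key naming simply gives it the opportunity to spread the load.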