I'm quite new to AWS. I'm developing a cloud application and I want to use S3 for file storage. I read "Request Rate and Performance Considerations - Amazon Simple Storage Service" to better understand how indexing works in S3, but I have to say it is not very clear to me.

My application is multi-tenant and it will store many files for each customer.

My idea for the key is:

bucketname/211a6589-caef-4554-acc6-bc0fd05d756d/a/f/z/3b288ae5-3779-49d1-a79e-1812d4fa76e2.pdf

The key is composed in this way:

  1. bucket name
  2. uuid of the tenant
  3. 3 levels of folders randomly generated
  4. uuid of the file

Point 2 is useful to me because it keeps all of a tenant's data under one folder. The UUID is unique and random, so it should be good practice. I also added 3 levels of randomly generated nested folders because I originally created this scheme for disk storage, where it lets me balance inodes on the filesystem.

From the documentation I don't understand exactly which part of the key S3 uses for indexing.

Is my approach good enough in order to get the best performance from S3?

1 Answer

Firstly, this type of effort is only required if the bucket "routinely exceeds 100 PUT/LIST/DELETE requests per second or more than 300 GET requests per second".

This is not a typical situation, so don't expend too much effort if your application is unlikely to hit such levels. However, it is a good idea to get it "right" early on, if you think you will hit such levels.

The idea is to spread the load over the entire name space. Think of Amazon S3 as having a tree-structure for maintaining a list of objects. For large buckets, management of the tree-structure is spread across servers. The objective is to spread traffic over multiple servers rather than hitting only one server.

If you are using a UUID to store an object and the UUID is random, then just use the UUID. That will be sufficient to spread the load across the tree-structure. Even just a few random characters at the front of the key are sufficient to spread the load.
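
As a rough illustration of both options, here is a minimal Python sketch. The function names are hypothetical, and deriving the leading characters from a SHA-256 hash of the file name is my own assumption about one way to get "a few random characters"; any well-distributed source would do.

```python
import hashlib
import uuid


def random_object_key(extension: str = "pdf") -> str:
    """Option 1: the key is just a random (version 4) UUID plus an extension."""
    return f"{uuid.uuid4()}.{extension}"


def with_hash_prefix(name: str, prefix_len: int = 4) -> str:
    """Option 2: keep a meaningful name, but put a few hex characters
    derived from its hash at the front of the key (assumed scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


if __name__ == "__main__":
    print(random_object_key())
    print(with_hash_prefix("invoice-2017-03.pdf"))
```

Because the prefix is computed from the name, the same name always maps to the same key, so the object can still be looked up later without storing the prefix separately.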

If you particularly want to store files under a "per tenant" structure, then use:

tenant-uuid/object-uuid

This is slightly less good, because one tenant could get/put lots of files simultaneously, which would hit a small portion of the tree-structure. However, that has a low likelihood in a multi-tenant application if you have many simultaneous users.
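
A minimal sketch of building keys in that per-tenant layout, assuming Python and hypothetical helper names (the optional file extension is taken from the question's example, not required by the layout):

```python
import uuid


def tenant_prefix(tenant_id: uuid.UUID) -> str:
    """Common prefix shared by all of one tenant's objects."""
    return f"{tenant_id}/"


def tenant_object_key(tenant_id: uuid.UUID, extension: str = "") -> str:
    """Build a key of the form tenant-uuid/object-uuid[.extension]."""
    suffix = f".{extension}" if extension else ""
    return f"{tenant_prefix(tenant_id)}{uuid.uuid4()}{suffix}"


if __name__ == "__main__":
    tenant = uuid.UUID("211a6589-caef-4554-acc6-bc0fd05d756d")
    print(tenant_object_key(tenant, "pdf"))
```

The value returned by `tenant_prefix` can then be used as the prefix filter when listing objects, which gives the "all data of a tenant in one folder" view without the extra random folder levels.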