8
votes

I am reading about the Amazon S3 data consistency model in the docs: http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html

Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.

I understand that Amazon S3 provides read-after-write consistency when putting new objects into an S3 bucket. What I don't understand is the caveat about making a HEAD or GET request before the object is created. What does that mean?

1
Werner Vogels on Eventual Consistency -- mostly about DynamoDB, but still relevant for any eventually consistent system. - John Rotenstein

1 Answer

15
votes

The actual internals of S3 are proprietary to AWS, but here's a theory:

When you request an object, S3 first checks its cache to see if the key is there. There are three cases:

  1. If the key is not in the cache, S3 pulls the data from the underlying storage and puts it in the cache.

This is the read-after-write consistency for new objects. The first read after the PUT goes to storage, so you get the new object immediately.

  2. If the key is already in the cache, S3 returns the cached data.

This is the eventual consistency for updates. You overwrite the object, and then the cached version must expire before you get the new version.

  3. If the key is not in the cache and the object does not exist in storage, S3 caches the "not present" result.

This is the behaviour you're asking about. Just as old data can sit in the cache, S3 has cached "the key does not exist" as the "old data". So again, you must wait for that cache entry to expire before the actual data can be returned.
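To make the three cases concrete, here is a toy model of the theory in Python. Everything in it is an assumption for illustration: the names `ToyStore` and `FakeClock`, the fixed TTL, and the choice that a write does not invalidate the cache are all hypothetical, and none of it reflects how S3 is actually built.

```python
import time


class FakeClock:
    """Hypothetical manual clock so the demo does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class ToyStore:
    """Toy read-through cache in front of a dict. NOT S3's real design."""

    _MISSING = object()  # sentinel for a cached "key does not exist"

    def __init__(self, ttl=5.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.storage = {}  # stands in for the durable store
        self.cache = {}    # key -> (value or _MISSING, expiry time)

    def put(self, key, value):
        # Assumption: writes go straight to storage and leave the cache alone.
        self.storage[key] = value

    def get(self, key):
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and entry[1] > now:
            value = entry[0]  # fresh cache entry: may be stale or "missing"
        else:
            # No entry or expired: read storage and cache whatever we find,
            # including the fact that the key is absent.
            value = self.storage.get(key, self._MISSING)
            self.cache[key] = (value, now + self.ttl)
        return None if value is self._MISSING else value


if __name__ == "__main__":
    clock = FakeClock()
    store = ToyStore(ttl=5.0, clock=clock)

    store.put("a", "v1")
    print(store.get("a"))  # v1   (case 1: new key, nothing cached yet)

    store.put("a", "v2")
    print(store.get("a"))  # v1   (case 2: old version still cached)
    clock.advance(5)
    print(store.get("a"))  # v2   (cache entry expired)

    print(store.get("b"))  # None (case 3: the miss itself is cached)
    store.put("b", "v1")
    print(store.get("b"))  # None (cached "not present" hides the new object)
    clock.advance(5)
    print(store.get("b"))  # v1
```

The last three calls are the caveat from the docs: the GET issued before the PUT is what makes the following read eventually consistent in this model.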

Again, this is not stated with any authority. I'd welcome corrections from any S3 experts on errors I may have made.