3
votes

I have thousands of large files (5-500 MB, most around 100 MB) in an S3 bucket with no organisation at all: no "directories". Each file has its own expiration time (some expire after 60 days, others after 90, etc.), after which I would like to move it to the Glacier storage class.

I have looked at the Lifecycle feature, but I cannot find a way to apply a specific rule to one file. Rules appear to work only by key prefix, and I would rather not change my naming convention.

Using the PHP SDK, I tried a copyObject call with the 'StorageClass' argument set to "GLACIER", but that predictably threw an exception. I guess the documentation is up to date and there really is no such value :-)

I really hope I'm missing something, because I would hate to have to download these files and then upload them to Glacier 'manually'. I'd also be missing the easy restore features from the S3 console.


2 Answers

4
votes

There is no command to tell Amazon S3 to archive a specific object to Amazon Glacier. Instead, Lifecycle Rules are used to identify objects.

The Lifecycle Configuration Elements documentation shows each rule consisting of:

  • Rule metadata that includes a rule ID and a status indicating whether the rule is enabled or disabled. If a rule is disabled, Amazon S3 will not perform any actions specified in the rule.
  • Prefix identifying objects by the key prefix to which the rule applies.
  • One or more transition/expiration actions with a date or a time period in the object's lifetime when you want Amazon S3 to perform the specified action.

The only way to identify which objects are transitioned is via the prefix parameter. Therefore, you would need to specify a separate rule for each object. (The prefix can include the full object name.)

However, there is a limit of 1000 rules per lifecycle configuration, so this approach can cover at most 1000 objects at a time.
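As a rough illustration of "one rule per object", here is a minimal Python sketch of what such a rule could look like as a plain dictionary. The helper name `glacier_rule_for_key` and the example key are hypothetical, and the dictionary layout is assumed to follow the shape that boto3's `put_bucket_lifecycle_configuration` accepts. Because the match is by prefix, a rule would also catch any other key that happens to start with the same string.

```python
def glacier_rule_for_key(key: str, days: int) -> dict:
    """Build one lifecycle rule whose prefix is the full object key.

    Hypothetical helper: the rule transitions every object whose key
    starts with `key` to the GLACIER storage class after `days` days.
    """
    return {
        # Rule IDs are limited to 255 characters, so very long keys are
        # truncated here; a real script would need to keep IDs unique.
        "ID": f"archive-{key}"[:255],
        "Filter": {"Prefix": key},  # prefix set to the whole key name
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# Example key is made up for illustration.
rule = glacier_rule_for_key("report-2015-03.bin", 60)
print(rule["Filter"]["Prefix"], rule["Transitions"][0]["Days"])
```

A list of such rules would form the `Rules` entry of a single lifecycle configuration, which is where the 1000-rule limit applies.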

Yes, you could move objects one at a time to Amazon Glacier, but this would actually involve uploading archives to Glacier rather than 'moving' them from S3. Also, be careful: Glacier has higher per-request charges than S3, which might cost you more than you would save on storage.

In the meantime, consider using Amazon S3 Standard - Infrequent Access storage class, which can save around 50% of S3 storage costs for infrequently-accessed data.

0
votes

You can programmatically archive a specific object on S3 to Glacier using Lifecycle Rules (with a prefix of the exact object you wish to archive).

There is a PUT lifecycle API. This API replaces the bucket's entire lifecycle configuration, so if you have rules that are not part of this process, you will need to include them in every configuration you upload. If you want to archive specific files, you can either:

  • For each file, upload a lifecycle configuration with one rule, wait until the file has transitioned, then do the same for the next file
  • Create a lifecycle configuration with one rule per file

The second will finish faster (as you do not need to wait between files), but requires that you know all the files you want to archive in advance.

There is a limit of 1,000 rules per lifecycle configuration, so if you have more files than that to archive, you will need to split them into batches and upload a separate lifecycle configuration for each batch, waiting for one batch to transition before uploading the next.
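The batching described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name `batched_lifecycle_configs`, the `archive-N` rule IDs, and the generated file names are all made up, and the dictionaries are assumed to match the layout boto3's `put_bucket_lifecycle_configuration` expects. The sketch only builds the configurations; it does not call S3.

```python
def batched_lifecycle_configs(days_by_key, max_rules=1000):
    """Turn {object key: days until archive} into lifecycle configurations.

    Hypothetical helper: each configuration holds at most `max_rules`
    rules, and each rule uses the full object key as its prefix.
    """
    rules = [
        {
            "ID": f"archive-{i}",  # made-up ID scheme, unique per rule
            "Filter": {"Prefix": key},
            "Status": "Enabled",
            "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
        }
        for i, (key, days) in enumerate(sorted(days_by_key.items()))
    ]
    # Slice the flat rule list into chunks of at most `max_rules`.
    return [
        {"Rules": rules[start:start + max_rules]}
        for start in range(0, len(rules), max_rules)
    ]


# Invented example data: 2500 keys with 60- or 90-day expirations.
example = {f"file-{n:04d}.bin": 60 if n % 2 == 0 else 90 for n in range(2500)}
configs = batched_lifecycle_configs(example)
print([len(c["Rules"]) for c in configs])  # [1000, 1000, 500]

# Applying one batch with boto3 would look something like this
# (bucket name is a placeholder; not executed here):
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=configs[0])
```

Since each upload replaces the previous configuration, the batches would have to be applied one after another, and any unrelated rules on the bucket would need to be merged into every batch.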