37
votes

I'd like to mirror an S3 bucket with Amazon Glacier.

The Glacier FAQ states:

Amazon S3 now provides a new storage option that enables you to utilize Amazon Glacier's extremely low-cost storage service for data archiving. You can define S3 lifecycle rules to automatically archive sets of Amazon S3 objects to Amazon Glacier to reduce your storage costs. You can learn more by visiting the Object Lifecycle Management topic in the Amazon S3 Developer Guide.

This is close, but I'd like to mirror. I do not want to delete the content on S3, only copy it to Glacier.

Is it possible to set this up automatically with AWS?

Or does the data need to be uploaded to Glacier manually?

4
I'd like this feature, too. I don't think it exists now, though. – Charles Engelke
What are you trying to accomplish by mirroring S3 to Glacier? – Eric Hammond
@EricHammond I'm trying to back up my S3 files on Glacier. – Justin Tanner
I don't think Glacier is generally an appropriate place to create backup copies of S3 objects (where you keep copies in both places). I explain more in my answer here: stackoverflow.com/questions/15231733/… – Eric Hammond
I would like that feature too, in order to increase the availability of the data stored in S3. – VAAA

4 Answers

26
votes

It is now possible to achieve an "S3 to Glacier" mirror: first create a cross-region replication bucket on Amazon S3 (this replication bucket will be a mirror of your original bucket; see http://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html), then set up a lifecycle rule on the replication bucket to move its data to Glacier. A scripted sketch of the lifecycle half is below.
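Here is a minimal sketch of that lifecycle rule using boto3 (a later AWS SDK than the boto used elsewhere on this page); the bucket name and rule ID are placeholders, and cross-region replication is assumed to be configured already:

import boto3

s3 = boto3.client("s3")

# Transition every object in the replication (mirror) bucket to Glacier
# immediately; the original bucket is untouched.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-replica-bucket",  # hypothetical replica bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-replica-to-glacier",
                "Filter": {"Prefix": ""},  # apply to all objects
                "Status": "Enabled",
                "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}],
            }
        ]
    },
)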

4
votes

Amazon doesn't offer this feature through its API. We had the same problem and solved it by running a daily cron job that re-uploads files to Glacier.

Here is a snippet of Python code, using boto, that copies a file to a Glacier vault. Note that you have to download the file locally from S3 before running it (you can use s3cmd, for instance); the code below then uploads that local file to Glacier.

import boto.glacier.layer2

# Set up your AWS key and secret, and vault name
aws_key = "AKIA1234"
aws_secret = "ABC123"
glacier_vault = "someName"

# Assumption is that this file has already been downloaded from S3
file_name = "localfile.tgz"

try:
  # Connect to Glacier via boto's layer2 interface
  l = boto.glacier.layer2.Layer2(aws_access_key_id=aws_key,
                                 aws_secret_access_key=aws_secret)

  # Get your Glacier vault
  v = l.get_vault(glacier_vault)

  # Upload the file using a concurrent (multipart) upload, so large files are OK
  archive_id = v.concurrent_create_archive_from_file(file_name, description=file_name)

  # Append this archive ID to a local file so you remember which Glacier
  # archive corresponds to which local file. Glacier has no concept of
  # files, only opaque archive IDs.
  with open("glacier.txt", "a") as f:
    f.write(file_name + " " + archive_id + "\n")
except Exception as e:
  print("Could not upload file to Glacier: %s" % e)
3
votes

This is done via a lifecycle policy, but the object is then no longer available in S3. You can copy it into a separate bucket first if you want to keep an S3 copy.
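A minimal sketch of that duplication step with boto3 (bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

# Copy the object into a separate backup bucket before the lifecycle
# rule on the source bucket moves it to Glacier.
s3.copy_object(
    Bucket="my-backup-bucket",  # hypothetical destination bucket
    Key="some/object.tgz",
    CopySource={"Bucket": "my-source-bucket", "Key": "some/object.tgz"},
)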

0
votes

If you first enable versioning on your S3 bucket, lifecycle rules can then be applied to previous versions. This achieves a very similar outcome, except there won't be a backup of the current version.
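A sketch of that setup with boto3 (the bucket name, rule ID, and 30-day window are placeholders to adjust):

import boto3

s3 = boto3.client("s3")

# Turn on versioning so overwritten/deleted objects are kept as previous versions
s3.put_bucket_versioning(
    Bucket="my-bucket",  # hypothetical bucket name
    VersioningConfiguration={"Status": "Enabled"},
)

# Archive noncurrent (previous) versions to Glacier after 30 days
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-previous-versions",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                ],
            }
        ]
    },
)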