
I have configured a life cycle policy on a bucket. Some objects are stored in S3, some are in Glacier. I am now trying to restore all objects from that bucket to local disk.

Using bucket.list() I get the list of objects. Objects that are in Glacier have storage class 'GLACIER', so I can call restore() for those. So far so good.

But the storage class does not indicate whether the object needs to be restored! According to https://docs.aws.amazon.com/AmazonS3/latest/dev/object-archival.html#restore-glacier-objects-concepts objects that are restored from Glacier to S3 keep the storage class 'GLACIER'. So I cannot use it to find out whether a restore needs to be done: a restored object still has class GLACIER.

My question: how do I find out which objects are in Glacier only, and hence need restore() to be called first? Currently, for storage class 'GLACIER', I try get_contents_to_filename() first, and if that fails I call restore(), but that does not feel right.


1 Answer


When you initiate a restore operation, S3 reports the status of that operation in the x-amz-restore response header of a HEAD request on the object. Boto translates the value of that header into the ongoing_restore attribute of the Key object. If the attribute is None, no restore has been requested or is in progress. If it is True, the object is being restored but the operation has not yet completed. If it is False, the restore has completed; in that case, the expiry_date attribute of the Key object is populated with the timestamp at which the restored copy will expire and be removed from S3.
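To make the three states concrete, here is a rough sketch of how such a header value could be mapped to the two attributes. It is not boto's actual code: parse_restore_header is a hypothetical helper, and it assumes the header looks like ongoing-request="false", expiry-date="..." as described in the S3 documentation.

```python
import re


def parse_restore_header(value):
    """Hypothetical helper: map an x-amz-restore header value to
    (ongoing_restore, expiry_date), mirroring the three states above."""
    if value is None:
        # Header absent: no restore requested or in progress.
        return None, None
    ongoing = re.search(r'ongoing-request="(\w+)"', value)
    expiry = re.search(r'expiry-date="([^"]*)"', value)
    ongoing_restore = None
    if ongoing:
        ongoing_restore = ongoing.group(1).lower() == 'true'
    # The expiry date is only present once the restore has completed.
    expiry_date = expiry.group(1) if expiry else None
    return ongoing_restore, expiry_date
```

With boto you should not need this, since the Key object returned by get_key already exposes ongoing_restore and expiry_date.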

You could use something like this to check the status of a restore operation:

import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
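for key in bucket:
    # Keys from the LIST response carry the storage class but no restore status.
    if key.storage_class == 'GLACIER':
        # get_key issues a HEAD request, which populates the restore attributes.
        full_key = bucket.get_key(key.name)
        if full_key.ongoing_restore is None:
            print('Key %s is not being restored' % key.name)
        elif full_key.ongoing_restore:
            print('Key %s restore is in progress' % key.name)
        else:
            # expiry_date is only set on the key returned by get_key.
            print('Key %s has been restored and will expire %s' % (key.name, full_key.expiry_date))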

The get_key call inside the loop is necessary to force a HEAD request on the object and fetch its metadata. Otherwise you would only have the data returned by the LIST operation, which does not include the object metadata or the restore status.
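For the original goal of downloading everything, the same check can drive the decision of what to do with each key. This is only a sketch: next_action is a hypothetical helper, not part of boto, and it assumes you pass it the storage class from the listing and the ongoing_restore value from the key returned by get_key.

```python
def next_action(storage_class, ongoing_restore):
    """Hypothetical helper: decide what to do with a key.

    Returns 'download', 'restore' or 'wait'.
    """
    if storage_class != 'GLACIER':
        # Object lives in S3 proper and can be fetched directly.
        return 'download'
    if ongoing_restore is None:
        # Archived, and no restore requested yet.
        return 'restore'
    if ongoing_restore:
        # Restore requested but not finished.
        return 'wait'
    # Restore completed: a temporary copy is available in S3.
    return 'download'
```

Inside the loop you would then call full_key.restore(days=...) when the result is 'restore', and get_contents_to_filename() when it is 'download', instead of relying on a failed download to detect archived objects.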