8
votes

I am using Amazon Web Services (AWS) to open an S3 bucket and save its contents to a directory on an EC2 instance. I then create a tar file from everything in that directory and push that tar file to AWS Glacier. The last step I am trying to accomplish is to have the script terminate once the tar file has been successfully uploaded to AWS Glacier (which takes 3-5 hours).

I am stumped as to how to take the archive_id and ask the vault whether the tar file has been successfully uploaded.

To interact with AWS Glacier I have been using the Python boto library. Below is the Python/boto code that uploads the file to Glacier, along with some quick tests I ran to figure out whether the file has been uploaded successfully. So far all of the tests return False.

I left out a few tests on status_code, which also returned False for everything. When I print the job lists, only the not-completed (in-progress) list contains anything, as expected, yet when I try to match archive_id or retrieve_job against the returned list of jobs I get no matches. One additional note: when printed, the entries in these lists all look the same ( Job(arn:aws:glacier:us-east-1:232412618534:vaults/glacier-poc) ).

How do I get True back when the job is completed?

    import boto
    import sys

    ACCESS_KEY_ID = "..."
    SECRET_ACCESS_KEY = "..."
    FILENAME = sys.argv[1]
    GLACIER_VAULT_NAME = sys.argv[2]

    connection = boto.connect_glacier(aws_access_key_id=ACCESS_KEY_ID, aws_secret_access_key=SECRET_ACCESS_KEY)

    vault = connection.get_vault(GLACIER_VAULT_NAME)

    archive_id = vault.upload_archive(FILENAME)

    open("glacier.txt", "a").write(FILENAME + " " + archive_id + "\n")

    retrieve_job = vault.retrieve_archive(archive_id)

    a = vault.list_jobs(completed=True)
    b = vault.list_jobs(completed=False)

    print "Is In Completed List"
    print archive_id in a
    print "Is In NOT Completed List"
    print archive_id in b

    print "Is In Completed List"
    print retrieve_job in a
    print "Is In NOT Completed List"
    print retrieve_job in b

1 Answer

5
votes

Take a look at this Boto and Glacier guide. You can either poll the job manually from boto or set up Amazon Simple Notification Service (SNS) to notify you when the job is done.

    archive_id = vault.upload_archive("mybackup.tgz")
    retrieve_job = vault.retrieve_archive(archive_id)

    # if the job is in progress, re-fetch it by id to get its current status
    job_id = retrieve_job.id
    retrieve_job = vault.get_job(job_id)

    if retrieve_job.completed:
        retrieve_job.download_to_file("mybackup.tgz")
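The manual polling option can be wrapped in a small loop. The following is only a minimal sketch: `wait_for_job`, its parameters, and the default intervals are hypothetical names and values chosen for illustration. The only things it assumes from boto are the `vault.get_job(job_id)` call and the `completed` attribute used above.

```python
import time


def wait_for_job(vault, job_id, poll_seconds=900, timeout_seconds=6 * 3600,
                 sleep=time.sleep, clock=time.time):
    """Poll vault.get_job(job_id) until the job reports completed.

    Returns the finished job object, or None if the timeout expires first.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        job = vault.get_job(job_id)  # re-fetch so the status is fresh
        if job.completed:
            return job
        if clock() >= deadline:
            return None
        sleep(poll_seconds)
```

With boto this would be called as `wait_for_job(vault, retrieve_job.id)`. A completed job is not necessarily a successful one, so it is probably worth inspecting the returned job's `status_code` before calling `download_to_file`.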

You can use boto's set_vault_notifications function to set up the SNS notifications.

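    # 'SNSTopic' should identify an existing SNS topic (Glacier expects the topic ARN)
    notification_config = {'SNSTopic': 'my_notification_topic',
                           'Events': ['ArchiveRetrievalCompleted',
                                      'InventoryRetrievalCompleted']}
    # set_vault_notifications is on the low-level Layer1 client and takes the vault name
    vault.layer1.set_vault_notifications(vault.name, notification_config)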

Here is an extensive example of waiting for a job to finish by subscribing an SQS queue to the SNS notifications.
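Once a notification arrives on the queue, the script still has to decide whether it refers to the job it is waiting for. The sketch below shows one way to do that check. It assumes the SQS message body is the usual SNS JSON envelope with the Glacier job description stored as a JSON string under `"Message"`, and that the description carries `"JobId"` and `"Completed"` fields; these field names and the helper name `completed_job_from_sqs_body` are assumptions to verify against the AWS documentation.

```python
import json


def completed_job_from_sqs_body(body, job_id):
    """Return the Glacier job description dict if `body` reports that
    `job_id` has completed, otherwise None."""
    envelope = json.loads(body)
    # Assumed format: SNS stores the payload as a JSON string under "Message"
    description = json.loads(envelope["Message"])
    if description.get("JobId") == job_id and description.get("Completed"):
        return description
    return None
```

The script can then loop over received messages, pass each body to this helper together with `retrieve_job.id`, and stop as soon as it gets a non-None result.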