
I am using AWS S3 as my default file storage system. I have a model with a file field like so:

class Segmentation(models.Model):
    file = models.FileField(...)

I am running image-processing jobs on a second server that dumps processed images to a different AWS S3 bucket. I want to save each processed image in my Segmentation table.

Currently I am using boto3 to manually download the file to my "local" server (where my Django app lives) and then upload it to my own S3 bucket, like so:

import os

from django.core.files import File
import boto3

def save_file(segmentation, foreign_s3_key):

    # the bucket the processing server writes to
    foreign_bucket = 'foreign-bucket'

    # path for a temporary local copy
    temp_local_file = 'tmp/temp.file'

    # use boto3 to download the foreign file locally
    s3_client = boto3.client('s3')
    s3_client.download_file(foreign_bucket, foreign_s3_key, temp_local_file)

    # save the file to the segmentation, closing the handle afterwards
    with open(temp_local_file, 'rb') as f:
        segmentation.file = File(f, name=os.path.basename(foreign_s3_key))
        segmentation.save()

    # delete the temp file
    os.remove(temp_local_file)

This works fine, but it is resource-intensive. I have some jobs that need to process hundreds of images.

Is there a way to copy a file from the foreign bucket to my local bucket and set the segmentation.file field to the copied file?

Not sure - can you please provide an example of how to implement this? Wouldn't this use just as many resources? - Daniel
In this case you save the time of writing the file to disk and then reloading it, since the file object itself gets streamed to you (how the retrieval happens needs to be looked into, though; a sketch follows below). Using the file object directly avoids the round trip of saving and reading back. - ranka47
Does this answer your question? stackoverflow.com/questions/44043036/… - ranka47
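
A minimal sketch of what ranka47 describes, assuming a placeholder bucket name and reading the whole body into memory (fine for images, not for very large files):

import boto3
from django.core.files.base import ContentFile

def save_file_streamed(segmentation, foreign_s3_key):
    s3 = boto3.resource('s3')
    obj = s3.Object('foreign-bucket', foreign_s3_key)  # placeholder bucket name

    # read the object body over the network; nothing is written to local disk
    body = obj.get()['Body'].read()

    # hand the bytes to the FileField, which uploads them to the
    # default storage bucket and saves the model
    segmentation.file.save(foreign_s3_key.rsplit('/', 1)[-1],
                           ContentFile(body), save=True)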

1 Answer


I am assuming you want to move some files from one source bucket to some destination bucket, as the question title suggests, and do some processing in between.

import boto3

my_west_session = boto3.Session(region_name='us-west-2')
my_east_session = boto3.Session(region_name='us-east-1')
backup_s3 = my_west_session.resource("s3")
video_s3 = my_east_session.resource("s3")
local_bucket = backup_s3.Bucket('localbucket')
foreign_bucket = video_s3.Bucket('foreignbucket')

for obj in foreign_bucket.objects.all():
    # do some processing on the objects
    copy_source = {
        'Bucket': foreign_bucket.name,  # CopySource expects the bucket name, not the Bucket object
        'Key': obj.key
    }
    local_bucket.copy(copy_source, obj.key)

For details, see the boto3 documentation on Session configuration, and on the S3 resource copy and CopyObject operations, depending on your requirement.
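
To tie this back to the question: if the destination bucket is the one backing your default storage (e.g. via django-storages), you can do the copy server-side and then point the field at the new key, so no bytes ever pass through your Django server. A rough sketch, with placeholder bucket names and a hypothetical segmentations/ prefix:

import boto3

def copy_and_attach(segmentation, foreign_s3_key):
    s3 = boto3.resource('s3')

    # server-side copy: S3 transfers the bytes bucket-to-bucket directly
    copy_source = {'Bucket': 'foreign-bucket', 'Key': foreign_s3_key}
    new_key = 'segmentations/' + foreign_s3_key.rsplit('/', 1)[-1]
    s3.Bucket('local-bucket').copy(copy_source, new_key)

    # a FileField stores only the storage key, so assigning the name is
    # enough; new_key must be relative to the storage's location setting
    segmentation.file.name = new_key
    segmentation.save()

Note that segmentation.file.save() is never called here, so Django never opens or uploads the file itself; the only cost is the S3 copy request.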