1 vote

I am creating my first AWS Lambda image optimizer script, which works fine on an image at the 'root' of the originating bucket.

I have three buckets. 'mybucket' holds my original existing images I want to resize. I decided to copy them to 'mybucket-photos', which has a Python Lambda function watching it. The Lambda function optimizes the incoming photo and saves it to 'mybucket-photosresized'.

However, when I try to copy the contents of mybucket into mybucket-photos, the Python script fails on file handling for any key that includes a 'subfolder' path.

example failure:

No such file or directory: '/tmp/91979758-51b3-44df-b2b1-d9eeddeb0802saddles/thumb/27dfahl/16-5-dk-dressage-saddle-for-sale/saddle_photo02_300.3a38de5F': IOError

My guess is that the folder names with slashes are causing the problem. I understand that the 'folder' is part of the key.

Incidentally, I do not fully understand what the handler method is doing with regard to records, bucket, and key, which makes this all the more confusing. My naive instinct is to replace the / somehow and add it back on save.

The Python is:

from __future__ import print_function
import boto3
import os
import sys
import uuid
from PIL import Image
import PIL.Image

s3_client = boto3.client('s3')

def resize_image(image_path, resized_path):
    with Image.open(image_path) as image:
        image.save(resized_path, optimize=True)

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key'] 
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        upload_path = '/tmp/resized-{}'.format(key)

        s3_client.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        s3_client.upload_file(upload_path, '{}resized'.format(bucket), key)

What's the best way to handle the 'folders' in this script?


3 Answers

2 votes

You're downloading the object from S3 to a local folder on the machine running your Lambda function. That machine (almost certainly) has a Linux file system. You typically cannot write a file to a folder on such a file system unless that folder already exists. I see no attempt in your code to create the folder that is going to hold the downloaded object.

You've also made this more complex by binding the key of the downloaded object into your download folder name.

So either download straight to /tmp/ (or to a /tmp/downloads/ folder that you pre-create), or keep embedding the S3 object key in the download path, in which case you must pre-create the relevant folder hierarchy first.
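For example, a minimal sketch of the second option (this keeps the key in the local path; note the '/' added between the UUID and the key, which the question's format string was missing):

import os

download_path = '/tmp/{}/{}'.format(uuid.uuid4(), key)

# The key may contain 'folders', so create the matching local
# directory tree before asking boto3 to write the file into it.
download_dir = os.path.dirname(download_path)
if not os.path.exists(download_dir):
    os.makedirs(download_dir)

s3_client.download_file(bucket, key, download_path)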

1 vote

As when coding on regular platforms, there is no need to use the filesystem unless the object is bigger than the available memory (RAM plus swap); use in-memory objects instead. Since you're processing the image, you have to have enough memory to hold it anyway. This is much faster and less error prone.

PIL's Image.open() takes a file object, and boto's s3_client.get_object()['Body'] returns a streaming file-like object. Unfortunately, that stream doesn't support seek(), which Image.open() needs, so passing it in directly is unlikely to work. Instead, read the bytes out of the response body and wrap them in an io.BytesIO, which is seekable:

import io

s3_object = s3_client.get_object(Bucket=bucket, Key=key)
file_contents = s3_object['Body'].read()

# BytesIO gives PIL the seekable file object it expects.
pil_image = Image.open(io.BytesIO(file_contents))

If you must use the filesystem, use Python's tempfile.TemporaryFile to create temporary files for you.
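Putting the pieces together, here is a minimal in-memory sketch of the whole handler body; the '{}resized' bucket naming comes from the question, and put_object stands in for upload_file since there is no local file:

import io

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

        # Read the whole object into memory and open it with PIL.
        body = s3_client.get_object(Bucket=bucket, Key=key)['Body'].read()
        image = Image.open(io.BytesIO(body))

        # Re-save with optimization into an in-memory buffer.
        # image.format is set by Image.open, so the original
        # format (JPEG, PNG, ...) is preserved.
        buffer = io.BytesIO()
        image.save(buffer, format=image.format, optimize=True)
        buffer.seek(0)

        # Upload the buffer directly; no filesystem involved.
        s3_client.put_object(Bucket='{}resized'.format(bucket),
                             Key=key, Body=buffer)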

0 votes

My original code was provided by AWS. I think they opted to save files to the filesystem because you need to re-save an image for options like optimize to take effect.

I decided to stick with the original code and split out the path and file name with a regular expression. It works well enough for my needs, though I did notice a few background errors with files that had missing extensions, etc.

from __future__ import print_function
import boto3
import os
import sys
import uuid
from PIL import Image
import PIL.Image
import re

s3_client = boto3.client('s3')

def resize_image(image_path, resized_path):
    with Image.open(image_path) as image:
        image.save(resized_path, optimize=True)

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key'] 

        # Split the key into its 'folder' prefix and the file name:
        # group(1) is everything up to and including the last '/',
        # group(2) is the bare file name.
        key_match = re.match(r'(.*?)([^/]+)/?$', str(key))
        path_without_name = key_match.group(1)
        file_name = key_match.group(2)

        download_path = '/tmp/{}{}'.format(uuid.uuid4(), file_name)
        upload_path = '/tmp/resized-{}'.format(file_name)

        s3_client.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        s3_client.upload_file(upload_path, '{}resized'.format(bucket), str(key))
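
For what it's worth, the same split can be done without a regex. This is just an alternative, not part of the AWS sample:

import posixpath

path_without_name, file_name = posixpath.split(str(key))
# posixpath.split('saddles/thumb/photo.jpg') -> ('saddles/thumb', 'photo.jpg')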