0
votes

I'm invoking a Lambda function to copy a .csv attachment from an email to a destination S3 bucket, which I specify in the function. The invocation is triggered by Event type: ObjectCreatedByPut when the email is received by the incoming S3 bucket.

I see from the CloudWatch logs that the function is indeed invoked as expected, but no file is ever sent to the destination s3 bucket.

Here is the incoming folder showing presence of key mfdat0psudj12qfihankjkiindd17vftd775so01

Here is the Lambda function:

from __future__ import print_function

import json
import urllib
import boto3
import os

import email
import base64

FILE_MIMETYPE = 'text/csv'

# destination folder
S3_OUTPUT_BUCKETNAME = 's3-bucket/attachments/' 

print('Loading function')

s3 = boto3.client('s3')


def lambda_handler(event, context):

    #source email bucket 
    inBucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.quote(event['Records'][0]['s3']['object']['key'].encode('utf8'))


    try:
        response = s3.get_object(Bucket=inBucket, Key=key)
        msg = email.message_from_string(response['Body'].read().decode('utf-8'))   

    except Exception as e:
        print(e)
        print('Error retrieving object {} from source bucket {}. Verify existence and ensure bucket is in same region as function.'.format(key, inBucket))
        raise e
    
    #print(msg)
    
    attachment_list = []
    attachment = msg.get_payload()[1]


    try:
        #scan each part of email 
        for message in msg.get_payload():
            
            # Check filename and email MIME type
            if (message.get_filename() != None and message.content_type() == FILE_MIMETYPE):
                attachment_list.append ({'original_msg_key':key, 'attachment_filename':message.get_filename(), 'body': base64.b64decode(msg.get_payload()) })
    except Exception as e:
        print(e)
        print ('Error processing email for CSV attachments')
        raise e
    
    # if multiple attachments send all to bucket 
    for attachment in attachment_list:

        try:
            s3.put_object(Bucket=S3_OUTPUT_BUCKETNAME, Key=attachment['original_msg_key'] +'-'+attachment['attachment_filename'] , Body=attachment['body']) 
        except Exception as e:
            print(e)
            print ('Error sending object {} to destination bucket {}. Verify existence and ensure bucket is in same region as function.'.format(attachment['attachment_filename'], S3_OUTPUT_BUCKETNAME))
            raise e
            
    print(key)
    print(inBucket)
    print(S3_OUTPUT_BUCKETNAME)
    print(message.get_filename())
    print(response)

    return event

And here are the logs showing successful invocation of the function.

2020-10-25T22:05:32.093+00:00   Loading function

2020-10-25T22:05:32.208+00:00   START RequestId: 9d683660-4436-4cff-92c4-01e3ae028a67 Version: $LATEST

2020-10-25T22:05:33.326+00:00   mfdat0psudj12qfihankjkiindd17vftd775so01

2020-10-25T22:05:33.326+00:00   s3-bucket

2020-10-25T22:05:33.326+00:00   s3-bucket/attachments/

2020-10-25T22:05:33.326+00:00   None

2020-10-25T22:05:33.364+00:00   {'ResponseMetadata': {'RequestId': '4DCD1196A2C991B8', 'HostId': 'tKOE8xz3yq1gryGS+7f7u9+fdwU+buK4C/gTTzOZYZheSxXI9a1MxrggIioWttO9mwmCiwG15d0=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'tKOE8xz3yq1gryGS+7f7u9+fdwU+buK4C/gTTzOZYZheSxXI9a1MxrggIioWttO9mwmCiwG15d0=', 'x-amz-request-id': '4DCD1196A2C991B8', 'date': 'Sun, 25 Oct 2020 22:05:33 GMT', 'last-modified': 'Sun, 25 Oct 2020 22:05:31 GMT', 'etag': '"b66db710202d45a98daa0a47badf6094"', 'accept-ranges': 'bytes', 'content-type': 'application/octet-stream', 'content-length': '1207346', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2020, 10, 25, 22, 5, 31, tzinfo=tzutc()), 'ContentLength': 1207346, 'ETag': '"b66db710202d45a98daa0a47badf6094"', 'ContentType': 'application/octet-stream', 'Metadata': {}, 'Body': <botocore.response.StreamingBody object at 0x7ffba103f490>}

2020-10-25T22:05:33.366+00:00   END RequestId: 9d683660-4436-4cff-92c4-01e3ae028a67

2020-10-25T22:05:33.366+00:00   REPORT RequestId: 9d683660-4436-4cff-92c4-01e3ae028a67 Duration: 1157.42 ms Billed Duration: 1200 ms Memory Size: 128 MB Max Memory Used: 89 MB Init Duration: 413.53 ms

However, when I check s3-bucket/attachments, the directory is empty. Interestingly, print(message.get_filename()) returns None.

I have checked the Lambda code for errors and have also tried using s3.upload_file but this does not seem to work either.

Not sure where to turn now.

3
Are you confident that your for attachment in attachment_list loop runs and that there are any attachments? - Marcin
You don't seem to have nearly enough logging in the conditional arms of your code. Is your code actually calling s3.put_object()? - jarmod
@Marcin - Thanks for the comment. There most certainly are attachments. I have verified this by examining the raw files. - jimiclapton
@jarmod - Thanks, I appreciate this is a valid observation and an oversight. I'll look to rectify it. - jimiclapton
It seems fairly likely that the attachment list is actually empty, as many people have suggested. Basic debugging techniques would have validated this very early. - jarmod

3 Answers

2
votes

You’re setting S3_OUTPUT_BUCKETNAME = 's3-bucket/attachments/' and using this as the argument for Bucket in the put_object call. This won’t work, because a bucket name cannot contain slashes (/).

The “folder” must be part of the key. In fact, S3 doesn’t have any folders as it’s not a file system. Folders from an actual file system are translated into prefixes as part of the object’s key.

Try changing your function in the following way:

...
S3_OUTPUT_BUCKETNAME = 's3-bucket'
...
s3.put_object(
    Bucket=S3_OUTPUT_BUCKETNAME, 
    Key='attachments/' + attachment['original_msg_key'] + '-' + attachment['attachment_filename'] , 
    Body=attachment['body']
)
...
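
To illustrate the bucket/prefix split, here is a minimal sketch. The helper name split_s3_path and the file name report.csv are hypothetical and exist only for this example; the message key is the one shown in the question.

```python
def split_s3_path(path):
    """Split 'bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    # Everything before the first slash is the bucket name;
    # everything after it belongs in the object key.
    bucket, _, prefix = path.partition('/')
    return bucket, prefix


bucket, prefix = split_s3_path('s3-bucket/attachments/')

# The "folder" becomes a prefix of the key (report.csv is a made-up name).
key = prefix + 'mfdat0psudj12qfihankjkiindd17vftd775so01' + '-' + 'report.csv'

print(bucket)
print(key)
```

The resulting bucket and key could then be passed as Bucket and Key to put_object.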

That said, I suspect that attachment_list turns out to be empty and put_object() isn’t even being executed. Please add some logging around the if statement that is supposed to add attachments to the list, and around the for loop that iterates over the items in attachment_list. There might be a bug hidden there.

1
votes

Your print(message.get_filename()) output indicates that the filename is None, and the condition in your loop requires message.get_filename() != None, so it looks like no part is being treated as a valid attachment.

I would add more debugging output showing which parts are in the payload, then double-check your assumptions about what a valid filename, MIME type, etc. should be, and whether you have the right conditions set up.
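
As a rough sketch of that kind of debugging, the snippet below lists the content type and filename of every part in a message, including nested ones. The helper describe_parts and the sample email are hypothetical, built only so the example runs on its own. In the Lambda you would pass in the msg parsed from S3 instead.

```python
from email.message import EmailMessage


def describe_parts(msg):
    """Return (content_type, filename) for every part, nested parts included."""
    # walk() visits the message itself and then each sub-part in turn.
    return [(part.get_content_type(), part.get_filename())
            for part in msg.walk()]


# Hypothetical test email: a plain-text body plus one CSV attachment.
sample = EmailMessage()
sample['Subject'] = 'Report'
sample.set_content('See attached.')
sample.add_attachment(b'a,b\n1,2\n', maintype='text', subtype='csv',
                      filename='report.csv')

for content_type, filename in describe_parts(sample):
    print(content_type, filename)
```

Printing this for a real message shows which parts carry a filename and what MIME type each one reports, which makes it easier to see why a condition never matches.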

0
votes

After improving my error handling and exploring the methods of the email library, I was able to confirm that message.content_type() is incorrect and that the correct method is in fact message.get_content_type(). With the incorrect method, the condition inside for message in msg.get_payload(): ... could not evaluate successfully and nothing was appended, hence the None returned by print(message.get_filename()) and the empty attachment_list.

The corrected statement is below, for reference by anyone looking to replicate this functionality with Lambda or elsewhere. Note that it also passes message.get_payload() (the individual part) to base64.b64decode, rather than msg.get_payload() (the whole email).

    try:
        # scan each part of the email
        for message in msg.get_payload():

            # Check email MIME type and filename
            if message.get_content_type() == FILE_MIMETYPE and message.get_filename() is not None:
                attachment_list.append({'original_msg_key': key, 'attachment_filename': message.get_filename(), 'body': base64.b64decode(message.get_payload())})
    except Exception as e:
        print(e)
        print('Error processing email for CSV attachments')
        raise e
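
A possible variation, offered as a sketch rather than as the code I actually deployed: msg.walk() also reaches attachments nested inside other multipart sections, and get_payload(decode=True) decodes the part according to its Content-Transfer-Encoding, so base64 is not assumed. The function name extract_csv_attachments, the key 'example-key' and the sample email are hypothetical.

```python
from email.message import EmailMessage

FILE_MIMETYPE = 'text/csv'


def extract_csv_attachments(msg, key):
    """Collect every CSV attachment in msg, searching nested parts too."""
    attachments = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename is not None and part.get_content_type() == FILE_MIMETYPE:
            attachments.append({
                'original_msg_key': key,
                'attachment_filename': filename,
                # decode=True undoes base64 or quoted-printable as needed
                'body': part.get_payload(decode=True),
            })
    return attachments


# Hypothetical test email: a plain-text body plus one CSV attachment.
sample = EmailMessage()
sample.set_content('See attached.')
sample.add_attachment(b'a,b\n1,2\n', maintype='text', subtype='csv',
                      filename='report.csv')

found = extract_csv_attachments(sample, 'example-key')
print(len(found), found[0]['attachment_filename'])
```

Each dictionary has the same shape as the entries in attachment_list, so the upload loop should not need to change.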

Thanks for all contributions and guidance.