10
votes

I am very new to AWS services and have just a week's worth of experience with serverless architecture. My requirement is to trigger an event when a new file is uploaded to a specific bucket; once the trigger fires, my Lambda should get the details of the latest file, such as its name, size, and date of creation.

The source uploads this file into a new folder every time and names the folder with the current date.

So far I have figured out how to create my Lambda function and listen for the event trigger.

Here is my code.

import boto3
import botocore
import datetime
import logging

def lambda_handler(event, context):
    logging.info('Start function')
    s3 = boto3.resource('s3')
    # Build today's folder name, e.g. 'banana/incoming/daily/2019-05-29'
    DATE = datetime.datetime.today().strftime('%Y-%m-%d')
    BUCKET_NAME = 'monkey-banana-dev'
    KEY = 'banana/incoming/daily/{}'.format(DATE)  # note: S3 keys should not start with '/'
    logging.info('Getting file from {}'.format(KEY))
    try:
        # Download to a local path; '/tmp' is the only writable location in Lambda
        s3.Bucket(BUCKET_NAME).download_file(KEY, '/tmp/name_of_my_file')
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise

Since I know it's going to be today's date, I am using datetime to build the exact key, but the file name will always be different. Although I know it's going to be a text file with a .txt suffix, I cannot work out how to get the name of the latest uploaded file and its other details from the trigger.

3
You can find sample data from events here: docs.aws.amazon.com/lambda/latest/dg/… – T4rk1n
@T4rk1n I'm not finding it very useful; I am very new to AWS. – Shek

3 Answers

7
votes

You have an event object; it contains a key "Records" whose value is a list.

You can filter the records for eventName 'ObjectCreated:Put' and then sort the list by the "eventTime" key to get the latest event's data.

def lambda_handler(event, context):
    # Keep only object-creation (PUT) records
    records = [x for x in event.get('Records', []) if x.get('eventName') == 'ObjectCreated:Put']
    # Sort chronologically so the last entry is the most recent upload
    sorted_events = sorted(records, key=lambda e: e.get('eventTime'))
    latest_event = sorted_events[-1] if sorted_events else {}
    info = latest_event.get('s3', {})
    file_key = info.get('object', {}).get('key')
    bucket_name = info.get('bucket', {}).get('name')
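To see how this extraction behaves, you can run it locally against a hand-written event. The bucket, key, and size values below are made-up placeholders for illustration, and only the fields the handler reads are included; real S3 notification events carry many more fields.

```python
# Minimal hand-written S3 "ObjectCreated:Put" event (hypothetical values)
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "eventTime": "2019-05-29T15:30:11.532Z",
            "s3": {
                "bucket": {"name": "monkey-banana-dev"},
                "object": {
                    "key": "banana/incoming/daily/2019-05-29/report.txt",
                    "size": 1024,
                },
            },
        }
    ]
}

# Same extraction logic as the handler above
records = [r for r in sample_event.get("Records", [])
           if r.get("eventName") == "ObjectCreated:Put"]
latest = sorted(records, key=lambda r: r.get("eventTime"))[-1]
info = latest.get("s3", {})
file_key = info.get("object", {}).get("key")
bucket_name = info.get("bucket", {}).get("name")
file_size = info.get("object", {}).get("size")
print(bucket_name, file_key, file_size)
```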
2
votes

As was mentioned, this link has the info - http://docs.aws.amazon.com/lambda/latest/dg/eventsources.html#eventsources-s3-put

What you need to do is use the event object that is passed into the function; it contains the detail described in the link. As the example there shows, you need to access the key, which will contain the full path, including the date you mentioned, since the key is the full file path.

To help debug this, you can always print the value of event to the console using Python's print function.
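One detail worth knowing: the object key in an S3 notification event is URL-encoded, so for example a space in the object name arrives as '+'. A minimal sketch of pulling out and decoding the key, using a hand-written event fragment with a hypothetical key:

```python
from urllib.parse import unquote_plus

# Hypothetical event fragment; only the path to the key is shown
event = {"Records": [{"s3": {"object": {
    "key": "banana/incoming/daily/2019-05-29/my+report.txt"}}}]}

raw_key = event["Records"][0]["s3"]["object"]["key"]
key = unquote_plus(raw_key)  # decode '+' and %XX escapes
print(key)  # banana/incoming/daily/2019-05-29/my report.txt
```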

2
votes

'Key' will contain the whole file path. Example:

import boto3
import os

s3 = boto3.resource('s3')
bucket = s3.Bucket('hcss-heavyjob-raw-tables')
# List every object in the bucket and keep those under the prefix
for key in bucket.objects.all():
    if key.key.startswith('heavyjob/EMPMAST'):
        print(key.key)

Output:

heavyjob/EMPMAST/20190524-165352044.csv
heavyjob/EMPMAST/20190529-153011532.csv
heavyjob/EMPMAST/LOAD00000001.csv

You can get the file name by using os.path.basename on key.key, or:

head,tail = os.path.split(key.key)
print(tail)
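For completeness, the basename approach mentioned above, shown on one of the sample keys from the listing:

```python
import os

key = "heavyjob/EMPMAST/LOAD00000001.csv"  # sample key from the output above
name = os.path.basename(key)  # strip the directory part of the key
print(name)  # LOAD00000001.csv
```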