
I'm modifying the AWS Lambda with Amazon S3 tutorial found here so that it will simply:

  • read a JSON file (the content index) already resident in that bucket,
  • update it with the newly created Key (a new "folder" in that bucket that triggered this Lambda),
  • and then save (put) the updated JSON file back.

CLARIFICATION: The bucket should only trigger the Lambda function when a folder object is created in it. The content index (index.json) is already resident in the bucket. So the bucket will have folders (e.g., {folder-1, folder-2, folder-n}) plus index.json. Every time a new folder is added, its name is appended to the JSON array.

To be specific, in my case I have a root (destination) bucket that will contain a series of folders created by Elemental MediaConvert. Each folder represents a new video, and within each folder are the different formats that may be served to different devices. Back at the root level I have index.json, which is an array of these video folders; it is the content index. Now, I could modify the Lambda that is part of the MediaConvert flow, but I'll consider that at another time. Here, I just want to trigger a new S3 Lambda every time MediaConvert writes a new video folder... which is just some random GUID.
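For illustration, the content index might look like this (a sketch; the GUID folder names are made up):

[
  "0aa3f2c1-9e7b-41d0-aa2f-0123456789ab",
  "5c6d7e8f-1234-4abc-9def-ba9876543210"
]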

I'm learning Node.js and this is the first time I've used some of the structure and calls you see here. (I'm at least aware that this form is cleaner and clearer than using nested callbacks.)

Since it's going to be tricky (for me!) to test this as a Lambda function, would someone kindly point out any obvious mistakes?

Also, would someone give me some direction on how to test this Lambda function manually using Amazon S3 event data (prior to configuring the actual bucket to publish the required event)? I would think I need the event.json to specify the name of the newly created folder so it can be added to my index.json, which is sitting in the same bucket.
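Something like this minimal harness is what I'm picturing (a sketch: it assumes my handler code, shown below, is saved as index.js, and the bucket/key names are placeholders):

// test-local.js -- invoke the handler directly with a hand-built S3 event.
var lambda = require('./index'); // the handler code shown below

var event = {
    Records: [{
        s3: {
            bucket: { name: 'my-triggering-bucket' },
            object: { key: 'some-new-guid-folder/' }
        }
    }]
};

// Plain Node invocation; real AWS credentials must be available in the
// environment for the S3 calls inside the handler to succeed.
lambda.handler(event, {}, function (err, result) {
    if (err) console.error('Handler failed:', err);
    else console.log('Handler result:', result);
});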

Here's my code:

// dependencies
var async = require('async');
var AWS = require('aws-sdk');
var util = require('util');

// constants
//const DEST_FOLDER = 'my-triggering-bucket';
const CONTENT_INDEX_FILENAME = 'index.json';

// get reference to S3 client
var s3 = new AWS.S3();

exports.handler = function(event, context, callback) {
    // Read options from the event.
    // Need the new folder (key) that's been added to the bucket...
    console.log("Reading options from event:\n", util.inspect(event, {depth: 5}));

    // I assume this is the triggering bucket...
    var triggerBucket = event.Records[0].s3.bucket.name;

    // And I assume this is the folder that was added and caused the trigger.
    // Object keys in S3 events are URL-encoded, with spaces encoded as '+'.
    var newKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

    const indexKey = CONTENT_INDEX_FILENAME;

    // Get the content index and add the newly created dir to it
    async.waterfall([
        function download(next) {
            // Download the content index from S3 into a buffer.
            s3.getObject({
                    Bucket: triggerBucket,
                    Key: indexKey
                },
                next);
        },
        function update(response, next) {
            // Update the content index with the new dir that was added.
            console.log('UPDATE...');
            // getObject hands back a data object; the file contents are in .Body.
            var obj = JSON.parse(response.Body);
            obj.push(newKey);
            var jsonStr = JSON.stringify(obj);
            // jsonStr is a plain string with no ContentType property;
            // pass the content type explicitly.
            next(null, 'application/json', jsonStr);
        },
        function upload(contentType, data, next) {
            // Stream the updated content index back
            s3.putObject({
                    Bucket: triggerBucket,
                    Key: indexKey,
                    Body: data,
                    ContentType: contentType
                },
                next);
            }
        ], function (err) {
            if (err) {
                console.error('error: ' + err);
                return callback(err);
            }
            console.log('Success');
            callback(null, 'Content index updated');
        }
    );
};

UPDATE: I've abandoned this approach in favor of updating the content index via another means that does not risk runaway execution of my Lambda. I have discovered first-hand that it is not a good idea to trigger on ObjectCreated events in a bucket when one's design does not provide a solid event notification filter. (I could not filter on a suffix of a simple /.) Also, I was expecting a single folder-key object-create event to trigger my Lambda, but in reality the other folders and keys created inside that new root-level folder ended up triggering my Lambda too. So this sent me back into the video conversion workflow, to modify the Lambda that signals successful completion of the workflow so that it updates my content index.
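For anyone tempted to go the filter route: S3 event notification filters only support key-name prefix and suffix rules, along the lines of the sketch below (placeholder ARN; not a configuration that worked for my folder layout):

{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:update-content-index",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "suffix", "Value": "/" }
          ]
        }
      }
    }
  ]
}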

Heads up #1: it will be safer to use two buckets; otherwise, if this Lambda function is writing to the same bucket that triggered it, and the new or overwritten object (created by the Lambda function) also triggers the Lambda function, you have an infinite loop of Lambda invocations and bucket PUT actions that can cost real $ if not detected promptly. Ensure, at minimum, that the new object will not trigger inclusion of itself (see the guard sketch after these comments). (Michael - sqlbot)
Heads up #2: this code will misplace (occasionally omit) files for inclusion in the index because there's an implicit assumption that the Lambda function won't run concurrently when parallel uploads occur - which it will - and that s3.getObject always gets the most recent copy of the (index) object that has been overwritten - which it may not. That is not guaranteed. An overwrite followed quickly by a read may return the old or the new object. It will always return an intact object, but S3 only guarantees eventual consistency for overwrites. (Michael - sqlbot)
Thanks for those heads-ups. No, I hadn't considered either case, but I like to think that the first one would have dawned on me as I got close! It would be easy enough to prevent, because the trigger action would be a new "folder" appearing in the bucket, not the updated JSON object. I need to think about @athar-kahn's reply below, because he seems to indicate a new folder create won't trigger my Lambda. As for point 2, at this stage there can be no possibility of concurrency, but it's definitely something to keep in mind. Thanks! (motivus)
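To make heads-up #1 concrete, a guard like the following at the top of the handler would keep the function from reacting to its own write (a sketch based on the question's code; the download/update/upload waterfall is elided):

// Bail out before any S3 work when the event is for the index file
// this function writes, so the putObject further down can't re-trigger it.
const CONTENT_INDEX_FILENAME = 'index.json';

exports.handler = function (event, context, callback) {
    var newKey = decodeURIComponent(
        event.Records[0].s3.object.key.replace(/\+/g, ' '));

    if (newKey === CONTENT_INDEX_FILENAME) {
        console.log('Ignoring event for the index itself:', newKey);
        return callback(null, 'Ignored');
    }

    // ... download / update / upload waterfall as in the question ...
    callback(null, 'Processed ' + newKey);
};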

2 Answers

1 vote

Your Lambda will only be triggered when an object is created/deleted/modified in the bucket, depending on your configuration. If you create a new folder in the bucket, it won't trigger your Lambda. You can simplify your code by using async/await syntax with the Lambda Node.js 8.10 runtime.

Lambda handler

var AWSS3 = require('aws-sdk/clients/s3');
const CONTENT_INDEX_FILENAME = 'index.json';
var s3 = new AWSS3();

exports.handler = async (event) => {

  try {
    console.log('Event', JSON.stringify(event));

    // Bucket name.
    const triggerBucket = event.Records[0].s3.bucket.name;

    // New key added.
    const newKey = event.Records[0].s3.object.key;

    // Assuming only folder name is to be added in the list. If object 
    // is added in the bucket root then it will be ignored.
    if (newKey.indexOf('/') > -1) {

      // Get existing data.
      let existing = await s3.getObject({
        Bucket: triggerBucket,
        Key: CONTENT_INDEX_FILENAME
      }).promise();

      // Parse JSON object.
      let existingData = JSON.parse(existing.Body);

      // Get the folder name.
      const folderName = newKey.substring(0, newKey.indexOf("/"));

      // Check if we have an array.
      if (!Array.isArray(existingData)) {
        // Create array.
        existingData = [];
      }

      existingData.push(folderName);

      await s3.putObject({
        Bucket: triggerBucket,
        Key: CONTENT_INDEX_FILENAME,
        Body: JSON.stringify(existingData),
        ContentType: 'application/json'
      }).promise();

      console.log(`Added new folder name ${folderName}`);

      return folderName;

    } else {
      console.log('Key was added in bucket root.');
      return 'Ignored';
    }
  } catch (err) {
    return err;
  }
};

Running locally:

Create an event.json file in the root of your project and add the following to it.

{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "your_bucket_name"
        },
        "object": {
          "key": "your_folder/your_file.json"
        }
      }
    }
  ]
}

Install the lambda-local package globally.

npm install -g lambda-local

Finally test it:

Run the function locally, passing it the event.json file created above.

lambda-local -l path/to/function.js -e event.json

0 votes

As for testing locally, I've previously used https://www.npmjs.com/package/aws-lambda-local and it has worked fine for me.

Just check the AWS docs for example S3 event JSON data.