4
votes

I intend to use DynamoDB Streams to implement a log trail that tracks changes to a number of tables (and writes them to log files on S3). Whenever a table is modified, a Lambda function is invoked from the stream event. I also need to record the user that made the modification. For put and update I can solve this by including an actual table attribute holding the ID of the caller. The record stored in the table will then include this ID, which isn't really desirable since it's metadata about the operation rather than part of the record itself, but I can live with that.

So for example:

put({
  TableName: 'fruits',
  Item: {
    id: 7,
    name: 'Apple',
    flavor: 'Delicious',
    __modifiedBy: 'USER_42'
  }
})

This will result in a Lambda function invocation, where I can write something like the following to my S3 log file:

{
  table: 'fruits',
  operation: 'put',
  time: '2018-12-10T13:35:00Z',
  user: 'USER_42',
  data: {
    id: 7,
    name: 'Apple',
    flavor: 'Delicious'
  }
}
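
The stream handler that writes this could look roughly like the following (just a sketch using the AWS SDK v2; the bucket name and key layout are placeholders):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  for (const record of event.Records) {
    // Requires a stream view type that includes images (e.g. NEW_AND_OLD_IMAGES).
    const image = record.dynamodb.NewImage || record.dynamodb.OldImage;
    const entry = {
      table: record.eventSourceARN.split('/')[1],
      operation: record.eventName,                               // INSERT | MODIFY | REMOVE
      time: new Date(record.dynamodb.ApproximateCreationDateTime * 1000).toISOString(),
      user: image && image.__modifiedBy && image.__modifiedBy.S, // images are in DynamoDB JSON
      data: image
    };
    await s3.putObject({
      Bucket: 'my-audit-log-bucket',                             // placeholder
      Key: entry.table + '/' + record.eventID + '.json',
      Body: JSON.stringify(entry)
    }).promise();
  }
};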

However, for deletes a problem arises: how can I log the calling user of the delete operation? Of course I could make two requests, one that updates __modifiedBy and another that deletes the item, and the stream handler would then fetch the __modifiedBy value from the OLD_IMAGE included in the stream event. But having to spend two writes on a single delete of an item is really undesirable.
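
For illustration, that two-request workaround would be something along these lines (a sketch using the AWS SDK v2 DocumentClient, inside an async function):

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

// 1) Stamp the caller on the item first...
await dynamodb.update({
  TableName: 'fruits',
  Key: { id: 7 },
  UpdateExpression: 'SET #m = :u',
  ExpressionAttributeNames: { '#m': '__modifiedBy' },
  ExpressionAttributeValues: { ':u': 'USER_42' }
}).promise();

// 2) ...then delete it; the REMOVE stream event's OLD_IMAGE now carries __modifiedBy.
await dynamodb.delete({
  TableName: 'fruits',
  Key: { id: 7 }
}).promise();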

So is there a better way, such as attaching metadata to a DynamoDB operation that is carried over into the stream event without being part of the data written to the table itself?


1 Answer

5
votes

Here are three different options. The right one will depend on the requirements of your application, and none of them may fit your specific use case exactly, but in general all of these approaches work.

Option 1

If you’re using AWS IAM at a granular enough level, then you can get the user identity from the Stream Record.
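
For example, a stream-triggered handler can simply log whatever identity information a record carries (a sketch; whether userIdentity is populated depends on how the write was made, so treat it as optional):

exports.handler = async (event) => {
  for (const record of event.Records) {
    console.log(JSON.stringify({
      eventName: record.eventName,         // INSERT | MODIFY | REMOVE
      keys: record.dynamodb.Keys,
      userIdentity: record.userIdentity    // may be undefined
    }));
  }
};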

Option 2

If you can handle a small overhead when writing to DynamoDB, you could set up a Lambda function (or EC2-based service) which acts as a write proxy for your DynamoDB tables. Configure your permissions so that only that proxy can write to the tables; it can then accept any metadata you want and log it however you want. If all you need is logging of events, you don’t even need to write to S3, since AWS handles Lambda logs for you via CloudWatch Logs.

Here’s a sketch of such a proxy Lambda (Node.js, AWS SDK v2), using logging instead of writing to S3:

// Write-proxy Lambda. The event shape ({ operation, tableName, item, key, user })
// is just one possible contract between your callers and the proxy.
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const { operation, tableName, item, key, user } = event;
  log(operation, tableName, item, user);

  switch (operation) {
    case 'put':
      return dynamodb.put({ TableName: tableName, Item: item }).promise();
    case 'update':
      // Assumes the caller supplies ready-made update parameters (UpdateExpression etc.).
      return dynamodb.update({ TableName: tableName, Key: key, ...event.updateParams }).promise();
    case 'delete':
      return dynamodb.delete({ TableName: tableName, Key: key }).promise();
    default:
      throw new Error('Unsupported operation: ' + operation);
  }
};

function log(operation, tableName, item, user) {
  // console.log output ends up in CloudWatch Logs automatically.
  console.log(JSON.stringify({ time: new Date().toISOString(), operation, tableName, user, item }));
}

You are, of course, free to still log directly to S3, but if you do, you may find that the added latency is significant enough to impact your application.

Option 3

If you can tolerate some stale data in your table(s), set up DynamoDB TTL on them. Don’t set a TTL value when creating or updating an item. Then, instead of deleting an item, update it by setting the TTL attribute to the current time. As far as I can tell, DynamoDB does not consume write capacity when removing items with an expired TTL, and expired items are typically removed within 48 hours of expiry.

This allows you to log the “set TTL” update as the deletion, complete with a modified-by user for that deletion, and to safely ignore the actual delete that occurs later when DynamoDB cleans up the expired items.
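
A “delete” then becomes an update that stamps the TTL attribute (assumed here to be called expiresAt) together with the modifying user, for example:

// Soft delete: set the TTL attribute (epoch seconds) and record who did it.
// (dynamodb is the DocumentClient from the earlier example; run inside an async function.)
await dynamodb.update({
  TableName: 'fruits',
  Key: { id: 7 },
  UpdateExpression: 'SET expiresAt = :t, #m = :u',
  ExpressionAttributeNames: { '#m': '__modifiedBy' },
  ExpressionAttributeValues: {
    ':t': Math.floor(Date.now() / 1000),  // TTL values must be epoch seconds
    ':u': 'USER_42'
  }
}).promise();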

In your application, you can also check for the presence of a TTL value so that you don’t accidentally present users with deleted data, or add a filter expression to your queries and scans that omits items which have a TTL set.
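
For example, again assuming the TTL attribute is called expiresAt:

// Return only items that haven't been soft-deleted.
const result = await dynamodb.scan({
  TableName: 'fruits',
  FilterExpression: 'attribute_not_exists(expiresAt)'
}).promise();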