3
votes

I'm processing an XML file added to S3 and writing the results to a Firehose, which stores them in the same S3 bucket, but the destination filename has to be in a specific format. I've examined the documentation and I can't see any way to set the format of the filename. The closest I can find is in the Firehose FAQ:

Q: What is the naming pattern of the Amazon S3 objects delivered by Amazon Kinesis Data Firehose?

The Amazon S3 object name follows the pattern DeliveryStreamName-DeliveryStreamVersion-YYYY-MM-DD-HH-MM-SS-RandomString, where DeliveryStreamVersion begins with 1 and increases by 1 for every configuration change of the delivery stream. You can change delivery stream configurations (for example, the name of the S3 bucket, buffering hints, compression, and encryption) with the Firehose Console or the UpdateDestination operation.
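So, for a hypothetical stream named myStream on its first configuration version, a delivered object would be named something like this (the timestamp and random suffix here are made up for illustration):

myStream-1-2019-08-06-12-34-56-aBcDeFgHiJkLmNoPqRsT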

2
This was solved with a second lambda that's triggered when the firehose output is written, and which writes the text from the firehose file to a new file on S3. – Joseph McCarthy

2 Answers

4
votes

If you are using static naming, you can specify it through the Firehose Console or the UpdateDestination operation.
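As a minimal sketch of what that looks like with boto3 (assuming an extended S3 destination; the stream name and prefix are placeholders, and the version and destination ids come from DescribeDeliveryStream):

import boto3

firehose = boto3.client('firehose')

# update_destination requires the current version id and destination id,
# which DescribeDeliveryStream returns.
description = firehose.describe_delivery_stream(
    DeliveryStreamName='myStream')['DeliveryStreamDescription']

firehose.update_destination(
    DeliveryStreamName='myStream',
    CurrentDeliveryStreamVersionId=description['VersionId'],
    DestinationId=description['Destinations'][0]['DestinationId'],
    ExtendedS3DestinationUpdate={
        # A static prefix is prepended to every delivered object;
        # the DeliveryStreamName-Version-timestamp-random part is
        # still appended after it.
        'Prefix': 'my-static-prefix/'
    }
)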

But if you are looking for some dynamic naming, unfortunately it is currently not possible. Refer to this question for a detailed answer: Storing Firehose transfered files in S3 under custom directory names

2
votes

I too wasn't happy that I couldn't specify the names of my files dynamically, so I made a Lambda function to rename the files that my Kinesis stream outputs. These were the steps I took:

  • I included the filename I wanted in my Kinesis data (see the producer-side sketch after the handler code below).
  • I created a new Lambda function, set up to run whenever Kinesis outputs a file to S3.
  • My lambda function:
    1. opens my file
    2. grabs the new file name
    3. creates the new file
    4. deletes the badly named old file.
import boto3
import json


def lambda_handler(event, context):
    # Get the bucket and key of the newly delivered object from the
    # S3 event notification.
    key = event["Records"][0]["s3"]["object"]["key"]
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    s3resource = boto3.resource('s3')
    # Read the object body and pull the desired filename out of the
    # JSON that was included in the Kinesis data.
    obj = s3resource.Object(bucket, key)
    body = obj.get()['Body'].read()
    dic = json.loads(body)
    my_new_file_name = dic["my_new_file_name"]
    # Copy the object to the new name, then delete the badly named original.
    s3resource.Object(bucket, str(my_new_file_name)).copy_from(
        CopySource=f'{bucket}/{key}')
    s3resource.Object(bucket, key).delete()
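For completeness, here is a minimal sketch of the producer side (the stream name and field values are placeholders): the record put onto the delivery stream carries the my_new_file_name field that the handler above reads back out.

import boto3
import json

firehose = boto3.client('firehose')

# Embed the desired S3 filename in the record itself so the
# rename Lambda can recover it from the delivered object.
record = {
    "my_new_file_name": "output/my-custom-name.xml",
    "payload": "...actual data...",
}

firehose.put_record(
    DeliveryStreamName='myStream',
    Record={'Data': json.dumps(record).encode('utf-8')}
)

Note that this approach assumes one record per delivered object; if your buffering settings cause Firehose to batch several records into a single file, json.loads on the whole body will fail on the concatenated JSON, and the handler would need to split the records first.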