1
votes

The following trigger removes EXIF data from blobs (which are images) after they are uploaded to Azure Storage. The problem is that the blob trigger fires at least 5 times for each blob.

In the trigger the blob is updated by writing a new stream of data to it. I had assumed that blob receipts would prevent further firing of the blob trigger against this blob.

[FunctionName("ExifDataPurge")]
public async System.Threading.Tasks.Task RunAsync(
    [BlobTrigger("container/{name}.{extension}", Connection = "")]CloudBlockBlob image,
    string name,
    string extension,
    string blobTrigger,
    ILogger log)
{
    log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name}");

    try
    {
        var memoryStream = new MemoryStream();
        await image.DownloadToStreamAsync(memoryStream);
        memoryStream.Position = 0;
        using (Image largeImage = Image.Load(memoryStream))
        {
            if (largeImage.Metadata.ExifProfile != null)
            {
                // Strip the EXIF data from the image. Dropping the whole profile avoids
                // skipping entries, which happens when values are removed while indexing forward.
                largeImage.Metadata.ExifProfile = null;

                var exifStrippedImage = new MemoryStream();
                largeImage.Save(exifStrippedImage, new SixLabors.ImageSharp.Formats.Jpeg.JpegEncoder());
                exifStrippedImage.Position = 0;

                await image.UploadFromStreamAsync(exifStrippedImage);
            }
        }
    }
    catch (UnknownImageFormatException)
    {
        log.LogInformation($"Blob is not a valid Image : {name}.{extension}");
    }
}
3
You are modifying the blob that triggered the function with await image.UploadFromStreamAsync(exifStrippedImage);, so the trigger fires again. - Markus Meyer
@MarkusMeyer That's my assumption too, but shouldn't the receipt prevent that? - Nattrass
@Nattrass As mentioned for blob receipts, you also have to check the ETag. Please have a look here. This might work for you: https://stackguides.com/questions/44784094/cycling-azure-function-blob-trigger - Markus Meyer

3 Answers

2
votes

The blob trigger tracks which blobs have been processed by storing receipts in the azure-webjobs-hosts container. Any blob without a receipt, or with a receipt for an older version of the blob (based on the blob's ETag), will be processed or reprocessed.

Since you are calling await image.UploadFromStreamAsync(exifStrippedImage);, the blob is written again and gets a new ETag. The existing receipt no longer matches, so the function is triggered again for the updated blob.
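
To make the receipt behaviour concrete, here is a small in-memory model of it. This is only a sketch of the idea described above, not the WebJobs SDK's actual implementation: FakeBlob, ReceiptModel, the integer ETag and the maxScans cap are all made up for illustration, and the counts it produces say nothing about how often the real trigger fires.

```csharp
using System;
using System.Collections.Generic;

// Simplified in-memory model, NOT the WebJobs SDK implementation.
// All type and member names here are hypothetical.
public sealed class FakeBlob
{
    public string Name { get; }
    public int ETag { get; private set; } = 1;
    public Dictionary<string, string> Metadata { get; } = new Dictionary<string, string>();

    public FakeBlob(string name) => Name = name;

    // Any write to the blob produces a new ETag.
    public void Write() => ETag++;
}

public static class ReceiptModel
{
    // Runs the handler whenever the stored receipt is missing or stale.
    // Returns how many times the handler ran, capped at maxScans.
    public static int CountInvocations(FakeBlob blob, Action<FakeBlob> handler, int maxScans = 10)
    {
        var receipts = new Dictionary<string, int>(); // blob name -> ETag that was processed
        int invocations = 0;

        for (int scan = 0; scan < maxScans; scan++)
        {
            bool receiptIsCurrent =
                receipts.TryGetValue(blob.Name, out int seenETag) && seenETag == blob.ETag;
            if (receiptIsCurrent)
            {
                break; // nothing changed since the last run
            }

            receipts[blob.Name] = blob.ETag; // the receipt records the ETag that triggered this run
            handler(blob);
            invocations++;
        }

        return invocations;
    }

    // Handler that rewrites the blob on every run, like the function in the question.
    public static void AlwaysRewrite(FakeBlob blob) => blob.Write();

    // Handler that rewrites only once, using a marker to detect its own earlier write.
    public static void RewriteOnce(FakeBlob blob)
    {
        if (blob.Metadata.TryGetValue("Status", out var status) && status == "Processed")
        {
            return;
        }

        blob.Metadata["Status"] = "Processed";
        blob.Write();
    }
}
```

In this model AlwaysRewrite never settles, because every run invalidates the receipt it just created, and only the cap stops it. RewriteOnce still runs a second time after its own write, but that run changes nothing, so the receipt stays current and the loop ends.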

2
votes

When you call await image.UploadFromStreamAsync(exifStrippedImage);, it updates the blob, so the blob function triggers again.

To break the loop, you can check an existing property on the blob, such as CacheControl, and skip the update if the function has already set it.

// Set the CacheControl property to expire in 1 hour (3600 seconds)
blob.Properties.CacheControl = "max-age=3600";
1
votes

So I've addressed this by storing a Status in metadata against the blob as it's processed.

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-container-properties-metadata

The trigger then contains a guard to check for the metadata.

if (image.Metadata.ContainsKey("Status") && image.Metadata["Status"] == "Processed")
{
    // Any subsequent processing for the blob will enter this block.
    log.LogInformation($"blob: {name} has already been processed");
}
else
{
    // First time triggered for this blob.
    image.Metadata.Add("Status", "Processed");
    await image.SetMetadataAsync();
}

The other answers pointed me in the right direction, but I think it is more correct to use the metadata. Storing an ETag elsewhere seems redundant when we can store metadata on the blob itself. Using "CacheControl" seems like too much of a hack: other developers might be confused about what I have done and why.
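
For reference, this is roughly how the guard could sit inside the function from the question. Treat it as a sketch under a few assumptions: the class and function name ExifDataPurgeGuarded is made up, the using for CloudBlockBlob assumes the 4.x storage extension (older versions use Microsoft.WindowsAzure.Storage.Blob), and the FetchAttributesAsync call is only there in case the metadata on the bound blob is not already populated.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Blob; // assumption: older extensions use Microsoft.WindowsAzure.Storage.Blob
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Formats.Jpeg;

public static class ExifDataPurgeGuarded // hypothetical name
{
    [FunctionName("ExifDataPurgeGuarded")]
    public static async Task RunAsync(
        [BlobTrigger("container/{name}.{extension}", Connection = "")] CloudBlockBlob image,
        string name,
        string extension,
        ILogger log)
    {
        // Make sure the local metadata reflects what is stored on the blob.
        await image.FetchAttributesAsync();

        // Guard: a run caused by our own write exits here without touching the blob.
        if (image.Metadata.TryGetValue("Status", out var status) && status == "Processed")
        {
            log.LogInformation($"blob: {name} has already been processed");
            return;
        }

        using (var original = new MemoryStream())
        using (var stripped = new MemoryStream())
        {
            await image.DownloadToStreamAsync(original);
            original.Position = 0;

            try
            {
                using (Image loaded = Image.Load(original))
                {
                    if (loaded.Metadata.ExifProfile == null)
                    {
                        return; // nothing to strip, so nothing is written
                    }

                    loaded.Metadata.ExifProfile = null;
                    loaded.Save(stripped, new JpegEncoder());
                }
            }
            catch (UnknownImageFormatException)
            {
                log.LogInformation($"Blob is not a valid Image : {name}.{extension}");
                return;
            }

            stripped.Position = 0;
            await image.UploadFromStreamAsync(stripped);
        }

        // Mark the blob so the run triggered by the writes above is a no-op.
        image.Metadata["Status"] = "Processed";
        await image.SetMetadataAsync();
    }
}
```

The function will still be invoked again after the upload, since the blob has changed. The guard only makes sure that the extra invocation returns early instead of rewriting the blob and starting another round.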