2
votes

I upload gzipped files to an Azure Storage Container (input). I then have a WebJob that is supposed to pick up the Blobs, decompress them and drop them into another Container (output). Both containers use the same storage account.

My problem is that it doesn't process all Blobs; it always seems to miss one. This morning I uploaded 11 blobs to the input Container and only 10 were processed and dropped into the output Container. If I upload 4, then 3 will be processed. The dashboard shows 10 invocations even though 11 blobs were uploaded, so it doesn't look like the trigger fires for the 11th blob. If I upload only 1, it seems to get processed.

I am running the website in Standard Mode with Always On set to true.

I have tried a few variations; this is my latest code. Am I doing something wrong?

using System.IO;
using System.IO.Compression;
using Microsoft.WindowsAzure.Storage.Blob;

public class Functions
{
    // Triggered when a new .gz blob appears in the "input" container;
    // the decompressed content is written to the "output" container.
    public static void Unzip(
        [BlobTrigger("input/{name}.gz")] CloudBlockBlob inputBlob,
        [Blob("output/{name}")] CloudBlockBlob outputBlob)
    {
        using (Stream input = inputBlob.OpenRead())
        using (Stream output = outputBlob.OpenWrite())
        {
            UnzipData(input, output);
        }
    }

    public static void UnzipData(Stream input, Stream output)
    {
        // Dispose the GZipStream so any buffered data is flushed
        // before the output blob stream is closed.
        using (var gzippedStream = new GZipStream(input, CompressionMode.Decompress))
        {
            gzippedStream.CopyTo(output);
        }
    }
}
2
(1) Do you see any failed invocations in the dashboard? (2) Did you wait a while after uploading the blobs? It might take up to 10 minutes for blobs to be picked up. – Victor Hurdugaci
(1) I did not see any failed invocations, only the 10 successful ones. (2) I waited for over an hour. The files are fairly large (30-40 MB) and processing can take up to 8 seconds. Could this be a problem? – pkidza
If I restart the WebJob, it picks up the file it missed. – pkidza
What version of the SDK are you using? – Victor Hurdugaci
1.0.0, installed using NuGet. – pkidza

2 Answers

3
votes

As per Victor's comment above, it looks like this is a bug on Microsoft's end.

Edit: I don't get the downvote. There is a problem and Microsoft is going to fix it. That is the answer to why some of my blobs are ignored...

"There is a known issue about some Storage log events being ignored. Those events are usually generated for large files. We have a fix for it but it is not public yet. Sorry for the inconvenience. – Victor Hurdugaci Jan 9 at 12:23"

0
votes

Just as a workaround: what if you don't listen to the Blob container directly, but instead bring a Queue in between? When you write a file to the input Blob container, also write a message about the new Blob to the Queue, and have the WebJob listen to that Queue. Once a message arrives, the WebJob function reads the file from the input Blob container and copies it to the output Blob container. Does this model work for you?
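A minimal sketch of what that queue-based function could look like, assuming WebJobs SDK 1.x. The queue name "input-queue" is an assumption (pick your own), and the uploader is assumed to enqueue the bare blob name (without the .gz extension) right after uploading the file:

```csharp
using System.IO;
using System.IO.Compression;
using Microsoft.WindowsAzure.Storage.Blob;

public class Functions
{
    // Fires when a message lands on "input-queue" (hypothetical name).
    // {queueTrigger} in the Blob bindings is replaced by the message text,
    // so a message "report1" binds input/report1.gz and output/report1.
    public static void Unzip(
        [QueueTrigger("input-queue")] string name,
        [Blob("input/{queueTrigger}.gz")] CloudBlockBlob inputBlob,
        [Blob("output/{queueTrigger}")] CloudBlockBlob outputBlob)
    {
        using (Stream input = inputBlob.OpenRead())
        using (Stream output = outputBlob.OpenWrite())
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        {
            gzip.CopyTo(output);
        }
    }
}
```

Queue triggers poll the queue rather than relying on storage log events, so they don't suffer from the missed-event issue described above; the trade-off is that the uploader must remember to enqueue a message for every blob it writes.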