
There are a couple of other threads on this, but none has a solution and none is specific to Python Functions.

Background:

  • EventGrid-triggered Python Azure Function
  • EventGrid messages created only when a blob is uploaded to a given Storage Account
  • Function receives message, downloads blob from message URL and does "stuff"
  • Function can run for several seconds/minutes (up to 120 seconds for large blobs)
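For reference, here is a minimal sketch of the message side. A Blob Storage `Microsoft.Storage.BlobCreated` event from EventGrid carries a unique `id` and the blob URL under `data.url`. The sample payload and the `parse_blob_created` helper below are hypothetical and use only the standard library. Inside a real Function, the `azure.functions.EventGridEvent` binding already exposes these values (`event.id`, `event.get_json()`).

```python
import json
from typing import NamedTuple


class BlobEvent(NamedTuple):
    event_id: str  # unique per event; a retry of the same event should reuse it
    url: str       # URL of the uploaded blob


def parse_blob_created(raw: str) -> BlobEvent:
    """Extract the event id and blob URL from one EventGrid event (JSON text)."""
    event = json.loads(raw)
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        raise ValueError(f"unexpected event type: {event.get('eventType')}")
    return BlobEvent(event_id=event["id"], url=event["data"]["url"])


# Hypothetical sample payload, trimmed to the fields used above.
sample = json.dumps({
    "id": "example-event-id-1",
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/uploads/blobs/file1.csv",
    "data": {"url": "https://example.blob.core.windows.net/uploads/file1.csv"},
})

print(parse_blob_created(sample))
```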

Example of issue:

  • 4 files uploaded to blob container in correct Storage Account
  • Function successfully triggered 4 times, by 4 separate EventGrid messages
  • Function downloads blob from URL in each message, does "stuff"
  • ~55 seconds later, 4 more EventGrid messages trigger the Function all over again (for the same 4 files!)
  • Everything repeats

This happens multiple times, resulting in 12 Function executions for 4 files, each one producing its own copy of the output from the "stuff" the Function does!
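One way to check whether the repeated runs are EventGrid redeliveries or genuinely new events is to log the event `id` and blob URL in every run and then group them. If the same `id` shows up several times, EventGrid is resending the same event. The log format and the `summarize_invocations` helper below are hypothetical, and the sample log simply mirrors the 4-files-times-3 pattern described above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def summarize_invocations(invocations: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Summarize the (event_id, blob_url) pairs logged by each Function run."""
    runs: List[Tuple[str, str]] = list(invocations)
    runs_per_event = Counter(event_id for event_id, _ in runs)
    ids_per_url: Dict[str, Set[str]] = {}
    for event_id, url in runs:
        ids_per_url.setdefault(url, set()).add(event_id)
    return {
        "runs": len(runs),
        "unique_events": len(runs_per_event),
        # Same id seen more than once: EventGrid redelivered that event.
        "redelivered_events": sorted(e for e, n in runs_per_event.items() if n > 1),
        # Same blob URL under different ids: separate events for one blob.
        "urls_with_multiple_events": sorted(
            u for u, ids in ids_per_url.items() if len(ids) > 1
        ),
    }


# Hypothetical log reproducing the question: 4 files, each delivered 3 times.
log = [
    (f"evt-{i}", f"https://example.blob.core.windows.net/uploads/file{i}.csv")
    for _ in range(3)
    for i in range(1, 5)
]
print(summarize_invocations(log))
```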


It gets ridiculous when 2500 files are uploaded to the Storage Account!

It seems I need to adjust the EventGrid retry timing, but I don't see a setting for this in the Portal.


How do I prevent this behavior?

EDIT 1: Then today there was no issue with 16 files uploaded. Why is EventGrid triggering this Function so inconsistently?


EDIT 2: And again today, for no apparent reason, about an hour later EventGrid fired off a bunch more triggers even though NO MORE FILES had been uploaded to the storage account.


Here are the EventGrid stats for the 16 files uploaded to the storage account.

  • The numbers are clearly all over the place, in some cases with ~1 hour between retries.
  • It looks quite arbitrary to me.


EDIT 3: For anyone interested...

Answer:

According to the docs, the subscriber (such as your EventGridTrigger function) must send a response back to Azure Event Grid (AEG) within 30 seconds; otherwise the event is queued for retry.

Note that the event is removed from the retry queue when AEG receives a successful response from the delivery destination endpoint (the subscriber) within 3 minutes. If the dead-lettering feature is enabled, the event is also removed from the retry queue when the response failure code is 400 or 413.

Given the above, and since your subscriber can run for up to 120 seconds, AEG does not get a response within 30 seconds and sends a duplicate event within those 3 minutes.
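The timing rules above can be expressed as a small model. This is a hypothetical simplification written for illustration: `delivery_outcome` and its thresholds come from the answer text, not from an Azure SDK. It also ignores the 3-minute removal window, which does not undo a duplicate that has already been delivered.

```python
RESPONSE_TIMEOUT_S = 30  # per the answer: respond within 30 s or the event is queued for retry
DEAD_LETTER_CODES = {400, 413}  # per the answer: removed from the retry queue if dead-lettering is on


def delivery_outcome(response_seconds: float, status_code: int, dead_lettering: bool) -> str:
    """Simplified, hypothetical model of one EventGrid delivery attempt."""
    if response_seconds > RESPONSE_TIMEOUT_S:
        return "retry"  # no response in time: the event is queued for retry
    if 200 <= status_code < 300:
        return "delivered"
    if dead_lettering and status_code in DEAD_LETTER_CODES:
        return "dead-lettered"
    return "failed"  # other error codes: retry/drop rules are not modelled here


# The Function in the question can run for 120 s, so even a successful run
# responds too late and the event is queued for retry.
print(delivery_outcome(120, 200, dead_lettering=False))  # retry
```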

I recommend using a Push-Pull pattern in your solution, for example delivering the event to a storage queue and processing it from there.
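Here is a minimal in-process sketch of the Push-Pull idea. It uses a plain `queue.Queue` as a stand-in for the Azure Storage queue. The push side only enqueues the blob URL and acknowledges immediately, well inside the 30-second window. A separate worker then pulls URLs off the queue and does the slow "stuff". The names `on_event`, `worker`, and `slow_stuff` are hypothetical. In Azure, the push side would be the EventGrid subscription delivering to a Storage queue, and the pull side would be a queue-triggered Function.

```python
import queue
import threading
import time
from typing import Callable, List

# Stand-in for an Azure Storage queue (hypothetical, in-process only).
work_queue: "queue.Queue[str]" = queue.Queue()


def on_event(blob_url: str) -> int:
    """Push side: acknowledge fast by only enqueueing the blob URL."""
    work_queue.put(blob_url)
    return 200  # returned almost instantly, far below the 30-second limit


def worker(process: Callable[[str], None]) -> None:
    """Pull side: take URLs off the queue and do the slow "stuff"."""
    while True:
        url = work_queue.get()
        try:
            process(url)
        finally:
            work_queue.task_done()


processed: List[str] = []


def slow_stuff(url: str) -> None:
    time.sleep(0.1)  # placeholder for the long-running download/processing
    processed.append(url)


threading.Thread(target=worker, args=(slow_stuff,), daemon=True).start()

for i in range(1, 5):
    start = time.monotonic()
    status = on_event(f"https://example.blob.core.windows.net/uploads/file{i}.csv")
    print(status, f"acknowledged in {time.monotonic() - start:.3f}s")

work_queue.join()  # block until the worker has processed every queued URL
print(len(processed), "blobs processed")
```

The design point is that the acknowledgement no longer depends on how long the processing takes, so a 120-second blob no longer causes EventGrid to time out and resend the event.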