We have a Azure setup with a Azure Event Grid Topic and to that we have a Azure Function Service with about 15 functions that are subscribing to the topic via different prefix filters. The Azure Function Service is set up as a consumption based resource and should be able to scale as it prefers.
Each subscription is set up to try deliveries for 10 times during maximum 4 hours befor dropping the event. So far so good and the setup is working as expected – most of the time.
In certain, for us unknown situations, it seems like the Event Grid Topic cannot deliver events to the different functions. What we can see is that our dead letter storage fill up with events that have not been delivered.
Now to my question
From the logs we can see the reason for various events not being delivered. The reason is most often Outcome: Probation. We can not find any information from Microsoft on what this actually means.
In addition, the Grid fails and adds the event to the dead letter log before both the timeout policy (4 hours) and the delivery attempts policy (10 retries) has exceeded. Some times the Function Service is idling and do not receive any events from the Grid.
Do any of you good people have ideas of how we can proceed with the troubleshooting for this? What has happened between the Grid and Funciton App when the error message Probation occurs? One thing that we have noticed is that the number of connections from the Grid to our function app is quite high in comparison to the number of events delivered. There are not other incoming connections to the Function App besides the Event Grid.
Example of a dead letter message
[{
"id":"a40a1f02-5ec8-46c3-a349-aea6aaff646f",
"eventTime":"2020-06-02T17:45:09.9710145Z",
"eventType":"mitbalAdded",
"dataVersion":"1",
"metadataVersion":"1",
"topic":"/subscriptions/XXXXXXX/resourceGroups/XXXX_STAGING/providers/Microsoft.EventGrid/topics/XXXXXstaging",
"subject":"odl/type/mitbal/v1",
"deadLetterReason":"TimeToLiveExceeded",
"deliveryAttempts":6,
"lastDeliveryOutcome":"Probation",
"publishTime":"2020-06-02T17:45:10.1869491Z",
"lastDeliveryAttemptTime":"2020-06-02T19:30:10.5756332Z",
"data":"<?xml version=\"1.0\" encoding=\"utf-8\"?><Stock><Action>ADD</Action><Id>123456</Id><Store>123</Store><Shelf>1</Shelf></Stock>"
}]
Function Service Metrics
- Blue = Connections (count)
- Red = Function Executions (count)
- White = Requests (count)
azcommunity[at]microsoft[dot]com
with a link to this thread? – PramodValavala-MSFT