5
votes

I have an application composed of two ASP.NET Core apps, app A and app B. App A makes HTTP calls to App B, and Application Insights automatically correlates this and shows them as a single request. Great!

However, I'm now moving to a more event-based system design, where app A publishes an event to an Azure Event Grid, and app B is set up with a webhook to listen to that event.

Having made that change, the telemetry correlation is broken and it no longer shows up as a single operation.

I have read this documentation: https://docs.microsoft.com/en-us/azure/azure-monitor/app/correlation which explains the theory around correlation headers - but how can I apply this to the Event Grid and get it to forward the correlation headers on to the subscribing endpoints?


2 Answers

1
votes

The header pass-through idea for custom topics in Azure Event Grid (AEG) was recently (Oct. 10th) marked as unplanned.

However, the headers can still reach the subscribers if they are carried inside the data object of the event message. This mediation can be done, for example, using policies in Azure API Management.
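As a rough illustration of carrying the headers in the data object, here is a minimal sketch that does the wrapping in publisher code rather than in an API Management policy. The `CorrelatedEnvelope` and `EnvelopeDemo` types are hypothetical names made up for this example; the header names (`traceparent`, `tracestate`, `Request-Id`, `Correlation-Context`) are the ones described in the Application Insights correlation documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical envelope: the business payload plus the correlation headers
// that Event Grid would otherwise not forward to subscribers.
public sealed class CorrelatedEnvelope<TPayload>
{
    public Dictionary<string, string> Headers { get; set; } = new Dictionary<string, string>();
    public TPayload Payload { get; set; }
}

public static class EnvelopeDemo
{
    // Correlation headers to copy into the event data; everything else is dropped.
    private static readonly string[] CorrelationHeaderNames =
    {
        "traceparent", "tracestate", "Request-Id", "Correlation-Context"
    };

    // Wraps the payload and any correlation headers found on the incoming
    // request into a single JSON document suitable for the event's data field.
    public static string Wrap<TPayload>(TPayload payload, IDictionary<string, string> incomingHeaders)
    {
        var envelope = new CorrelatedEnvelope<TPayload> { Payload = payload };
        foreach (var name in CorrelationHeaderNames)
        {
            if (incomingHeaders.TryGetValue(name, out var value))
            {
                envelope.Headers[name] = value;
            }
        }
        return JsonSerializer.Serialize(envelope);
    }
}
```

The subscriber would then read `Headers` back out of the event data and use those values to restore the operation context manually.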

UPDATE:

The following documents can help with manual instrumentation of the webhook endpoint handler (subscriber side) using custom operation tracking:

Track custom operations with Application Insights .NET SDK

Application Insights API for custom events and metrics

1
votes
  1. Add two correlation properties to all your events:

    public string OperationId { get; set; }
    public string OperationParentId { get; set; }
    
  2. Publisher side: create a dependency telemetry item and populate these properties.

    private Microsoft.ApplicationInsights.TelemetryClient _telemetryClient;
    
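    // Note: accessing data.OperationId below requires TEventData to be constrained
    // to a type that declares the two correlation properties from step 1.
    async Task Publish<TEventData>(TEventData data)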
    {
        var @event = new EventGridEvent
        {
            Id = Guid.NewGuid().ToString(),
            EventTime = DateTime.UtcNow,
            EventType = typeof(TEventData).FullName,
            Data = data
        };  
    
        string operationName = "Publish " + @event.EventType;
    
        // StartOperation is a helper method that initializes the telemetry item
        // and allows correlation of this operation with its parent and children.
        var operation =
             _telemetryClient.StartOperation<DependencyTelemetry>(operationName);
        operation.Telemetry.Type = "EventGrid";
        operation.Telemetry.Data = operationName;
    
        // Ideally, the correlation properties should go in the request headers, but
        // with the current implementation of Event Grid we have no other option
        // than to store them in the event Data.
        data.OperationId = operation.Telemetry.Context.Operation.Id;
        data.OperationParentId = operation.Telemetry.Id;
    
        try
        {
            AzureOperationResponse result = await _client
                .PublishEventsWithHttpMessagesAsync(_topic, new[] { @event });
            result.Response.EnsureSuccessStatusCode();
    
            operation.Telemetry.Success = true;
        }
        catch (Exception ex)
        {
            operation.Telemetry.Success = false;
            _telemetryClient.TrackException(ex);
            throw;
        }
        finally
        {
            _telemetryClient.StopOperation(operation);
        }
    }
    
  3. Consumer side: create a request telemetry item and restore the correlation.

    [FunctionName(nameof(YourEventDataConsumer))]
    void YourEventDataConsumer([EventGridTrigger] EventGridEvent @event)
    {
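        // With the Event Grid trigger, Data usually arrives as a JObject
        // (Newtonsoft.Json.Linq), so deserialize it rather than casting directly.
        var data = ((JObject)@event.Data).ToObject<YourEventData>();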
    
        var operation = _telemetryClient.StartOperation<RequestTelemetry>(
            "Handle " + @event.EventType,
            data.OperationId,
            data.OperationParentId);
        try
        {
            // Do some event processing.
    
            operation.Telemetry.Success = true;
            operation.Telemetry.ResponseCode = "200";
        }
        catch (Exception)
        {
            operation.Telemetry.Success = false;
            operation.Telemetry.ResponseCode = "500";
            throw;
        }
        finally
        {
            _telemetryClient.StopOperation(operation);
        }
    }
    

This works, but it is not ideal because you need to repeat this code in every consumer. Also, some early log messages (e.g. those emitted by constructors of injected services) are emitted before the operation starts, so they are still not correlated correctly.

A better approach would be to create a custom EventGridTriggerAttribute (which means recreating the whole Microsoft.Azure.WebJobs.Extensions.EventGrid extension) and move this code into IAsyncConverter.ConvertAsync().
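Short of rebuilding the extension, the repetition across consumers can be reduced by moving the try/catch/finally block into a shared helper. The following is only a sketch: `CorrelatedHandler` and `RunAsync` are hypothetical names, not part of the Application Insights SDK, and it does not solve the problem of early log messages emitted before the handler runs. It assumes the Microsoft.ApplicationInsights NuGet package is referenced.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

// Hypothetical helper that wraps any event handler in a correlated request operation.
public static class CorrelatedHandler
{
    public static async Task RunAsync(
        TelemetryClient telemetryClient,
        string operationName,
        string operationId,
        string operationParentId,
        Func<Task> handler)
    {
        // Restore the correlation context that the publisher stored in the event data.
        var operation = telemetryClient.StartOperation<RequestTelemetry>(
            operationName, operationId, operationParentId);
        try
        {
            await handler();

            operation.Telemetry.Success = true;
            operation.Telemetry.ResponseCode = "200";
        }
        catch (Exception ex)
        {
            operation.Telemetry.Success = false;
            operation.Telemetry.ResponseCode = "500";
            telemetryClient.TrackException(ex);
            throw;
        }
        finally
        {
            // Stopping the operation sends the request telemetry item.
            telemetryClient.StopOperation(operation);
        }
    }
}
```

Each consumer then shrinks to a single call such as `await CorrelatedHandler.RunAsync(_telemetryClient, "Handle " + @event.EventType, data.OperationId, data.OperationParentId, () => ProcessAsync(data));`, where `ProcessAsync` stands in for the actual event processing.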