Root cause analysis of web server performance problems

Question

We have an ASP.NET web application deployed in Azure App Service and using Application Insights for logging and New Relic as a monitoring tool.

I often investigate slow response times, and what I find most difficult is identifying the root cause.

In New Relic, I can see that all the endpoints got slower.

But there was probably one endpoint which got hit by expensive requests, leading to a CPU utilization spike, manifesting as slow response times for every endpoint.

Sometimes it's pretty clear: one endpoint gets a burst of traffic, so it stands out. But often it's not about the throughput, it's about what those requests look like.

Are there established analytical or statistical methods for figuring out the root cause in cases like this? I can imagine it might involve getting a profiler snapshot of the running application, analyzing the web server logs, etc.

There are hundreds of thousands of articles and blog posts about troubleshooting ASP.NET applications. Google is your friend. — Ian Kemp
Sure there are. What I'm looking for is a rigorous approach to distinguishing between causes and symptoms. — twoflower

Saif Badran · Accepted Answer · 2021-04-07T16:36:31

In most cases, the best approach to performance issues is profiling, which lets you identify which part of your application is spending the most time.
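To illustrate how profiler data can help separate cause from symptom, here is a minimal Python sketch. It assumes you have somehow exported CPU samples as (endpoint, top frame) pairs for a normal period and for the slow period; that shape, the function names `cpu_share` and `share_shift`, and the sample data are all hypothetical and are not the format of any Azure or New Relic export. The idea is that response time rises for every endpoint when the CPU is saturated, but the share of CPU samples should rise mainly for the endpoint doing the expensive work.

```python
from collections import Counter


def cpu_share(samples):
    """Return each endpoint's fraction of the CPU samples in `samples`.

    `samples` is an iterable of (endpoint, top_frame) tuples, one per
    profiler sample. This shape is an assumption made for illustration.
    """
    counts = Counter(endpoint for endpoint, _frame in samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {endpoint: n / total for endpoint, n in counts.items()}


def share_shift(baseline, incident):
    """Rank endpoints by how much their CPU share grew during the incident."""
    before = cpu_share(baseline)
    during = cpu_share(incident)
    endpoints = set(before) | set(during)
    deltas = {
        ep: during.get(ep, 0.0) - before.get(ep, 0.0) for ep in endpoints
    }
    # Largest increase first: the top entry is the most likely culprit.
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical samples, made up for this example.
baseline = [
    ("/orders", "Query"),
    ("/orders", "Render"),
    ("/search", "Query"),
    ("/home", "Render"),
]
incident = [("/search", "Regex.Match")] * 6 + [
    ("/orders", "Query"),
    ("/home", "Render"),
]

for endpoint, delta in share_shift(baseline, incident):
    print(f"{endpoint}: {delta:+.3f}")
```

An endpoint whose CPU share jumps while the others shrink is a candidate cause; endpoints that only got slower without taking a larger share are more likely showing the symptom. A real trace would need more samples and a comparable baseline window for the comparison to mean much.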

App Service Diagnostics has a built-in Profiler for ASP.NET Web Apps.

In the Portal, navigate to your Web App, open the Diagnose and Solve problems blade from the left menu, and search for Collect .NET Profiler Trace.

You can also collect a memory dump and a network trace, which might help in your case.

See this blog post for details on how to collect and understand the Analysis Report.

Since you have Application Insights, you can also run Profiler against your production application from there, which will capture the data automatically at scale without negatively affecting your users.

See: Profile production applications in Azure with Application Insights

Once you identify which layer is causing the latency, you can take it from there to debug and optimize your code or setup.
