3
votes

TL;DR

I'm now fairly sure that there is a memory leak in the conhost process in Windows 10 (version 1607) and Windows Server 2016. The leak was not present in Windows Server 2012 R2 and is fixed in Windows 10 (version 1809+) and Windows Server 2019.

I couldn't find any official reference to this bug; if anyone can point me to one, I'd appreciate it.

Long story

I'm currently investigating a memory leak in our (self-hosted) ASP.NET Core application that results in high memory usage not of the .exe itself but of the corresponding 'conhost' process. It happens only on our production servers / VMs; running the same application on my laptop (or any dev laptop) does not produce a memory leak.

Dev machine (running for > 2h): [screenshot]

Production server (running for ~10 min): [screenshot]

That memory will pile up indefinitely until the server runs out of memory.
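To watch this growth without keeping Task Manager open, something like the following sketch could be used. It is not part of the original application: `ConhostMemory`, `Snapshot`, and `FormatMegabytes` are hypothetical names, and the only framework calls are `Process.GetProcessesByName` and `Process.PrivateMemorySize64` from `System.Diagnostics`.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Globalization;

// Hypothetical helper (not from the original app): samples the private
// memory of every conhost.exe instance visible to the current user.
public static class ConhostMemory
{
    public static List<(int Pid, long PrivateBytes)> Snapshot()
    {
        var result = new List<(int Pid, long PrivateBytes)>();
        foreach (var process in Process.GetProcessesByName("conhost"))
        {
            using (process)
            {
                try
                {
                    result.Add((process.Id, process.PrivateMemorySize64));
                }
                catch (Exception ex) when (ex is InvalidOperationException || ex is Win32Exception)
                {
                    // The process exited or access was denied; skip it.
                }
            }
        }
        return result;
    }

    // Formats a byte count as megabytes with one decimal place.
    public static string FormatMegabytes(long bytes) =>
        (bytes / (1024.0 * 1024.0)).ToString("F1", CultureInfo.InvariantCulture) + " MB";
}
```

Calling `ConhostMemory.Snapshot()` once a minute and appending the formatted values to a file (rather than to the console, which would itself go through conhost) should be enough to show whether the number keeps climbing.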

This is the code I use to reproduce it (again: on our production systems, not on dev machines). In our real application it is actually EF Core logging that causes the issue (EF Core also uses the ASP.NET Core loggers):

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public class LoggingBackgroundService : BackgroundService
{
    private readonly ILogger<LoggingBackgroundService> logger;

    public LoggingBackgroundService(ILogger<LoggingBackgroundService> logger)
    {
        this.logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Start 10 concurrent workers; they are intentionally not awaited.
        for (var i = 0; i < 10; i++)
        {
            _ = Task.Run(() => RunBackgroundThread(stoppingToken));
        }

        // Keep the service alive until the host shuts down.
        await Task.Delay(Timeout.Infinite, stoppingToken);
    }

    private async Task RunBackgroundThread(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // Each worker writes 1000 log lines per second.
            for (var i = 0; i < 1000; i++)
            {
                logger.LogInformation(GetLogMessage());
                //Console.WriteLine(GetLogMessage());
            }

            await Task.Delay(1000, stoppingToken);
        }
    }

    private string GetLogMessage()
    {
        return DateTime.Now.ToString("yyyy-MM-dd hh:mm:ss.ffff");
    }
}

Using Console.WriteLine() instead of the logger does not produce a memory leak. And once more: the code with the logger works fine on a dev machine.

We are using .NET Core 3.1. Can anyone explain what's going on here?

EDIT: My test system currently sits at ~9 GB RAM. Maybe there is an upper limit, but that is far too much RAM for a 'conhost' process in any case. [screenshot]

I also tried to use DebugDiag to analyze a memory dump, but that shows a stack trace that doesn't really help me (what else to expect though?):

LeakTrack+13277       
ConhostV2!MergeAttrStrings+b8       
ConhostV2!WriteCharsLegacy+a9c       
ConhostV2!DoSrvWriteConsole+4ea       
ConhostV2!ConsoleIoThread+36b       
kernel32!BaseThreadInitThunk+14       
ntdll!RtlUserThreadStart+21 

I only have a basic understanding of VMs, and at first it looked as if this was only happening on VMware ESX systems. I no longer think VMware is causing the issue; I was misled by the different Windows 10 versions / builds.

  • [Leak] Windows Server 2016 (on VMware ESX)
  • [Leak] Windows 10, 1607 (on VMware ESX)
  • [Leak] Windows 10, 1607 (on Azure)
  • [Leak] Windows Server 2016 (on Azure)
  • [Leak] Windows Server 2016 (on AWS)
  • [Leak] Windows 10, 1607 (on Hyper-V)
  • [Leak] Windows Server 2016 (on Hyper-V)
  • [No Leak] Windows 10, 1909 (on VMware Workstation)
  • [No Leak] Windows Server 2019 (on Azure)
  • [No Leak] Windows 10, 1909 (on Hyper-V)
  • [No Leak] Windows Server 2019 (on Hyper-V)
  • [No Leak] Windows 10, 1809 (on hardware)
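The pattern in this list follows the Windows build number rather than the hypervisor. A minimal sketch of that classification, assuming the usual build numbers (9600 for Windows Server 2012 R2, 14393 for Windows 10 1607 / Server 2016, 17763 for Windows 10 1809 / Server 2019), might look like this. `LeakStatus` and `ConhostLeakCheck` are hypothetical names, builds between 14393 and 17763 were not tested above and are reported as unknown, and the check does not account for which updates are installed.

```csharp
// Hypothetical helper: maps a Windows build number to the leak behaviour
// observed in the list above. Untested builds are reported as Unknown.
public enum LeakStatus { Affected, NotAffected, Unknown }

public static class ConhostLeakCheck
{
    private const int BuildServer2012R2 = 9600;   // Windows Server 2012 R2
    private const int Build1607 = 14393;          // Windows 10 1607 / Server 2016
    private const int Build1809 = 17763;          // Windows 10 1809 / Server 2019

    public static LeakStatus Classify(int buildNumber)
    {
        if (buildNumber == Build1607) return LeakStatus.Affected;
        if (buildNumber == BuildServer2012R2) return LeakStatus.NotAffected;
        if (buildNumber >= Build1809) return LeakStatus.NotAffected;
        return LeakStatus.Unknown; // not covered by the observations above
    }
}
```

The build number can be read from `winver`, for example, and passed in as `ConhostLeakCheck.Classify(14393)`.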
Are you running it as a Console app or a Win32 service (services have different loggers)? Do your prod machines have any kind of monitoring services or profilers installed, e.g., NewRelic? - Stephen Cleary
“running the same application” – As in, the same binaries, without any change? Is this a self-contained app? If not, is this running on the same runtime? What else is running on your production machine? – “memory will pile up indefinitely” – Are you sure about that? Will it continue to rise or is there some apparent maximum? You’ll probably need to profile this on your production machine, or take a look at a memory dump to see where this is coming from. - poke
Running as a console app, no special monitoring on the prod machines. Yes, same binaries, same issue when running self-contained. I've seen a conhost process consuming 8 GB RAM, so if there is a maximum I haven't found it; the server always died before we reached one. - sky
@sky I think I may be having the same issue (see stackoverflow.com/questions/68228659/…). Did you ever resolve this? - ChaseMedallion

1 Answer

0
votes

I reported this bug to Microsoft and it has since been fixed.

... with the May 2021 Windows Server 2016 Update:

A fix has been included in the May 2021 Windows Server 2016 update (KB5003197), but it is hidden behind a feature flag. To activate it:

  1. Install KB5003197
  2. Add this registry entry: reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides /v 1402424459 /t REG_DWORD /d 1 /f
  3. Restart the system

... with the June 2021 Windows Server Update:

This fix should be active with the June update without needing the registry entry, but I haven't tested this.