I have an interesting issue with the latest version of on-premise TFS (2018, Version 16.122.27102.1). I have a release process that includes a step for "Deploy TestAgent on localhost".
Normally it works great, and it worked great when I was using TFS 2012, but we recently upgraded to 2018, and now when this process runs on one particular build agent (Agent-19 only), I occasionally get a strange failure:
Operating system is shutting down for computer 'XXX_TESTING'
The agent: Agent-19 lost communication with the server. Verify the machine is running and has a healthy network connection. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610
Strangely, the restart seems to be initiated by the same service account that the TFS build agent uses.
There's not a whole lot of information there, and the TFS build worker log doesn't have much information either:
[2018-03-01 00:46:35Z INFO ProcessInvoker] Starting process:
[2018-03-01 00:46:35Z INFO ProcessInvoker] File name: 'C:\TFS Agent\externals\vstshost\LegacyVSTSPowerShellHost.exe'
[2018-03-01 00:46:35Z INFO ProcessInvoker] Arguments: ''
[2018-03-01 00:46:35Z INFO ProcessInvoker] Working directory: 'C:\TFS Agent\_work\_tasks\DeployVisualStudioTestAgent_52a38a6a-1517-41d7-96cc-73ee0c60d2b6\1.0.42'
[2018-03-01 00:46:35Z INFO ProcessInvoker] Require exit code zero: 'False'
[2018-03-01 00:46:35Z INFO ProcessInvoker] Encoding web name: ; code page: ''
[2018-03-01 00:46:35Z INFO ProcessInvoker] Force kill process on cancellation: 'False'
[2018-03-01 00:46:35Z INFO ProcessInvoker] Process started with process id 14620, waiting for process exit.
[2018-03-01 00:46:35Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1.
[2018-03-01 00:48:11Z INFO Worker] Cancellation/Shutdown message received.
[2018-03-01 00:48:11Z INFO HostContext] Agent will be shutdown for OperatingSystemShutdown
[2018-03-01 00:48:11Z INFO StepsRunner] Cancel current running step.
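To look for a pattern across runs, the excerpt above can be reduced to "which process was started last, and how long before the shutdown message". Here is a minimal Python sketch, assuming every worker log line follows the `[timestamp level component] message` layout shown above. The names `parse_line` and `find_shutdowns` are hypothetical helpers for illustration, not part of the agent:

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt above:
# [2018-03-01 00:46:35Z INFO ProcessInvoker] Starting process:
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z (\w+)\s+(\w+)\] (.*)$"
)


def parse_line(line):
    """Split one log line into (timestamp, level, component, message), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return ts, m.group(2), m.group(3), m.group(4)


def find_shutdowns(lines):
    """For each OS-shutdown message, return (last_file, started_at, shutdown_at).

    last_file is the most recent 'File name:' value logged by ProcessInvoker
    before the shutdown message (None if no process was logged before it).
    """
    results = []
    last_file = None
    last_start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed layout
        ts, _level, component, message = parsed
        if component == "ProcessInvoker" and message.startswith("File name:"):
            # Keep everything after the first colon, minus the quotes.
            last_file = message.split(":", 1)[1].strip().strip("'")
            last_start = ts
        elif component == "HostContext" and "OperatingSystemShutdown" in message:
            results.append((last_file, last_start, ts))
    return results
```

Run over the excerpt above, this reports `LegacyVSTSPowerShellHost.exe` started at 00:46:35Z with the shutdown message at 00:48:11Z, 96 seconds later. Running it across the worker logs from several nights would show whether the gap and the last-started process are the same every time.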
So the system shuts down, the agent stops, and the tests don't run, but I have no idea why. I re-imaged the entire server from a copy of one of my other build servers and reinstalled the build agent, but the issue persists. It occurs only on that build server, only on that step, and only sometimes (I haven't identified a pattern, but it generally happens during the nightly run at 6:30 PM CST).
How do I diagnose this? Is there a place that would tell me why a system restarted? The log above doesn't really give me a whole lot of information. I searched around and haven't found anyone else with an issue of this nature.


