Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Loader][tests] Add more logging to pinpoint BinderTracingTests hang #104982

Merged
merged 1 commit into from
Jul 17, 2024

Conversation

mdh1418
Copy link
Member

@mdh1418 mdh1418 commented Jul 16, 2024

BinderTracingTests has experienced flakey test hangs #104670 #97735 and #94390

The exact cause of the hang has been elusive due to insufficient diagnostics.
Given the console log from the test

 "C:\h\w\B101091F\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false" -p "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true"  BinderTracingTest.ResolutionFlow.dll 
[10:04:40 AM] Running AssemblyLoadContextResolving_ReturnNull...
[10:04:42 AM] Running AssemblyLoadContextResolving_LoadAssembly...
[10:04:42 AM] Running AssemblyLoadContextResolving_NameMismatch...
[10:04:42 AM] Running AssemblyLoadContextResolving_MultipleHandlers...
[10:04:42 AM] Running AppDomainAssemblyResolve_ReturnNull...
[10:04:43 AM] Running AppDomainAssemblyResolve_LoadAssembly...
[10:04:43 AM] Running AppDomainAssemblyResolve_NameMismatch...
[10:04:43 AM] Running AppDomainAssemblyResolve_MultipleHandlers...
[10:04:43 AM] Launching process for AssemblyLoadFromResolveHandler_LoadDependency...
[10:04:45 AM] Running AssemblyLoadFromResolveHandler_LoadDependency...

[10:04:47 AM] Launching process for AssemblyLoadFromResolveHandler_NotTracked...
[10:04:48 AM] Running AssemblyLoadFromResolveHandler_NotTracked...

cmdLine:C:\h\w\B101091F\w\A251091E\e\Loader\Loader\../binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.cmd Timed Out (timeout in milliseconds: 600000 from variable __TestTimeout, start: 7/10/2024 10:04:38 AM, end: 7/10/2024 10:14:38 AM)

coupled with the native stack trace from the dump

0:000> !clrstack -f -all
OS Thread Id: 0x23ec
Child SP       IP Call Site
0097E0EC 77B3F3EC ntdll!NtWaitForMultipleObjects + 12
0097E284 72246A85 coreclr!Thread::DoAppropriateAptStateWait + 199
0097E308 72246FB3 coreclr!Thread::DoAppropriateWaitWorker + 998
0097E464 7224D4AD coreclr!`Thread::DoAppropriateWait'::`9'::__Body::Run + 90
0097E4B8 72246B3C coreclr!Thread::DoAppropriateWait + 149
0097E51C 722BEC16 coreclr!WaitHandleNative::CorWaitOneNative + 294
0097E560          [HelperMethodFrame: 0097e560] System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32, Boolean)
0097E5F4 71022A5C System_Private_CoreLib!System.Boolean System.Threading.WaitHandle::WaitOneNoCheck(System.Int32, System.Boolean, System.Object, System.Diagnostics.Tracing.NativeRuntimeEventSource+WaitHandleWaitSourceMap)$##6003CB2 + 188
0097E5F8 71022A5C System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitOneNoCheck(Int32, Boolean, System.Object, WaitHandleWaitSourceMap) + 188
0097E630 71022984 System_Private_CoreLib!System.Boolean System.Threading.WaitHandle::WaitOne(System.Int32)$##6003CB1 + 20
0097E63C 71022984 System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitOne(Int32) + 20
0097E644 094C5B7B system.diagnostics.process.dll!System.Diagnostics.Process.WaitForExitCore(Int32) + 123
0097E67C 094C0800 BinderTracingTest.ResolutionFlow.dll!BinderTracingTests.BinderTracingTest.RunTestInSeparateProcess(System.Reflection.MethodInfo) + 816
0097E6DC 0817237E BinderTracingTest.ResolutionFlow.dll!BinderTracingTests.BinderTracingTest.RunAllTests() + 446
0097E704 0817204F BinderTracingTest.ResolutionFlow.dll!BinderTracingTests.BinderTracingTest.Main(System.String[]) + 39

The most recent flakey test hang occurred while waiting for the test running in an isolated subprocess, and it's not clear whether the subprocess hit a race condition, if the subprocess exited and its signal never made it back to p.WaitForExit(), or something else.

To note, there was only dump collected despite the hang occurring during a test that spins up a subprocess. We would expect the subprocess to have a dump collected as well given that

foreach (var child in FindChildProcessesByName(process, "corerun"))
would create dumps for the main test process and subprocesses.


This PR aims to add slightly more logging to help pinpoint where the test hang occurs, with the expectation that the next time the test suite flakes (if at all), we can have more information to work with.

Copy link
Contributor

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

@mdh1418 mdh1418 force-pushed the binder_tracing_tests_more_diagnostics branch from ef8565c to c291f4a Compare July 17, 2024 01:13
@mdh1418 mdh1418 merged commit 8a7e38e into dotnet:main Jul 17, 2024
65 of 69 checks passed
@mdh1418 mdh1418 deleted the binder_tracing_tests_more_diagnostics branch July 17, 2024 14:08
@github-actions github-actions bot locked and limited conversation to collaborators Aug 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants