dotnet stack hangup on trying to get the stackframes of a stuck process #4826
I think you talked about two different kinds of 'stuck':
"if the process was in a sufficiently bad state then dotnet-stack might never get a reply"; turns out you are correct; the process in question was deadlocked in GC (which is another discussion thread). At least the ^C not working can be fixed.
Fixes dotnet#4826

When running dotnet-stack against an unresponsive target process, there were various points where dotnet-stack wouldn't correctly cancel when Ctrl-C was pressed. There were several underlying issues:
- Cancellation caused EventPipeSession.Dispose() to run, which attempted to send a Stop IPC command that might block indefinitely.
- Several of the async operations dotnet-stack performed did not pass a cancellation token, and so ignored Ctrl-C when it was pressed.
- The calls to start and stop the session were still using the synchronous API, which both ignored the cancellation token and created the standard async-over-sync issues.

The change in behavior for EventPipeSession.Dispose() is, strictly speaking, a breaking change, although callers would need to employ some dubious code patterns to observe the difference. The most likely way code could observe the difference is if thread 1 is reading from the EventStream at the same time thread 2 calls Dispose(). Previously this would have caused thread 1 to start receiving rundown events, although it was also a race condition between thread 1 reading from the stream and thread 2 disposing it. It's possible some tool could have worked successfully if thread 1 always won the race in practice. If any code was using that pattern, thread 1 will now observe that the stream is disposed without seeing the rundown events first. The proper way to ensure seeing all the rundown events is to explicitly call EventPipeSession.Stop(), then read all the remaining data until the end-of-stream marker, then Dispose() the session. I looked through all the usages of EventPipeSession in our existing tools and it looked like all of them were already using Stop() properly.
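A minimal sketch of the shutdown ordering described above, assuming the Microsoft.Diagnostics.NETCore.Client API (the provider name and `targetPid` are illustrative; check the package docs for exact signatures in your version):

```csharp
using System.Diagnostics.Tracing;
using Microsoft.Diagnostics.NETCore.Client;

int targetPid = 4860; // PID of the target process (illustrative)
var client = new DiagnosticsClient(targetPid);

using var session = client.StartEventPipeSession(
    new[] { new EventPipeProvider("Microsoft-DotNETCore-SampleProfiler",
                                  EventLevel.Informational) });

// ... consume events from session.EventStream, typically on another thread ...

// 1. Ask the runtime to stop the session; this triggers rundown events.
session.Stop();

// 2. Drain the stream until the end-of-stream marker so all rundown
//    events are observed.
var buffer = new byte[4096];
while (session.EventStream.Read(buffer, 0, buffer.Length) > 0) { }

// 3. Only now let Dispose() run (via `using`); after the fix it no longer
//    sends a potentially-blocking Stop command itself.
```

The key design point is that Stop-then-drain-then-Dispose removes the race the commit message describes: rundown data is fully read before the stream is torn down.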
If I had enough information to file this as a bug report, I would. It feels very much like a bug, but it might be a bug in the runtime, or something else. Anyway, this behavior is very bad and very unexpected.
Background:
We have this network listener process that's been getting stuck every week or so; the process is on our server and is receiving (encrypted) data from the process on the customer server. Our own internal status check on the stuck process also gets stuck; and the symptoms of the stuck-ness make no sense from an application codebase perspective. (Thankfully this process doesn't use async code so the stacktraces ought to make sense.)
So I said OK, let's get a stack trace next time. We looked up how to do this, found dotnet-stack, copied the standalone binary (this URL https://aka.ms/dotnet-stack/win-x64, a week and a half ago) to the server (it's a Server Core server), and waited for the next time our process got stuck.
So it got stuck, as expected. I then ran
dotnet-stack report --process-id 4860
and it got stuck. In fact it got stuck so badly that ^C didn't get the command prompt back. I tried a second time, running dotnet-stack report --process-id 4860 > stack.txt
and just leaving it running with the remote desktop window shoved in the background. After waiting at least 14 minutes, I found it was still stuck; only this time ^C was able to get the command prompt back. As expected, the output file was empty. The target process is an x64 .NET 8 process; working memory was 63MB.
We have a full memory dump of the process; the managed runtime is deadlocked.
Summary:
It's possible for dotnet-stack to get stuck trying to dump stack from a stuck process. This seems like it should not occur.
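As a stop-gap until Ctrl-C handling is fixed, the invocation can be given a time budget so a hung dotnet-stack doesn't tie up the console. This sketch uses GNU coreutils `timeout`, so it applies to a Linux host rather than the Server Core box in this report; the PID and the 60-second budget are illustrative:

```shell
# Bound how long dotnet-stack may run against a possibly-deadlocked target.
# PID 4860 is from the report above; 60 seconds is an arbitrary budget.
status=0
timeout --signal=INT 60 dotnet-stack report --process-id 4860 > stack.txt || status=$?
if [ "$status" -eq 124 ]; then
    # timeout exits 124 when the budget expires and it had to kill the child
    echo "dotnet-stack hung; target is likely deadlocked"
fi
```

On Windows a similar bound could be arranged by starting the tool as a background job and waiting on it with a timeout, but the core idea is the same: never wait unbounded on a diagnostic tool pointed at a process you already suspect is deadlocked.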
Environment:
Windows Server Core: probably Server Core 2022, but might be 2019
Hosting Environment: Azure (Central)
dotnet-stack: win64 standalone binary
target process: .NET 8 winx64 process; shipped as framework included (dotnet publish -r win-x64)
Reproducibility:
At this rate I get one attempt a week.
Stuck-ness does not appear to be data-related. On restarting the process it recovers where it left off, successfully processing the very message it hung up in the middle of.