Break into the debugger (if attached) on panics (Windows, Linux, macOS, FreeBSD) #129019

kromych · 2024-08-12T17:54:09Z

The developer experience for panics is to provide the backtrace and
exit the program. When running under a debugger, that might be improved
by breaking into the debugger once the code panics, thus enabling
the developer to examine the program state at the exact time when
the code panicked.

Let the developer catch the panic in the debugger if it is attached.
If the debugger is not attached, nothing changes. Providing this feature
inside the standard library facilitates a better debugging experience.

Validated under Windows, Linux, macOS 14.6, and FreeBSD 13.3..14.1.

rustbot · 2024-08-12T17:54:16Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @workingjubilee (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

kromych · 2024-08-12T17:56:19Z

r? @wesleywiser

library/std/src/panicking.rs

workingjubilee · 2024-08-12T18:49:03Z

@kromych Please do not both add public API to the standard library and change the internal behavior of unrelated code in the same PR.

workingjubilee · 2024-08-12T18:50:09Z

This is a library change, not a compiler one.

r? @workingjubilee

workingjubilee

I strongly recommend you run check builds locally.

library/std/src/os/mod.rs

library/std/src/os/windows/dbg.rs

kromych · 2024-08-12T18:59:11Z

> @kromych Please do not both add public API to the standard library and change the internal behavior of unrelated code in the same PR.

Thanks for your help! Should I open two PRs instead (one adding the dbg_breakpoint API and one using it for easier panic debugging) and close this one?

workingjubilee · 2024-08-12T19:01:06Z

Why did you ask for wesleywiser's review?

kromych · 2024-08-12T19:14:11Z

> Why did you ask for wesleywiser's review?

The change relies on SEH, which is normally handled by the compiler, so I thought there might be some assumptions about setting up the stack frame for SEH. This is my first attempt at contributing to the Rust repo; I didn't know the proper process and am learning as I go.

workingjubilee · 2024-08-12T19:20:32Z

Hm, yes, but why specifically Wesley, the compiler lead that has the least time to review things?

In any case, you're probably thinking of code generation of destructors. Those shouldn't be relevant to your PR because that code was already written a long time ago. Most of the SEH code is in the library, here:

rust/library/panic_unwind/src/seh.rs

Line 4 in 91376f4

//! mechanism is Structured Exception Handling (SEH). This is quite different

I recommend you just remove the parts that expose this to public API from this PR, as exposing new API is its own process, and code review for this will be exciting enough.

kromych · 2024-08-12T19:40:37Z

> Hm, yes, but why specifically Wesley, the compiler lead that has the least time to review things?

Wesley's also on the wg-debugging group: https://blog.rust-lang.org/inside-rust/2022/02/22/compiler-team-ambitions-2022.html#debugging-initiatives-. That looked to me like an opportunity to bring attention to the value of this PR for the debugging experience, besides examining its sanity from the compiler perspective.

> In any case, you're probably thinking of code generation of destructors. Those shouldn't be relevant to your PR because that code was already written a long time ago. Most of the SEH code is in the library, here:
>
> rust/library/panic_unwind/src/seh.rs
>
> Line 4 in 91376f4
>
> //! mechanism is Structured Exception Handling (SEH). This is quite different
>
> I recommend you just remove the parts that expose this to public API from this PR, as exposing new API is its own process, and code review for this will be exciting enough.

Appreciated, will do!

kromych · 2024-08-12T21:30:30Z

> I strongly recommend you run check builds locally.

Ran that command locally, thanks! There was one issue that I could not explain: if something is used only inside the panic handler, it is considered dead code.

workingjubilee · 2024-08-12T21:32:49Z

Ah, that makes more sense.

I don't mean to be so nosy but I would like to see this functionality land and thus want to make sure it happens in the way that is smoothest. Generally T-compiler members don't approve library PRs and vice versa. Thus often a PR that tries to do both compiler changes and internal stdlib changes and expose new library API is doomed because of this split responsibility. We aren't unthinking servants of process but it's best not to fudge anything when it's unnecessary and we can simply pipeline things.

More briefly: let's build the bikeshed before we have an argument over how to paint it.

For more information on process in general, you may wish to consult the rustc dev guide and the std dev guide. You will want to consult the former anyways as it explains how to add new tests and you may need a fairly specialized test to make sure "break into debugger" works, as I don't think either the normal UI test suite or debuginfo test suite support that. Our library tests only work for code that doesn't need to do weird stuff with a process.

If we do need special frame-by-frame code generation for this, it should probably not be implemented as including global_asm! but rather be implemented using an intrinsic, which are located hereabouts: https://github.com/rust-lang/rust/blob/91376f416222a238227c84a848d168835ede2cc3/library/core/src/intrinsics.rs

And those are in core, not std. But I don't know that we do. The Rust compiler already has the necessary infrastructure to handle SEH, because that is how unwinding on Windows works. That is, the following C, calling these two functions:

void try_code(void);
void except_code(void);

__declspec(noinline) int try_break_into_debugger()
{
  __try
  {
    try_code();
    return 0;
  }
  __except (1)
  {
    except_code();
    return 1;
  }
  return 0;
}

should be functionally identical to:

use std::panic::catch_unwind;
use other_crate::{try_code, except_code};

if let Err(_) = catch_unwind(|| try_code()) {
    except_code();
}
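For readers who want to try the comparison above, here is a minimal self-contained sketch. The name `run_guarded` and the closure parameters are hypothetical stand-ins for the `try_code`/`except_code` functions of the snippet; they are not part of the PR. One caveat worth stating: `__except (1)` also catches hardware exceptions such as access violations, whereas `catch_unwind` only catches Rust panics, and only when the crate is built with `panic = "unwind"`, so the two are equivalent only for the panic case.

```rust
use std::panic::{catch_unwind, UnwindSafe};

/// Rough analogue of the C `try_break_into_debugger` above:
/// returns 0 if `try_code` completes, 1 if it panicked and
/// `except_code` ran instead. (Hypothetical helper for illustration.)
fn run_guarded<F, H>(try_code: F, except_code: H) -> i32
where
    F: FnOnce() + UnwindSafe,
    H: FnOnce(),
{
    match catch_unwind(try_code) {
        // The "try" body ran to completion.
        Ok(()) => 0,
        // The "try" body panicked; the payload is dropped here.
        Err(_payload) => {
            except_code();
            1
        }
    }
}
```

Calling `run_guarded(|| { panic!("boom"); }, || {})` still prints the panic message to stderr before returning, because the panic hook runs before unwinding begins.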

Intrinsics themselves are actually language features and so require rubberstamps from a different group. Isn't process fun?

library/std/src/sys/dbg.rs

romank-msft · 2024-08-12T21:40:13Z

Something broke down again in the CI, will try to replicate locally

bors · 2024-09-08T07:20:55Z

💡 This pull request was already approved, no need to approve it again.

There's another pull request that is currently being tested, blocking this pull request: [do not merge] CI experiments #112049

bors · 2024-09-08T07:20:56Z

📌 Commit fc28a2a has been approved by workingjubilee

It is now in the queue for this repository.

workingjubilee · 2024-09-08T07:21:42Z

...? huh, approving it twice feels weird, I'm gonna...

@bors r-

workingjubilee · 2024-09-08T07:21:52Z

@bors r+

bors · 2024-09-08T07:21:55Z

📌 Commit fc28a2a has been approved by workingjubilee

It is now in the queue for this repository.

bors · 2024-09-08T10:28:29Z

⌛ Testing commit fc28a2a with merge 7b18b3e...

bors · 2024-09-08T12:57:14Z

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 7b18b3e to master...

rust-timer · 2024-09-08T14:14:29Z

Finished benchmarking commit (7b18b3e): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (secondary 2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.3%	[1.5%, 3.6%]	8
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.1%, secondary 0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.3%]	12
Regressions ❌ (secondary)	0.3%	[0.2%, 0.3%]	38
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[0.0%, 0.3%]	12

Bootstrap: 756.659s -> 754.738s (-0.25%)
Artifact size: 341.09 MiB -> 341.16 MiB (0.02%)

Tested in the simulator and on the device I had lying around, a 1st generation iPad Mini (which isn't Aarch64, but shows that the `sysctlbyname` calls still work even there, even if they return false). `sysctlbyname` _should_ be safe to use without causing rejections from the app store, as its usage is documented in: https://developer.apple.com/documentation/kernel/1387446-sysctlbyname/determining_instruction_set_characteristics Also, the standard library will use these soon anyhow, so this shouldn't affect the situation: rust-lang/rust#129019

khuey · 2024-09-23T20:53:06Z

My apologies if this was debated in the 166 comments I didn't read, but I don't think the quality of implementation here is suitable for shipping on Linux, even in nightly. This behavior is triggered even if the panic is ultimately handled via catch_unwind, and on Linux it triggers for any ptracer, even things that aren't interactive debuggers (e.g. strace). The net result is that Rust programs that use panics at all no longer function under tools like strace or rr after this change. I think this should be reverted.

kromych · 2024-09-24T00:42:44Z

My apologies if this was debated in the 166 comments I didn't read, but I don't think the quality of implementation here is suitable for shipping on Linux, even in nightly. This behavior is triggered even if the panic is ultimately handled via catch_unwind, and on Linux it triggers for any ptracer, even things that aren't interactive debuggers (e.g. strace). The net result is that Rust programs that use panics at all no longer function under tools like strace or rr after this change. I think this should be reverted.

In my view, your arguments are enough to revert this for Linux, especially if no one finds that useful at all there. If the latter cannot be known for sure, perhaps wrap the logic in rust_panic in a conditional something like

if let Ok(bp) = std::env::var("RUST_BREAKPOINT_ON_PANIC") {
    if bp == "1" {
        let _ = breakpoint_if_debugging();
    }
}

i.e. one would need to set the environment variable RUST_BREAKPOINT_ON_PANIC to get this code to run the breakpoint instruction, even if the presence of a debugger (tracer) is detected.
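A minimal sketch of that opt-in gate follows, assuming the variable name proposed above. `breakpoint_if_debugging` here is a hypothetical stub that only reports whether it would have trapped, and `opted_in` and `maybe_break_on_panic` are illustrative names, not functions from the PR.

```rust
use std::env;

/// Hypothetical stand-in for the real detection-and-trap routine.
/// This stub never traps; it just reports "no debugger found".
fn breakpoint_if_debugging() -> bool {
    false
}

/// The gate itself: only the exact value "1" opts in.
fn opted_in(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the (proposed, not existing) RUST_BREAKPOINT_ON_PANIC variable
/// and calls the breakpoint routine only when the user opted in.
/// Returns whether the breakpoint routine was invoked.
fn maybe_break_on_panic() -> bool {
    let value = env::var("RUST_BREAKPOINT_ON_PANIC").ok();
    if opted_in(value.as_deref()) {
        let _ = breakpoint_if_debugging();
        true
    } else {
        false
    }
}
```

Keeping the comparison in a pure helper makes the policy easy to test without touching the process environment.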

kromych · 2024-09-24T18:18:22Z

This change makes a ptraced/straced program exit with SIGTRAP when the program panics, and a core dump is generated. Assuming that panicking implies there is no way of continuing computation, that behaviour doesn't look like a deal breaker to me.

That assumption of panic being a catastrophic failure (hence meaning the program exits) is broken by std::panic::catch_unwind. Its usage pattern seems to be

std::panic::catch_unwind(|| /* something that panics */);

so that the program does not exit upon panicking although it - quite weirdly - does print the panic message when using panic!, etc. When the program is run under a ptrace-using tool, this PR might make the program exit instead of continuing if the tool doesn't have the gdb/lldb smarts.

I cannot comment on whether std::panic::catch_unwind is guaranteed to work under ptrace when SIGTRAP is not handled. Its documentation already enumerates various footguns with that function. Instead of feature-gating via the env var or reverting the logic for Linux, one might document that behaviour, thus adding one more item to the list of precautions for std::panic::catch_unwind.
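The "prints the panic message even though the panic is caught" behaviour comes from the panic hook, which std invokes before unwinding starts. A small sketch that makes this observable by replacing the default hook with one that sets a flag (the names `HOOK_RAN` and `hook_runs_before_catch` are made up for this illustration):

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set by the custom hook so the caller can observe that it ran.
static HOOK_RAN: AtomicBool = AtomicBool::new(false);

/// Returns (hook_ran, panic_was_caught).
fn hook_runs_before_catch() -> (bool, bool) {
    // Swap out the default hook (the one that prints the message).
    let previous = panic::take_hook();
    panic::set_hook(Box::new(|_info| {
        HOOK_RAN.store(true, Ordering::SeqCst);
    }));

    // The panic is recovered here, yet the hook has already fired.
    let result = panic::catch_unwind(|| {
        panic!("recoverable failure");
    });

    // Restore whatever hook was installed before.
    panic::set_hook(previous);
    (HOOK_RAN.load(Ordering::SeqCst), result.is_err())
}
```

Anything that runs unconditionally on the panic path, before unwinding reaches a `catch_unwind`, is therefore also seen by programs that recover from panics.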

workingjubilee · 2024-09-24T23:48:49Z

@khuey Yeah, I noticed that (after it was merged, and from people discussing this elsewhere) and wondered if that would be a practical issue since the program would often die anyways? But if it's causing problems in rr, I think we should remove the Linux impl.

kromych · 2024-09-25T03:06:34Z

Here is the PR #130810

Don't trap into the debugger on panics under Linux This breaks `rr`, see rust-lang#129019 (comment) for the discussion CC `@khuey` `@workingjubilee`

kromych · 2024-09-25T23:05:04Z

As this is being reverted in #130846, for the folks who liked this, I've published https://crates.io/crates/dbg_breakpoint:

Breakpoints when the debugger is attached

Set breakpoints with the breakpoint!() macro on many target architectures
and popular OSes like FreeBSD, macOS, iOS, Linux distros, Windows without
using the nightly toolchain. Break into the debugger with an easy
breakpoint_if_debugging() call, too!

Well, sure, but why?

It might be more convenient to add the call to breakpoint_if_debugging from inside
the comfort of your editor than to remember the incantation in the debugger,

Some callsites like lambdas and async routines/coroutines can be tricky to set a
breakpoint to in the debugger due to name mangling or because the toolchain doesn't
give them a name that is easily-discovered/human-friendly,

Can add this to your #[panic_handler] to break into the debugger on a panic.

This model might be reminiscent of "semihosting" where the execution environment
includes a host or a debugger whose services might be requested by the program.

Here is the example of how one can make use of this: runme.rs.
Do exercise extreme caution when using any of this in the production environment, i.e.
out of the inner development loop. Heisenbugs and crashes might be sighted.

Platform- and target-specific notes follow.

Windows

The library provides breakpoint_if_debugging() and breakpoint_if_debugging_seh().
The latter might be useful to detect the debugger if it is trying to hide its presence
via some cheap tricks.

Linux, macOS and FreeBSD

The debugger detection logic will detect any tracer like strace as the debugger, and
if the tracer isn't able to skip over the breakpoint CPU instruction, the program will
crash. That can be fixed by handling SIGTRAP inside your program.
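To see why any tracer counts as "the debugger" on Linux, consider one common detection technique: reading the `TracerPid` field of `/proc/self/status`. Whether the crate or the PR uses exactly this method is an assumption; the sketch below only illustrates the technique, and `tracer_pid` and `is_traced` are illustrative names.

```rust
use std::fs;

/// Extracts the TracerPid value from the text of /proc/<pid>/status.
/// Returns None if the field is missing or malformed.
fn tracer_pid(status: &str) -> Option<u32> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("TracerPid:"))
        .and_then(|value| value.trim().parse().ok())
}

/// True when some process is ptrace-attached to us (Linux only).
/// A non-zero TracerPid says nothing about *what* the tracer is:
/// gdb, lldb, strace and rr all look the same here.
fn is_traced() -> bool {
    fs::read_to_string("/proc/self/status")
        .ok()
        .and_then(|status| tracer_pid(&status))
        .map_or(false, |pid| pid != 0)
}
```

Because the kernel exposes only the tracer's PID, distinguishing an interactive debugger from strace or rr would need extra heuristics beyond this check.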

arm64

brk #imm16 is used for breakpoint on arm64.

Just FYI, these are the #imm16 values used inside the Linux kernel 6.1
at the time of writing:

0x004: for installing kprobes

0x005: for installing uprobes

0x006: for kprobe software single-step

0x400 - 0x7ff: kgdb

0x100: for triggering a fault on purpose (reserved)

0x400: for dynamic BRK instruction

0x401: for compile time BRK instruction

0x800: kernel-mode BUG() and WARN() traps

0x9xx: tag-based KASAN trap (allowed values 0x900 - 0x9ff)

0x8xxx: Control-Flow Integrity traps

Here, we're talking about user mode, yet the above illustrates the point
that the value supplied after brk influences what to expect.

For __builtin_trap(), gcc produces brk #0x3e8, clang generates brk #1.
This library uses 0xf000 as the debuggers on Windows and macOS skip over the debug
trap automatically in this case by advancing the instruction pointer behind the
curtain.
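As a small worked example of how the immediate ends up in the instruction word: in the A64 encoding, `brk #imm16` is the base opcode 0xD4200000 with the 16-bit immediate placed in bits 5 through 20. The helpers below are an illustrative sketch, not part of the crate; the encoding comes from the Arm architecture, not from this thread.

```rust
/// A64 `brk #0`: the immediate field (bits 5..=20) is all zeros.
const BRK_BASE: u32 = 0xD420_0000;
/// Bits that are fixed for every `brk` instruction.
const BRK_FIXED_MASK: u32 = 0xFFE0_001F;

/// Encodes `brk #imm16` as a 32-bit instruction word.
fn brk_encoding(imm16: u16) -> u32 {
    BRK_BASE | (u32::from(imm16) << 5)
}

/// Decodes the immediate if `instruction` is a `brk`, else None.
fn brk_immediate(instruction: u32) -> Option<u16> {
    if instruction & BRK_FIXED_MASK == BRK_BASE {
        Some(((instruction >> 5) & 0xFFFF) as u16)
    } else {
        None
    }
}
```

A debugger or signal handler that wants to step over the trap can decode the faulting instruction this way and decide, based on the immediate, whether the trap is one it should skip.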

See also

C++'s "Debugging Support" paper.

rustbot assigned workingjubilee Aug 12, 2024

rustbot added O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 12, 2024

rustbot assigned wesleywiser and unassigned workingjubilee Aug 12, 2024

rustbot assigned workingjubilee and unassigned wesleywiser Aug 12, 2024

workingjubilee requested changes Aug 12, 2024

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 12, 2024

kromych force-pushed the master branch from 115755d to b4235b8 Compare August 12, 2024 21:25

workingjubilee reviewed Aug 12, 2024

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 8, 2024

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 8, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 8, 2024

bors merged commit 7b18b3e into rust-lang:master Sep 8, 2024
7 checks passed

rustbot added this to the 1.83.0 milestone Sep 8, 2024

rustbot removed the perf-regression Performance regression. label Sep 8, 2024

madsmtm mentioned this pull request Sep 12, 2024

Improve Xcode support madsmtm/objc2#459

Open

kromych added a commit to kromych/rust that referenced this pull request Sep 25, 2024

Don't trap into the debugger on panics under Linux

49d1c3b

This breaks `rr`, see rust-lang#129019 (comment) for the discussion CC @khuey @workingjubilee

kromych mentioned this pull request Sep 25, 2024

Don't trap into the debugger on panics under Linux #130810

Merged

cuviper mentioned this pull request Sep 25, 2024

Revert Break into the debugger on panic (129019) #130846

Merged
