Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger a panic on allocation error on nightly Rust #66347

Closed
wants to merge 2 commits into from

Conversation

Aaron1011
Copy link
Member

In issue #66342, we're seeing extremely large allocations by rustc
(4GB for the hex crate, which is only a few hundred lines). This is
exhausing the memory on CI, causing jobs to fail intermittently.

This PR installs a custom allocation error hook for nightly compilers,
which attempts to trigger a panic after printing the initial error
message. Hopefully, this will allow us to retrieve a backtrace when one
of these large spurious allocations occurs.

The hook is installed in librustc_driver, so that other compiler
frontends (e.g. clippy) will get this logic as well.

I'm unsure if this needs to be behind any kind of additional feature
gate, beyond being ngithly only. This only affects compiler frontends,
not generic users of libstd. While this will nake OOM errors on
nightly much more verbose, I don't think this is necessarily a bad
thing. I would expect that out of memory errors when running the
compiler are usually infrequent, so most users will probably never
notice this change. If any users are experiencing #66342 (or something
like it) on their own crates, the extra output might even be useful to
them.

I don't know of any reasonable way of writing a test for this. I
manually verified the implementation by inserting:
let _a: Vec<usize> = Vec::with_capacity(9999999999) into
librustc_driver after the hook installation, and verified that
a backtrace was printed.

If we're very unlucky, it may turn out the large allocation on CI
happens to occur after several large successful allocations, leaving
extremely little memory left when we try to panic. If this is the case,
then we may fail to panic or print the backtrace, since panicking
currently allocates memory. However, we will still print an error
message, so the output will be no less useful (though a little more
spammy) then before.

@rust-highfive
Copy link
Collaborator

r? @cramertj

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 12, 2019
@Mark-Simulacrum
Copy link
Member

Instead of panicking, could we just capture and print a backtrace? I believe that given std::Backtrace that shouldn't be too hard today, right?

In issue rust-lang#66342, we're seeing extremely large allocations by rustc
(4GB for the `hex` crate, which is only a few hundred lines). This is
exhausing the memory on CI, causing jobs to fail intermittently.

This PR installs a custom allocation error hook for nightly compilers,
which attempts to trigger a panic after printing the initial error
message. Hopefully, this will allow us to retrieve a backtrace when one
of these large spurious allocations occurs.

The hook is installed in `librustc_driver`, so that other compiler
frontends (e.g. clippy) will get this logic as well.

I'm unsure if this needs to be behind any kind of additional feature
gate, beyond being ngithly only. This only affects compiler frontends,
not generic users of `libstd`. While this will nake OOM errors on
nightly much more verbose, I don't think this is necessarily a bad
thing. I would expect that out of memory errors when running the
compiler are usually infrequent, so most users will probably never
notice this change. If any users are experiencing rust-lang#66342 (or something
like it) on their own crates, the extra output might even be useful to
them.

I don't know of any reasonable way of writing a test for this. I
manually verified the implementation by inserting:
`let _a: Vec<usize> = Vec::with_capacity(9999999999)` into
`librustc_driver` after the hook installation, and verified that
a backtrace was printed.

If we're very unlucky, it may turn out the large allocation on CI
happens to occur after several large successful allocations, leaving
extremely little memory left when we try to panic. If this is the case,
then we may fail to panic or print the backtrace, since panicking
currently allocates memory. However, we will still print an error
message, so the output will be no less useful (though a little more
spammy) then before.
@Aaron1011
Copy link
Member Author

@Mark-Simulacrum: I've updated the PR to explicitly print a backtrace instead of panicking.

@Mark-Simulacrum
Copy link
Member

I'm going to do r? @alexcrichton here on the general approach -- it seems largely reasonable that we'd want to print a backtrace here (maybe even in general?). I am somewhat concerned by the fact that this is still sticking itself into liballoc instead of being done outside that, which I would've expected to be possible...

@Mark-Simulacrum
Copy link
Member

Also worth noting that #66342 (comment) makes this not urgent, and as such we may not want it :)

@alexcrichton
Copy link
Member

Yes can this avoid adding a function to libstd? Can the hook live in the caller? (I think it can just be a closure?)

@Aaron1011
Copy link
Member Author

That would require exposing dumb_print. I think it's better to expose a self-contained alloc error hook.

@Mark-Simulacrum
Copy link
Member

It seems fine to just use eprintln in this case? AFAICT dumb_print isn't doing anything too special on most platforms we care about for this purpose...

@alexcrichton
Copy link
Member

Yes dumb_print isn't very special, it should either use eprintln! or just be copied into the rustc crates.

@bjorn3
Copy link
Member

bjorn3 commented Nov 14, 2019

eprintln would try to lock stderr. That would cause a deadlock when another thread keeps stderr locked. dumb_print however would write to stderr even if it is being locked.

@Mark-Simulacrum
Copy link
Member

I think we don't really care about that failure case for the compiler (we're going to die, if we deadlock instead... well, not great, but not too terrible either).

@Aaron1011
Copy link
Member Author

While we can't guarantee that printing a backtrace here will succeed, I don't think we should pick a more-deadlock-prone solution solely to keep a tiny function out of libstd.

@alexcrichton
Copy link
Member

Ok, then I think this should copy dumb_print into rustc to use it there, this still shouldn't be adding a function to libstd.

@Aaron1011
Copy link
Member Author

@alexcrichton: dumb_print calls panic_output, which is private. We would need to expose that as well, since it's platform specific.

It seems much easier to expose a two-line perma-unstable function than to try to duplicate the necessary bits of libstd, or to deal with a unecessarily suboptimal print function.

@Mark-Simulacrum
Copy link
Member

It seems likely that what we may want here is an unstable try_lock on std::io::{Stderr, Stdout} which we could then use. Presumably that can be FCP'd through libs team and is in general a much more general thing -- i.e., more reasonable to add to the libraries.

@alexcrichton
Copy link
Member

I understand it's easier to expose unstable functionality from libstd than to copy it into the compiler. That being said it is not a free cost because then the libs team needs to maintain it, field questions about its stability, etc. My personal preference is pretty firm, this feature for rustc should not at this time result in unstable APIs being added to libstd.

@Mark-Simulacrum
Copy link
Member

(To be clear, I was musing on the general topic of "I want to print to stdout/stderr in a non-deadlocking manner" -- I don't think that addition should happen as part of this work.)

I suspect that the best thing to do here might just be to close this; it's clear that it's a pretty hard problem and it's not super clear on exactly what we want to do (I still personally think that deadlocking might be more or less fine, and rather infrequent -- things holding stdout/stderr locks in the compiler is pretty rare, and println! and eprintln! I think for the most part shouldn't allocate within the lock?).

@bjorn3
Copy link
Member

bjorn3 commented Nov 15, 2019

I think it is useful for normal users to have a backtrace on OOM too, instead of just reserving it for rustc.

@alexcrichton
Copy link
Member

A feature request for the compiler to debug an error happening on CI is not the proper place to make a proposal for an official unstable API to be added to the standard library. If that's the way this is going then this should be closed and reopened where its main focus is adding new APIs to the standard library. Adding APIs shouldn't "just do it off-hand if rustc needs it"

@Mark-Simulacrum
Copy link
Member

I'm going to close this as I don't think it's productive to keep it open. I believe that the next step, if we want this, is to open a new PR adding a try_lock API or so to Stderr/Stdout (unstable, of course). I don't personally care about this though especially as we've now figured out the underlying cause in this specific problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants