Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand Miri's BorTag GC to a Provenance GC #118029

Merged
merged 4 commits into from
Nov 21, 2023
Merged

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Nov 18, 2023

As suggested in rust-lang/miri#3080 (comment)

We previously solved memory growth issues associated with the Stacked Borrows and Tree Borrows runtimes with a GC. But of course we also have state accumulation associated with whole allocations elsewhere in the interpreter, and this PR starts tackling those.

To do this, we expand the visitor for the GC so that it can visit a BorTag or an AllocId. Instead of collecting all live AllocIds into a single HashSet, we just collect from the Machine itself then go through an accessor InterpCx::is_alloc_live which checks a number of allocation data structures in the core interpreter. This avoids the overhead of all the inserts that collecting their keys would require.

r? @RalfJung

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 18, 2023
@saethlin saethlin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 18, 2023
@rust-log-analyzer

This comment has been minimized.

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Nov 18, 2023
@saethlin
Copy link
Member Author

That was 30 minutes without "Collect Me 🥺" mode, let's see what it is now...

@saethlin
Copy link
Member Author

Of course that didn't do anything. Working with the test suite is never that easy.

@rustbot rustbot added the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Nov 18, 2023
Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I wonder what the effects on performance are.

Also, please add the testcase you posted in the other PR.

src/ci/docker/host-x86_64/x86_64-gnu-tools/checktools.sh Outdated Show resolved Hide resolved
src/tools/miri/README.md Outdated Show resolved Hide resolved
src/tools/miri/src/intptrcast.rs Outdated Show resolved Hide resolved
src/tools/miri/src/intptrcast.rs Outdated Show resolved Hide resolved
pub fn is_alloc_live(&self, id: AllocId) -> bool {
self.tcx.try_get_global_alloc(id).is_some()
|| self.memory.alloc_map.contains_key_ref(&id)
|| self.memory.extra_fn_ptr_map.contains_key(&id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth thinking about the order in which we check these? get_alloc_info checks memory.alloc_map first sine I assume that's the most common case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tuned the order against the test 0weak_memory_consistency, because its runtime explodes after this PR with GC every block. I'll try some other programs.

Copy link
Member Author

@saethlin saethlin Nov 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked a bit more into this. I've run (chunks of) the test suites for regex, simd-json, and http and in all these cases > 40% of all the checks for an AllocId result in GlobalAlloc::Memory, but in the 0weak_memory_consistency test, the biggest hit is GlobalAlloc::Function (oddly enough the fraction of hits on functions climbs as the test goes on, not sure why, I'd expect every turn of the loop in that test to be the same).

Since all the global allocs can never be deallocated I think it might be worth stealing a bit from AllocId so that the Id itself can locally indicate whether the allocation deallocated; that would let us skip over them in the GC without consulting a hash table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying we should stay with "global first" for now? Works for me.

@saethlin
Copy link
Member Author

Looks good, but I wonder what the effects on performance are.

Looks like a 2% improvement, based on bumping up the invalidate benchmark's iterations to 1_000_000 (which only bumps the wall time to 40 seconds, which is long for a benchmark but not for something like a test suite). The fact that this halves the peak memory of that program is more exciting to me.

@RalfJung
Copy link
Member

Could you run our entire benchmark suite, just to make sure we don't accidentally introduce a big regression somewhere? (I hope the default 10k blocks interval is enough that the GC is triggered a couple times during the benchmark.^^)

@saethlin
Copy link
Member Author

Everything is within noise, except for the invalidate benchmark with increased iterations. If there are any changes which could be revealed by bumping up the size of the other, they are marginal.

Commit I branched off:

Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):      4.179 s ±  0.021 s    [User: 4.046 s, System: 0.123 s]
  Range (min … max):    4.151 s …  4.207 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/invalidate/Cargo.toml
  Time (mean ± σ):     40.131 s ±  0.648 s    [User: 39.924 s, System: 0.153 s]
  Range (min … max):   39.621 s … 41.249 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):     746.6 ms ±   7.1 ms    [User: 670.2 ms, System: 71.7 ms]
  Range (min … max):   737.5 ms … 754.6 ms    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      2.458 s ±  0.020 s    [User: 2.366 s, System: 0.085 s]
  Range (min … max):    2.440 s …  2.491 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):      5.099 s ±  0.016 s    [User: 4.994 s, System: 0.095 s]
  Range (min … max):    5.079 s …  5.115 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/slice-get-unchecked/Cargo.toml
  Time (mean ± σ):     666.3 ms ±   4.5 ms    [User: 583.4 ms, System: 79.0 ms]
  Range (min … max):   662.0 ms … 673.2 ms    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      1.675 s ±  0.013 s    [User: 1.588 s, System: 0.082 s]
  Range (min … max):    1.661 s …  1.697 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/zip-equal/Cargo.toml
  Time (mean ± σ):      2.644 s ±  0.042 s    [User: 2.546 s, System: 0.091 s]
  Range (min … max):    2.576 s …  2.690 s    5 runs

This PR:

Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):      4.117 s ±  0.036 s    [User: 4.015 s, System: 0.093 s]
  Range (min … max):    4.078 s …  4.155 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/invalidate/Cargo.toml
  Time (mean ± σ):     38.479 s ±  0.119 s    [User: 38.337 s, System: 0.091 s]
  Range (min … max):   38.300 s … 38.588 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):     743.8 ms ±   5.6 ms    [User: 666.1 ms, System: 73.9 ms]
  Range (min … max):   737.5 ms … 748.9 ms    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      2.434 s ±  0.033 s    [User: 2.349 s, System: 0.078 s]
  Range (min … max):    2.403 s …  2.481 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):      5.138 s ±  0.025 s    [User: 4.912 s, System: 0.083 s]
  Range (min … max):    5.105 s …  5.171 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/slice-get-unchecked/Cargo.toml
  Time (mean ± σ):     671.7 ms ±   7.1 ms    [User: 594.0 ms, System: 72.3 ms]
  Range (min … max):   663.4 ms … 677.6 ms    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      1.649 s ±  0.029 s    [User: 1.570 s, System: 0.073 s]
  Range (min … max):    1.603 s …  1.684 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/zip-equal/Cargo.toml
  Time (mean ± σ):      2.597 s ±  0.040 s    [User: 2.487 s, System: 0.104 s]
  Range (min … max):    2.528 s …  2.632 s    5 runs

@saethlin
Copy link
Member Author

The benchmark that runs the GC the least is slice-get-unchecked, which runs it 14 times. I think that's probably good enough.

@saethlin
Copy link
Member Author

saethlin commented Nov 19, 2023

Running the fail tests too looks like it adds 2 or 3 minutes to the x86_64-gnu-tools CI job. I want to do that because it will make this next thing easier...

@saethlin
Copy link
Member Author

saethlin commented Nov 19, 2023

Always setting -Zmiri-provenance-gc=1 even if we override it in 0weak_memory_consistency adds ~21 minutes to CI time. That's not great.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@saethlin
Copy link
Member Author

~8 minutes total increase in CI time. That seems a bit excessive. Let's try applying the "just run pass tests" trick again?

@saethlin
Copy link
Member Author

That shaved off maybe two minutes (uncertain of noise, not going to try checking).

For all the targets without -Zmiri-provenance-gc=1 we spend 33 seconds building the sysroot then 30 seconds running the tests. With -Zmiri-provenance-gc=1 running the tests takes 220 seconds.

In my opinion, for now the best compromise is to just run one target with -Zmiri-provenance-gc=1. I have an idea on how to make the GC faster that I might be able to land in a few days so I'd prefer to get this PR in then work on that and promote more targets to -Zmiri-provenance-gc=1.

@saethlin saethlin marked this pull request as ready for review November 19, 2023 05:28
@rustbot
Copy link
Collaborator

rustbot commented Nov 19, 2023

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

The Miri subtree was changed

cc @rust-lang/miri

@rust-log-analyzer

This comment has been minimized.

@saethlin saethlin removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Nov 19, 2023
@saethlin
Copy link
Member Author

@bors r=RalfJung

@bors
Copy link
Contributor

bors commented Nov 19, 2023

📌 Commit d7661d5 has been approved by RalfJung

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 19, 2023
@bors
Copy link
Contributor

bors commented Nov 20, 2023

⌛ Testing commit d7661d5 with merge dadde3e...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 20, 2023
Expand Miri's BorTag GC to a Provenance GC

As suggested in rust-lang/miri#3080 (comment)

We previously solved memory growth issues associated with the Stacked Borrows and Tree Borrows runtimes with a GC. But of course we also have state accumulation associated with whole allocations elsewhere in the interpreter, and this PR starts tackling those.

To do this, we expand the visitor for the GC so that it can visit a BorTag or an AllocId. Instead of collecting all live AllocIds into a single HashSet, we just collect from the Machine itself then go through an accessor `InterpCx::is_alloc_live` which checks a number of allocation data structures in the core interpreter. This avoids the overhead of all the inserts that collecting their keys would require.

r? `@RalfJung`
@rust-log-analyzer

This comment has been minimized.

# We set the GC interval to the shortest possible value (0 would be off) to increase the chance
# that bugs which only surface when the GC runs at a specific time are more likely to cause CI to fail.
# This significantly increases the runtime of our test suite, or we'd do this in PR CI too.
if [[ -z "${PR_CI_JOB}" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, with -u set, testing for the existence of a shell variable is not so easy any more... :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! The place I got this from doesn't use -u so I guess I'll do as that script does

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, +u/-u dance seems safer (I really do hate shell scripting)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh no please don't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[[ -z "${PR_CI_JOB-}" ]] should work, the ${PR_CI_JOB-default}syntax means "fall back todefault` if variable does not exist" and here we use an empty default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. So we just need to lose -u for the whole script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or modify the caller to always set the variable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just use "${PR_CI_JOB-}" instead of "${PR_CI_JOB}"

@saethlin
Copy link
Member Author

@bors r=RalfJung

@bors
Copy link
Contributor

bors commented Nov 20, 2023

📌 Commit 5376e72 has been approved by RalfJung

It is now in the queue for this repository.

@RalfJung
Copy link
Member

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Nov 20, 2023
# We set the GC interval to the shortest possible value (0 would be off) to increase the chance
# that bugs which only surface when the GC runs at a specific time are more likely to cause CI to fail.
# This significantly increases the runtime of our test suite, or we'd do this in PR CI too.
if [[ -z "${PR_CI_JOB:-}" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had suggested the version without :, but with : also works here.

Copy link
Member Author

@saethlin saethlin Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look up docs for shell expansion and I could only find the ones with :- documented and that's what the Miri repo uses. What changes if the colon is omitted?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find the page I got this from but https://stackoverflow.com/a/16753536 lists them all. They behave differently when the variable exists but is empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html says this before the list that has :- etc

Omitting the colon results in a test only for a parameter that is unset. Put another way, if the colon is included, the operator tests for both parameter’s existence and that its value is not null; if the colon is omitted, the operator tests only for existence.

I'd say the version without the colon is the more reasonable semantics (null != does-not-exist), but the shell world is weird.^^

@RalfJung
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Nov 20, 2023

📌 Commit 9ada654 has been approved by RalfJung

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 20, 2023
compiler-errors added a commit to compiler-errors/rust that referenced this pull request Nov 21, 2023
Expand Miri's BorTag GC to a Provenance GC

As suggested in rust-lang/miri#3080 (comment)

We previously solved memory growth issues associated with the Stacked Borrows and Tree Borrows runtimes with a GC. But of course we also have state accumulation associated with whole allocations elsewhere in the interpreter, and this PR starts tackling those.

To do this, we expand the visitor for the GC so that it can visit a BorTag or an AllocId. Instead of collecting all live AllocIds into a single HashSet, we just collect from the Machine itself then go through an accessor `InterpCx::is_alloc_live` which checks a number of allocation data structures in the core interpreter. This avoids the overhead of all the inserts that collecting their keys would require.

r? `@RalfJung`
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 21, 2023
Rollup of 6 pull requests

Successful merges:

 - rust-lang#116085 (rustdoc-search: add support for traits and associated types)
 - rust-lang#117522 (Remove `--check-cfg` checking of command line `--cfg` args)
 - rust-lang#118029 (Expand Miri's BorTag GC to a Provenance GC)
 - rust-lang#118035 (Fix early param lifetimes in generic_const_exprs)
 - rust-lang#118083 (Remove i686-apple-darwin cross-testing)
 - rust-lang#118091 (Remove now deprecated target x86_64-sun-solaris.)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit cbadb2e into rust-lang:master Nov 21, 2023
11 checks passed
@rustbot rustbot added this to the 1.76.0 milestone Nov 21, 2023
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 21, 2023
Rollup merge of rust-lang#118029 - saethlin:allocid-gc, r=RalfJung

Expand Miri's BorTag GC to a Provenance GC

As suggested in rust-lang/miri#3080 (comment)

We previously solved memory growth issues associated with the Stacked Borrows and Tree Borrows runtimes with a GC. But of course we also have state accumulation associated with whole allocations elsewhere in the interpreter, and this PR starts tackling those.

To do this, we expand the visitor for the GC so that it can visit a BorTag or an AllocId. Instead of collecting all live AllocIds into a single HashSet, we just collect from the Machine itself then go through an accessor `InterpCx::is_alloc_live` which checks a number of allocation data structures in the core interpreter. This avoids the overhead of all the inserts that collecting their keys would require.

r? ``@RalfJung``
@saethlin saethlin deleted the allocid-gc branch December 14, 2023 23:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants