Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc_codegen_ssa: Use llvm.invariant intrinsics on arguments that are immutable references; off by default. #103070

Closed
wants to merge 1 commit into from

Conversation

pcwalton
Copy link
Contributor

Optimization failures around reloads and memcpy optimizations are frequently traceable to LLVM's failure to prove that memory can't be mutated by a call or store. This problem is especially acute in Rust, where large values tend to be memcpy'd more often than in C++. Thankfully, Rust has stronger guarantees on mutability available than C++ does, via the strong immutability of & references. This should allow LLVM to prove that memory can't be modified by stores and calls in more cases.

We're already using LLVM's readonly parameter attribute on such calls. However, the semantics of readonly are akin to const in C++, in that they only promise to LLVM that the function won't mutate the parameter through that pointer, not that the pointed-to memory is immutable for the entire duration of the function. These weak semantics limit the applicability of readonly to LLVM's alias analysis. Instead of readonly, the correct way to express strong immutability guarantees on memory is through the llvm.invariant.start and llvm.invariant.end intrinsics. These enable a frontend like rustc to describe immutability of memory regions in an expressive, flow-sensitive manner.

Unfortunately, LLVM doesn't use the llvm.invariant.start and llvm.invariant.end intrinsics for much at the moment. It's only used in one optimization in loop-invariant code motion at this time. Follow-up work will need to be done in LLVM to integrate these intrinsics into alias analysis. Possibly there will need to be some sort of "MemoryInvarianceAnalysis" that uses graph reachability algorithms to analyze the extent of the guarantees provided by these intrinsics to the control flow graph.

Regardless, this front-end work needs to happen as a prerequisite for any LLVM work, so that the improvements to LLVM can be measured and tested. So this commit makes rustc use llvm.invariant in a minimal way: on immutable references to "freeze" types (i.e. not transitively containing UnsafeCell) passed directly as parameters to functions. This is off by default, gated behind the non-default -Z emit-invariant-markers=yes flag.

Obviously, a lot more can be done to use llvm.invariant more liberally in the future, but this can be added over time, especially once more LLVM optimization passes use that infrastructure. This is simply the bare minimum for now. Once LLVM uses those intrinsics for more optimizations, the effects of more llvm.invariant use can be measured more precisely.

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 14, 2022
@rustbot
Copy link
Collaborator

rustbot commented Oct 14, 2022

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

@rust-highfive
Copy link
Collaborator

r? @TaKO8Ki

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 14, 2022
@pcwalton pcwalton force-pushed the llvm-invariant branch 2 times, most recently from d65e579 to 8997e72 Compare October 14, 2022 23:28
@pcwalton
Copy link
Contributor Author

pcwalton commented Oct 15, 2022

Another benefit of the conservatism here is that it avoids this PR depending too much on the semantics of unsafe code. Memory pointed to by immutable function parameter references to freezable memory is clearly immutable for the lifetime of the function regardless of what semantics we decide on for unsafe code; if they weren't it would very probably be UB already given that we mark such parameters as readonly right now.

immutable references; off by default.

Optimization failures around reloads and memcpy optimizations are frequently
traceable to LLVM's failure to prove that memory can't be mutated by a call or
store. This problem is especially acute in Rust, where large values tend to be
memcpy'd more often than in C++. Thankfully, Rust has stronger guarantees on
mutability available than C++ does, via the strong immutability of `&`
references. This should allow LLVM to prove that memory can't be modified by
stores and calls in more cases.

We're already using LLVM's `readonly` parameter attribute on such calls.
However, the semantics of `readonly` are akin to `const` in C++, in that they
only promise to LLVM that the function won't mutate the parameter *through that
pointer*, not that the pointed-to memory is immutable for the entire duration
of the function. These weak semantics limit the applicability of `readonly` to
LLVM's alias analysis. Instead of `readonly`, the correct way to express strong
immutability guarantees on memory is through the `llvm.invariant.start` and
`llvm.invariant.end` intrinsics. These enable a frontend like `rustc` to
describe immutability of memory regions in an expressive, flow-sensitive
manner.

Unfortunately, LLVM doesn't use the `llvm.invariant.start` and
`llvm.invariant.end` intrinsics for much at the moment. It's only used in one
optimization in loop-invariant code motion at this time. Follow-up work will
need to be done in LLVM to integrate these intrinsics into alias analysis.
Possibly there will need to be some sort of "MemoryInvarianceAnalysis" that
uses graph reachability algorithms to analyze the extent of the guarantees
provided by these intrinsics to the control flow graph.

Regardless, this front-end work needs to happen as a prerequisite for any LLVM
work, so that the improvements to LLVM can be measured and tested. So this
commit makes `rustc` use `llvm.invariant` in a minimal way: on immutable
references to "freeze" types (i.e. not transitively containing UnsafeCell)
passed directly as parameters to functions. This is off by default, gated
behind the non-default `-Z emit-invariant-markers=yes` flag.

Obviously, a lot more can be done to use `llvm.invariant` more liberally in the
future, but this can be added over time, especially once more LLVM optimization
passes use that infrastructure. This is simply the bare minimum for now. Once
LLVM uses those intrinsics for more optimizations, the effects of more
`llvm.invariant` use can be measured more precisely.
@nikic
Copy link
Contributor

nikic commented Oct 15, 2022

I'd suggest doing a perf run with this enabled by default.

TBH, I'm not sure this is a viable way forward. I expect that this is going to regress optimization potential in practice, because invariant.start is essentially unused for optimization purposes (the only thing clang uses it for is to convert globals into constants after evaluating global ctors). At the same time, invariant.start will introduce a ref effect and invariant.end a modref effect, and from a quick look it doesn't seem like code is able to look past these effectively. I think it's unlikely that use of invariant.start and invariant.end for optimization in LLVM will increase much in the future, because their control-flow dependence would make this expensive in terms of compile-time.

The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using !invariant.group. This representation is used for strict vtables in clang and does have non-trivial support in LLVM.

However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an invariant argument attribute. This will not negatively affect any optimizations and should be easy to handle directly on the AA level (by generalizing pointsToConstantMemory towards classifying memory into globally invariant and locally invariant, where the former results in no effects, while the latter is limited to ref effects). This has been on my backlog for a while, but I never got around to implementing it.

@nikic
Copy link
Contributor

nikic commented Oct 15, 2022

Actually, now that I think about it again: Doesn't the combination of noalias and readonly that we usually have already imply invariance within the function? I think AA should be able to handle that without any changes on the rustc side.

@pcwalton
Copy link
Contributor Author

pcwalton commented Oct 15, 2022

Actually, now that I think about it again: Doesn't the combination of noalias and readonly that we usually have already imply invariance within the function? I think AA should be able to handle that without any changes on the rustc side.

No, it doesn't. See: https://godbolt.org/z/Wxn3WWWfa

LLVM doesn't know that bar() doesn't mutate *x somehow, despite it being marked readonly and noalias in all functions. So it has to issue a needless reload of the value.

However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an invariant argument attribute. This will not negatively affect any optimizations and should be easy to handle directly on the AA level (by generalizing pointsToConstantMemory towards classifying memory into globally invariant and locally invariant, where the former results in no effects, while the latter is limited to ref effects). This has been on my backlog for a while, but I never got around to implementing it.

I'd like to be able to use this on references that aren't directly function parameters. e.g.

struct Foo<'a> {
    bar: &'a i32,
}

fn f(foo: &Foo) {
    let bar = foo.bar;
    g(*bar);
    h(bar);
    g(*bar);
}

should be optimized to avoid the redundant *bar load. This seems to require more than just parameter metadata.

@nikic
Copy link
Contributor

nikic commented Oct 15, 2022

LLVM doesn't know that bar() doesn't mutate *x somehow, despite it being marked readonly and noalias in all functions. So it has to issue a needless reload of the value.

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

@pcwalton
Copy link
Contributor Author

pcwalton commented Oct 15, 2022

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

OK, I've added that check to BasicAliasAnalysis::pointsToConstantMemory() and it seems to work on a simple test case. Running tests overnight.

@pcwalton
Copy link
Contributor Author

The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using !invariant.group. This representation is used for strict vtables in clang and does have non-trivial support in LLVM.

I can do this, but do we need to launder the pointer first? The LangRef suggests that invariant.group should work as long as every use of the SSA value loads the same value. Since in these cases the memory is immutable for the entire lifetime of the SSA value that seems safe. (Though that assumes the inliner correctly inserts launder intrinsics when inlining functions.)

@nikic
Copy link
Contributor

nikic commented Oct 15, 2022

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

OK, I've added that check to BasicAliasAnalysis::pointsToConstantMemory() and it seems to work on a simple test case. Running tests overnight.

The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.

I can do this, but do we need to launder the pointer first? The LangRef suggests that invariant.group should work as long as every use of the SSA value loads the same value. Since in these cases the memory is immutable for the entire lifetime of the SSA value that seems safe. (Though that assumes the inliner correctly inserts launder intrinsics when inlining functions.)

The inliner does not insert launder.invariant.group intrinsics, these need to be added explicitly.

@pcwalton
Copy link
Contributor Author

pcwalton commented Oct 15, 2022

The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.

OK, here's a rough proposed change: pcwalton/llvm-project@87a571a

I changed pointsToConstantMemory() to take a fourth parameter, OrInvariant, like the existing OrLocal, and if that parameter is true then pointsToConstantMemory() returns true for memory that is invariant for the life of the SSA value, even if that memory isn't invariant for the life of the program. I then passed true in cases where it looked safe (particularly getModRefInfo()).

Does this approach seem reasonable? If so I'll continue cleaning it up, adding tests, etc. Having two bool parameters seems a bit ugly, admittedly, but I wanted to follow the existing code.

@nikic
Copy link
Contributor

nikic commented Oct 15, 2022

The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.

@pcwalton
Copy link
Contributor Author

pcwalton commented Oct 15, 2022

The inliner does not insert launder.invariant.group intrinsics, these need to be added explicitly.

Wait, if we launder pointers every time we enter a function, doesn't that mean that redundant loads can't be eliminated across inlined functions? e.g.

fn f(x: &i32) {
    use_of_val(*x);
    g(x);
    use_of_val(*x); // <--
}

fn g(y: &i32) {
    use_of_ptr(y);
}

If codegen applies a launder intrinsic to y on the entry block of g and g gets inlined into f, then x and y are considered different, despite being the same pointer and immutable throughout, and the load from the second use_of_val() (marked with an arrow above) can't be eliminated because LLVM will think use_of_ptr might have mutated the memory pointed to by x. Is this right?

David Goldblatt suggested an llvm.constify intrinsic (obviously we would need a better name) that takes a pointer and returns a new pointer SSA value that aliases the original but is guaranteed to be invariant for that entire SSA value's lifetime. This would be different from llvm.launder.invariant.group because calling constify on an already constified pointer would be a no-op. To avoid proliferation of llvm.invariant intrinsics I wonder if it would make sense to just replace llvm.invariant.start with this new llvm.constify. What do you think?

@nikic
Copy link
Contributor

nikic commented Oct 16, 2022

The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.

I've given this some thought, and I think the right way to model this is by making the return value a ModRefInfo rather than bool, which is the upper bound on observable memory effects for this location. This is NoModRef for globally invariant memory and Ref for locally invariant memory. Users can then just & this value to refine the ModRefInfo result.

To avoid proliferation of llvm.invariant intrinsics I wonder if it would make sense to just replace llvm.invariant.start with this new llvm.constify. What do you think?

Without commenting on the rest, I don't think llvm.invariant.start can be replaced, because the thing its primarily used for is to declare that the object cannot be modified past a certain point (i.e. invariant.start without invariant.end). In particular this is used to turn globals into constants after global ctor evaluation, in which case having a "constantified" SSA value doesn't really help us -- we need the invariant.start on the global independently of how it will later be used.

@TaKO8Ki
Copy link
Member

TaKO8Ki commented Oct 18, 2022

I think you are better suited to review this PR. r? @nikic

@rust-highfive rust-highfive assigned nikic and unassigned TaKO8Ki Oct 18, 2022
@pcwalton pcwalton marked this pull request as draft October 18, 2022 22:48
@pcwalton
Copy link
Contributor Author

I will close this PR after the relevant LLVM patches are upstream, unless you'd like me to close it now.

@nikic
Copy link
Contributor

nikic commented Oct 19, 2022

I think it's safe to close this, I doubt we would want to pursue the particular approach implemented here.

@pcwalton pcwalton closed this Oct 19, 2022
pcwalton added a commit to pcwalton/rust that referenced this pull request Oct 21, 2022
…adonly` on

indirect immutable freeze by-value function parameters.

Right now, `rustc` only examines function signatures and the platform ABI when
determining the LLVM attributes to apply to parameters. This results in missed
optimizations, because there are some attributes that can be determined via
analysis of the MIR making up the function body. In particular, `readonly`
could be applied to most indirectly-passed by-value function arguments
(specifically, those that are freeze and are observed not to be mutated), but
it currently is not.

This patch introduces the machinery that allows `rustc` to determine those
attributes. It consists of a query, `deduced_param_attrs`, that, when
evaluated, analyzes the MIR of the function to determine supplementary
attributes. The results of this query for each function are written into the
crate metadata so that the deduced parameter attributes can be applied to
cross-crate functions. In this patch, we simply check the parameter for
mutations to determine whether the `readonly` attribute should be applied to
parameters that are indirect immutable freeze by-value.  More attributes could
conceivably be deduced in the future: `nocapture` and `noalias` come to mind.

Adding `readonly` to indirect function parameters where applicable enables some
potential optimizations in LLVM that are discussed in [issue 103103] and [PR
103070] around avoiding stack-to-stack memory copies that appear in functions
like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These
functions pass a large structure unchanged by value to a subfunction that also
doesn't mutate it. Since the structure in this case is passed as an indirect
parameter, it's a pointer from LLVM's perspective. As a result, the
intermediate copy of the structure that our codegen emits could be optimized
away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly
nocapture noalias` in both the caller and callee. We already pass `nocapture
noalias`, but we're missing `readonly`, as we can't determine whether a
by-value parameter is mutated by examining the signature in Rust. I didn't have
much success with having LLVM infer the `readonly` attribute, even with fat
LTO; it seems that deducing it at the MIR level is necessary.

No large benefits should be expected from this optimization *now*; LLVM needs
some changes (discussed in [PR 103070]) to more aggressively use the `noalias
nocapture readonly` combination in its alias analysis. I have some LLVM patches
for these optimizations and have had them looked over. With all the patches
applied locally, I enabled LLVM to remove all the `memcpy`s from the following
code:

```rust
fn main() {
    println!("Hello {}", 3);
}
```

which is a significant codegen improvement over the status quo. I expect that
if this optimization kicks in in multiple places even for such a simple
program, then it will apply to Rust code all over the place.

[issue 103103]: rust-lang#103103

[PR 103070]: rust-lang#103070
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 22, 2022
Introduce deduced parameter attributes, and use them for deducing `readonly` on indirect immutable freeze by-value function parameters.

Introduce deduced parameter attributes, and use them for deducing `readonly` on
indirect immutable freeze by-value function parameters.

Right now, `rustc` only examines function signatures and the platform ABI when
determining the LLVM attributes to apply to parameters. This results in missed
optimizations, because there are some attributes that can be determined via
analysis of the MIR making up the function body. In particular, `readonly`
could be applied to most indirectly-passed by-value function arguments
(specifically, those that are freeze and are observed not to be mutated), but
it currently is not.

This patch introduces the machinery that allows `rustc` to determine those
attributes. It consists of a query, `deduced_param_attrs`, that, when
evaluated, analyzes the MIR of the function to determine supplementary
attributes. The results of this query for each function are written into the
crate metadata so that the deduced parameter attributes can be applied to
cross-crate functions. In this patch, we simply check the parameter for
mutations to determine whether the `readonly` attribute should be applied to
parameters that are indirect immutable freeze by-value.  More attributes could
conceivably be deduced in the future: `nocapture` and `noalias` come to mind.

Adding `readonly` to indirect function parameters where applicable enables some
potential optimizations in LLVM that are discussed in [issue 103103] and [PR
103070] around avoiding stack-to-stack memory copies that appear in functions
like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These
functions pass a large structure unchanged by value to a subfunction that also
doesn't mutate it. Since the structure in this case is passed as an indirect
parameter, it's a pointer from LLVM's perspective. As a result, the
intermediate copy of the structure that our codegen emits could be optimized
away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly
nocapture noalias` in both the caller and callee. We already pass `nocapture
noalias`, but we're missing `readonly`, as we can't determine whether a
by-value parameter is mutated by examining the signature in Rust. I didn't have
much success with having LLVM infer the `readonly` attribute, even with fat
LTO; it seems that deducing it at the MIR level is necessary.

No large benefits should be expected from this optimization *now*; LLVM needs
some changes (discussed in [PR 103070]) to more aggressively use the `noalias
nocapture readonly` combination in its alias analysis. I have some LLVM patches
for these optimizations and have had them looked over. With all the patches
applied locally, I enabled LLVM to remove all the `memcpy`s from the following
code:

```rust
fn main() {
    println!("Hello {}", 3);
}
```

which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place.

[issue 103103]: rust-lang#103103

[PR 103070]: rust-lang#103070
@pcwalton
Copy link
Contributor Author

I submitted https://reviews.llvm.org/D136659 to implement the suggested LLVM change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants