rustc_codegen_ssa: Use `llvm.invariant` intrinsics on arguments that are immutable references; off by default. #103070

pcwalton · 2022-10-14T23:26:04Z

Optimization failures around reloads and memcpy optimizations are frequently traceable to LLVM's failure to prove that memory can't be mutated by a call or store. This problem is especially acute in Rust, where large values tend to be memcpy'd more often than in C++. Thankfully, Rust has stronger guarantees on mutability available than C++ does, via the strong immutability of & references. This should allow LLVM to prove that memory can't be modified by stores and calls in more cases.

We're already using LLVM's readonly parameter attribute on such calls. However, the semantics of readonly are akin to const in C++, in that they only promise to LLVM that the function won't mutate the parameter through that pointer, not that the pointed-to memory is immutable for the entire duration of the function. These weak semantics limit the applicability of readonly to LLVM's alias analysis. Instead of readonly, the correct way to express strong immutability guarantees on memory is through the llvm.invariant.start and llvm.invariant.end intrinsics. These enable a frontend like rustc to describe immutability of memory regions in an expressive, flow-sensitive manner.

Unfortunately, LLVM doesn't use the llvm.invariant.start and llvm.invariant.end intrinsics for much at the moment. It's only used in one optimization in loop-invariant code motion at this time. Follow-up work will need to be done in LLVM to integrate these intrinsics into alias analysis. Possibly there will need to be some sort of "MemoryInvarianceAnalysis" that uses graph reachability algorithms to analyze the extent of the guarantees provided by these intrinsics to the control flow graph.

Regardless, this front-end work needs to happen as a prerequisite for any LLVM work, so that the improvements to LLVM can be measured and tested. So this commit makes rustc use llvm.invariant in a minimal way: on immutable references to "freeze" types (i.e. not transitively containing UnsafeCell) passed directly as parameters to functions. This is off by default, gated behind the non-default -Z emit-invariant-markers=yes flag.

Obviously, a lot more can be done to use llvm.invariant more liberally in the future, but this can be added over time, especially once more LLVM optimization passes use that infrastructure. This is simply the bare minimum for now. Once LLVM uses those intrinsics for more optimizations, the effects of more llvm.invariant use can be measured more precisely.

rustbot · 2022-10-14T23:26:07Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

rust-highfive · 2022-10-14T23:26:07Z

r? @TaKO8Ki

(rust-highfive has picked a reviewer for you, use r? to override)

pcwalton · 2022-10-15T00:03:31Z

Another benefit of the conservatism here is that it avoids this PR depending too much on the semantics of unsafe code. Memory pointed to by immutable function parameter references to freezable memory is clearly immutable for the lifetime of the function regardless of what semantics we decide on for unsafe code; if they weren't it would very probably be UB already given that we mark such parameters as readonly right now.

immutable references; off by default. Optimization failures around reloads and memcpy optimizations are frequently traceable to LLVM's failure to prove that memory can't be mutated by a call or store. This problem is especially acute in Rust, where large values tend to be memcpy'd more often than in C++. Thankfully, Rust has stronger guarantees on mutability available than C++ does, via the strong immutability of `&` references. This should allow LLVM to prove that memory can't be modified by stores and calls in more cases. We're already using LLVM's `readonly` parameter attribute on such calls. However, the semantics of `readonly` are akin to `const` in C++, in that they only promise to LLVM that the function won't mutate the parameter *through that pointer*, not that the pointed-to memory is immutable for the entire duration of the function. These weak semantics limit the applicability of `readonly` to LLVM's alias analysis. Instead of `readonly`, the correct way to express strong immutability guarantees on memory is through the `llvm.invariant.start` and `llvm.invariant.end` intrinsics. These enable a frontend like `rustc` to describe immutability of memory regions in an expressive, flow-sensitive manner. Unfortunately, LLVM doesn't use the `llvm.invariant.start` and `llvm.invariant.end` intrinsics for much at the moment. It's only used in one optimization in loop-invariant code motion at this time. Follow-up work will need to be done in LLVM to integrate these intrinsics into alias analysis. Possibly there will need to be some sort of "MemoryInvarianceAnalysis" that uses graph reachability algorithms to analyze the extent of the guarantees provided by these intrinsics to the control flow graph. Regardless, this front-end work needs to happen as a prerequisite for any LLVM work, so that the improvements to LLVM can be measured and tested. So this commit makes `rustc` use `llvm.invariant` in a minimal way: on immutable references to "freeze" types (i.e. not transitively containing UnsafeCell) passed directly as parameters to functions. This is off by default, gated behind the non-default `-Z emit-invariant-markers=yes` flag. Obviously, a lot more can be done to use `llvm.invariant` more liberally in the future, but this can be added over time, especially once more LLVM optimization passes use that infrastructure. This is simply the bare minimum for now. Once LLVM uses those intrinsics for more optimizations, the effects of more `llvm.invariant` use can be measured more precisely.

nikic · 2022-10-15T08:44:41Z

I'd suggest doing a perf run with this enabled by default.

TBH, I'm not sure this is a viable way forward. I expect that this is going to regress optimization potential in practice, because invariant.start is essentially unused for optimization purposes (the only thing clang uses it for is to convert globals into constants after evaluating global ctors). At the same time, invariant.start will introduce a ref effect and invariant.end a modref effect, and from a quick look it doesn't seem like code is able to look past these effectively. I think it's unlikely that use of invariant.start and invariant.end for optimization in LLVM will increase much in the future, because their control-flow dependence would make this expensive in terms of compile-time.

The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using !invariant.group. This representation is used for strict vtables in clang and does have non-trivial support in LLVM.

However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an invariant argument attribute. This will not negatively affect any optimizations and should be easy to handle directly on the AA level (by generalizing pointsToConstantMemory towards classifying memory into globally invariant and locally invariant, where the former results in no effects, while the latter is limited to ref effects). This has been on my backlog for a while, but I never got around to implementing it.

nikic · 2022-10-15T08:50:53Z

Actually, now that I think about it again: Doesn't the combination of noalias and readonly that we usually have already imply invariance within the function? I think AA should be able to handle that without any changes on the rustc side.

pcwalton · 2022-10-15T09:16:57Z

Actually, now that I think about it again: Doesn't the combination of noalias and readonly that we usually have already imply invariance within the function? I think AA should be able to handle that without any changes on the rustc side.

No, it doesn't. See: https://godbolt.org/z/Wxn3WWWfa

LLVM doesn't know that bar() doesn't mutate *x somehow, despite it being marked readonly and noalias in all functions. So it has to issue a needless reload of the value.

However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an invariant argument attribute. This will not negatively affect any optimizations and should be easy to handle directly on the AA level (by generalizing pointsToConstantMemory towards classifying memory into globally invariant and locally invariant, where the former results in no effects, while the latter is limited to ref effects). This has been on my backlog for a while, but I never got around to implementing it.

I'd like to be able to use this on references that aren't directly function parameters. e.g.

struct Foo<'a> {
    bar: &'a i32,
}

fn f(foo: &Foo) {
    let bar = foo.bar;
    g(*bar);
    h(bar);
    g(*bar);
}

should be optimized to avoid the redundant *bar load. This seems to require more than just parameter metadata.

nikic · 2022-10-15T09:24:12Z

LLVM doesn't know that bar() doesn't mutate *x somehow, despite it being marked readonly and noalias in all functions. So it has to issue a needless reload of the value.

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

pcwalton · 2022-10-15T10:24:33Z

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

OK, I've added that check to BasicAliasAnalysis::pointsToConstantMemory() and it seems to work on a simple test case. Running tests overnight.

pcwalton · 2022-10-15T10:35:08Z

The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using !invariant.group. This representation is used for strict vtables in clang and does have non-trivial support in LLVM.

I can do this, but do we need to launder the pointer first? The LangRef suggests that invariant.group should work as long as every use of the SSA value loads the same value. Since in these cases the memory is immutable for the entire lifetime of the SSA value that seems safe. (Though that assumes the inliner correctly inserts launder intrinsics when inlining functions.)

nikic · 2022-10-15T11:01:46Z

I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.

OK, I've added that check to BasicAliasAnalysis::pointsToConstantMemory() and it seems to work on a simple test case. Running tests overnight.

The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.

I can do this, but do we need to launder the pointer first? The LangRef suggests that invariant.group should work as long as every use of the SSA value loads the same value. Since in these cases the memory is immutable for the entire lifetime of the SSA value that seems safe. (Though that assumes the inliner correctly inserts launder intrinsics when inlining functions.)

The inliner does not insert launder.invariant.group intrinsics, these need to be added explicitly.

pcwalton · 2022-10-15T15:46:01Z

The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.

OK, here's a rough proposed change: pcwalton/llvm-project@87a571a

I changed pointsToConstantMemory() to take a fourth parameter, OrInvariant, like the existing OrLocal, and if that parameter is true then pointsToConstantMemory() returns true for memory that is invariant for the life of the SSA value, even if that memory isn't invariant for the life of the program. I then passed true in cases where it looked safe (particularly getModRefInfo()).

Does this approach seem reasonable? If so I'll continue cleaning it up, adding tests, etc. Having two bool parameters seems a bit ugly, admittedly, but I wanted to follow the existing code.

nikic · 2022-10-15T16:04:28Z

The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.

pcwalton · 2022-10-15T21:13:48Z

The inliner does not insert launder.invariant.group intrinsics, these need to be added explicitly.

Wait, if we launder pointers every time we enter a function, doesn't that mean that redundant loads can't be eliminated across inlined functions? e.g.

fn f(x: &i32) {
    use_of_val(*x);
    g(x);
    use_of_val(*x); // <--
}

fn g(y: &i32) {
    use_of_ptr(y);
}

If codegen applies a launder intrinsic to y on the entry block of g and g gets inlined into f, then x and y are considered different, despite being the same pointer and immutable throughout, and the load from the second use_of_val() (marked with an arrow above) can't be eliminated because LLVM will think use_of_ptr might have mutated the memory pointed to by x. Is this right?

David Goldblatt suggested an llvm.constify intrinsic (obviously we would need a better name) that takes a pointer and returns a new pointer SSA value that aliases the original but is guaranteed to be invariant for that entire SSA value's lifetime. This would be different from llvm.launder.invariant.group because calling constify on an already constified pointer would be a no-op. To avoid proliferation of llvm.invariant intrinsics I wonder if it would make sense to just replace llvm.invariant.start with this new llvm.constify. What do you think?

nikic · 2022-10-16T20:48:07Z

The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.

I've given this some thought, and I think the right way to model this is by making the return value a ModRefInfo rather than bool, which is the upper bound on observable memory effects for this location. This is NoModRef for globally invariant memory and Ref for locally invariant memory. Users can then just & this value to refine the ModRefInfo result.

To avoid proliferation of llvm.invariant intrinsics I wonder if it would make sense to just replace llvm.invariant.start with this new llvm.constify. What do you think?

Without commenting on the rest, I don't think llvm.invariant.start can be replaced, because the thing its primarily used for is to declare that the object cannot be modified past a certain point (i.e. invariant.start without invariant.end). In particular this is used to turn globals into constants after global ctor evaluation, in which case having a "constantified" SSA value doesn't really help us -- we need the invariant.start on the global independently of how it will later be used.

TaKO8Ki · 2022-10-18T08:51:14Z

I think you are better suited to review this PR. r? @nikic

pcwalton · 2022-10-18T23:58:43Z

I will close this PR after the relevant LLVM patches are upstream, unless you'd like me to close it now.

nikic · 2022-10-19T07:42:02Z

I think it's safe to close this, I doubt we would want to pursue the particular approach implemented here.

…adonly` on indirect immutable freeze by-value function parameters. Right now, `rustc` only examines function signatures and the platform ABI when determining the LLVM attributes to apply to parameters. This results in missed optimizations, because there are some attributes that can be determined via analysis of the MIR making up the function body. In particular, `readonly` could be applied to most indirectly-passed by-value function arguments (specifically, those that are freeze and are observed not to be mutated), but it currently is not. This patch introduces the machinery that allows `rustc` to determine those attributes. It consists of a query, `deduced_param_attrs`, that, when evaluated, analyzes the MIR of the function to determine supplementary attributes. The results of this query for each function are written into the crate metadata so that the deduced parameter attributes can be applied to cross-crate functions. In this patch, we simply check the parameter for mutations to determine whether the `readonly` attribute should be applied to parameters that are indirect immutable freeze by-value. More attributes could conceivably be deduced in the future: `nocapture` and `noalias` come to mind. Adding `readonly` to indirect function parameters where applicable enables some potential optimizations in LLVM that are discussed in [issue 103103] and [PR 103070] around avoiding stack-to-stack memory copies that appear in functions like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These functions pass a large structure unchanged by value to a subfunction that also doesn't mutate it. Since the structure in this case is passed as an indirect parameter, it's a pointer from LLVM's perspective. As a result, the intermediate copy of the structure that our codegen emits could be optimized away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly nocapture noalias` in both the caller and callee. We already pass `nocapture noalias`, but we're missing `readonly`, as we can't determine whether a by-value parameter is mutated by examining the signature in Rust. I didn't have much success with having LLVM infer the `readonly` attribute, even with fat LTO; it seems that deducing it at the MIR level is necessary. No large benefits should be expected from this optimization *now*; LLVM needs some changes (discussed in [PR 103070]) to more aggressively use the `noalias nocapture readonly` combination in its alias analysis. I have some LLVM patches for these optimizations and have had them looked over. With all the patches applied locally, I enabled LLVM to remove all the `memcpy`s from the following code: ```rust fn main() { println!("Hello {}", 3); } ``` which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place. [issue 103103]: rust-lang#103103 [PR 103070]: rust-lang#103070

Introduce deduced parameter attributes, and use them for deducing `readonly` on indirect immutable freeze by-value function parameters. Introduce deduced parameter attributes, and use them for deducing `readonly` on indirect immutable freeze by-value function parameters. Right now, `rustc` only examines function signatures and the platform ABI when determining the LLVM attributes to apply to parameters. This results in missed optimizations, because there are some attributes that can be determined via analysis of the MIR making up the function body. In particular, `readonly` could be applied to most indirectly-passed by-value function arguments (specifically, those that are freeze and are observed not to be mutated), but it currently is not. This patch introduces the machinery that allows `rustc` to determine those attributes. It consists of a query, `deduced_param_attrs`, that, when evaluated, analyzes the MIR of the function to determine supplementary attributes. The results of this query for each function are written into the crate metadata so that the deduced parameter attributes can be applied to cross-crate functions. In this patch, we simply check the parameter for mutations to determine whether the `readonly` attribute should be applied to parameters that are indirect immutable freeze by-value. More attributes could conceivably be deduced in the future: `nocapture` and `noalias` come to mind. Adding `readonly` to indirect function parameters where applicable enables some potential optimizations in LLVM that are discussed in [issue 103103] and [PR 103070] around avoiding stack-to-stack memory copies that appear in functions like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These functions pass a large structure unchanged by value to a subfunction that also doesn't mutate it. Since the structure in this case is passed as an indirect parameter, it's a pointer from LLVM's perspective. As a result, the intermediate copy of the structure that our codegen emits could be optimized away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly nocapture noalias` in both the caller and callee. We already pass `nocapture noalias`, but we're missing `readonly`, as we can't determine whether a by-value parameter is mutated by examining the signature in Rust. I didn't have much success with having LLVM infer the `readonly` attribute, even with fat LTO; it seems that deducing it at the MIR level is necessary. No large benefits should be expected from this optimization *now*; LLVM needs some changes (discussed in [PR 103070]) to more aggressively use the `noalias nocapture readonly` combination in its alias analysis. I have some LLVM patches for these optimizations and have had them looked over. With all the patches applied locally, I enabled LLVM to remove all the `memcpy`s from the following code: ```rust fn main() { println!("Hello {}", 3); } ``` which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place. [issue 103103]: rust-lang#103103 [PR 103070]: rust-lang#103070

pcwalton · 2022-10-25T03:55:05Z

I submitted https://reviews.llvm.org/D136659 to implement the suggested LLVM change.

rust-highfive assigned TaKO8Ki Oct 14, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 14, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 14, 2022

pcwalton force-pushed the llvm-invariant branch 2 times, most recently from d65e579 to 8997e72 Compare October 14, 2022 23:28

pcwalton force-pushed the llvm-invariant branch from 8997e72 to 5327e78 Compare October 15, 2022 00:18

pcwalton mentioned this pull request Oct 18, 2022

Introduce deduced parameter attributes, and use them for deducing readonly on indirect immutable freeze by-value function parameters. #103172

Merged

rust-highfive assigned nikic and unassigned TaKO8Ki Oct 18, 2022

pcwalton marked this pull request as draft October 18, 2022 22:48

pcwalton closed this Oct 19, 2022

giuseros mentioned this pull request Sep 26, 2023

Grab-bag of improvements to the IR we send to LLVM ROCm/rocMLIR#1243

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustc_codegen_ssa: Use `llvm.invariant` intrinsics on arguments that are immutable references; off by default. #103070

rustc_codegen_ssa: Use `llvm.invariant` intrinsics on arguments that are immutable references; off by default. #103070

pcwalton commented Oct 14, 2022

rustbot commented Oct 14, 2022

rust-highfive commented Oct 14, 2022

pcwalton commented Oct 15, 2022 •

edited

Loading

nikic commented Oct 15, 2022

nikic commented Oct 15, 2022

pcwalton commented Oct 15, 2022 •

edited

Loading

nikic commented Oct 15, 2022

pcwalton commented Oct 15, 2022 •

edited

Loading

pcwalton commented Oct 15, 2022

nikic commented Oct 15, 2022

pcwalton commented Oct 15, 2022 •

edited

Loading

nikic commented Oct 15, 2022

pcwalton commented Oct 15, 2022 •

edited

Loading

nikic commented Oct 16, 2022

TaKO8Ki commented Oct 18, 2022

pcwalton commented Oct 18, 2022

nikic commented Oct 19, 2022

pcwalton commented Oct 25, 2022

rustc_codegen_ssa: Use llvm.invariant intrinsics on arguments that are immutable references; off by default. #103070

rustc_codegen_ssa: Use llvm.invariant intrinsics on arguments that are immutable references; off by default. #103070

Conversation

pcwalton commented Oct 14, 2022

rustbot commented Oct 14, 2022

rust-highfive commented Oct 14, 2022

pcwalton commented Oct 15, 2022 • edited Loading

nikic commented Oct 15, 2022

nikic commented Oct 15, 2022

pcwalton commented Oct 15, 2022 • edited Loading

nikic commented Oct 15, 2022

pcwalton commented Oct 15, 2022 • edited Loading

pcwalton commented Oct 15, 2022

nikic commented Oct 15, 2022

pcwalton commented Oct 15, 2022 • edited Loading

nikic commented Oct 15, 2022

pcwalton commented Oct 15, 2022 • edited Loading

nikic commented Oct 16, 2022

TaKO8Ki commented Oct 18, 2022

pcwalton commented Oct 18, 2022

nikic commented Oct 19, 2022

pcwalton commented Oct 25, 2022

rustc_codegen_ssa: Use `llvm.invariant` intrinsics on arguments that are immutable references; off by default. #103070

rustc_codegen_ssa: Use `llvm.invariant` intrinsics on arguments that are immutable references; off by default. #103070

pcwalton commented Oct 15, 2022 •

edited

Loading

pcwalton commented Oct 15, 2022 •

edited

Loading

pcwalton commented Oct 15, 2022 •

edited

Loading

pcwalton commented Oct 15, 2022 •

edited

Loading

pcwalton commented Oct 15, 2022 •

edited

Loading