Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strict provenance: rename addr → bare_addr #121588

Closed
wants to merge 1 commit into from

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Feb 25, 2024

Many people are concerned that ptr.addr() is too tempting of a name: people might just replace all ptr as usize cast with ptr.addr() and not change anything else about their code, and that would be bad as it can introduce UB. The name bare_addr is meant to nudge people towards reading the docs which hopefully avoids such mistakes.

@rust-lang/libs when you are discussing this -- we'd really like feedback on naming preferences here. I first suggested another name, addr_without_provenance, but people were generally unhappy with that. A lot would prefer addr. This is almost always what you should use, but the issue is that to use it you have to change more than one thing in your code -- you have to replace the int2ptr casts by something entirely different, like with_addr/map_addr. We can steer people towards using addr by giving it a nice and short name, but we cannot use API design alone to steer people towards then also doing the right thing at the other end of the cast. (I think of these casts as coming in pairs: somewhere a ptr is cast to an int, and later an int is cast back to a ptr.)

So bare_addr is a compromise between these concerns. If we go for this we should probably rename ptr::without_provenance to ptr::from_bare_addr or ptr::from_bare or ptr::bare or something like that. (from_bare is a bit strange since this is converting from a usize and usizes are always bare integers... this would be in contrast to from_exposed which also takes a bare usize but then does "magic" to equip it with suitable provenance, at least in some cases.) I'm happy to prepare that PR, but if t-libs-api has opinions on that name they'd be good to hear. :)

@rustbot
Copy link
Collaborator

rustbot commented Feb 25, 2024

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added O-SGX Target: SGX O-unix Operating system: Unix-like O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 25, 2024
@RalfJung RalfJung added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed O-windows Operating system: Windows T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. O-SGX Target: SGX O-unix Operating system: Unix-like labels Feb 25, 2024
@rust-log-analyzer

This comment has been minimized.

@rustbot rustbot added O-SGX Target: SGX O-unix Operating system: Unix-like O-windows Operating system: Windows labels Feb 25, 2024
@rustbot
Copy link
Collaborator

rustbot commented Feb 25, 2024

The Miri subtree was changed

cc @rust-lang/miri

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Feb 25, 2024

Portable SIMD is developed in its own repository. If possible, consider making this change to rust-lang/portable-simd instead.

cc @calebzulawski, @programmerjake

@saethlin
Copy link
Member

The naming has been discussed heavily in this Zulip topic, and I'd prefer to not religitate the points made there in a forum which is worse for conversation (GitHub's automatic hiding of comments is horrendous for long conversations): https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Next.20steps.20for.20strict.20provenance.20stabilization

What someone could do that would be very helpful is post a summary of the arguments made in that Zulip thread and add it to the PR description.

I'm saying this because there is a good argument made against @joboet's point above, and their comment seems unaware of it. We should be providing new entrants an easy onboarding to the conversation, and we aren't.

@RalfJung
Copy link
Member Author

RalfJung commented Feb 26, 2024

This particular argument actually made it into the PR summary, albeit in very abbreviated form:

people will reach for addr over as usize casts without reading the docs, which would lead to UB if they ever intend to cast this back to a pointer and then use that pointer

In other words, blindly replacing ptr as usize with ptr.addr() and not changing anything else will often lead to UB. The lossy_provenance_casts lint (allow-by-default) will even tell you to do that. Of course it will also warn about the cast back, but if that happens in a different crate a different part of the crate where the lint is not enabled or you ignore that occurrence of the lint, then there's a problem.

That's why the lint now adds some extra caution:

error: under strict provenance it is considered bad style to cast pointer `*const u8` to integer `usize`
  --> $DIR/lint-strict-provenance-lossy-casts.rs:14:20
   |
LL |     let ptr_addr = ptr as usize;
   |                    ^^^^^^^^^^^^
   |
   = help: if you need to cast the address back to a pointer later, use `.expose_addr()` instead
help: use `.addr_without_provenance()` to obtain the address of a pointer without its provenance -- but note that this cannot be cast back to a pointer later; you need to use `with_addr` instead
   |
LL |     let ptr_addr = ptr.addr_without_provenance();
   |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~

@cuviper
Copy link
Member

cuviper commented Mar 1, 2024

r? libs-api

@rustbot rustbot assigned m-ou-se and unassigned cuviper Mar 1, 2024
@RalfJung RalfJung changed the title strict provenance: rename addr → addr_without_provenance strict provenance: rename addr → bare_addr Mar 3, 2024
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Mar 10, 2024

☔ The latest upstream changes (presumably #121662) made this pull request unmergeable. Please resolve the merge conflicts.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 12, 2024

We discussed this in last week's libs-api meeting, but we did not reach a consensus among those present. Some of us thought the name addr was fine as is. Not all of us were convinced that adding bare_ makes any difference. We also weren't entirely sure what the target audience for this change is. (Those who don't already know about provenance, or those that do already know about provenance?) I'll follow up with a comment with my own opinion.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 12, 2024

My opinion is similar to what @joboet expressed in this comment.

I'm personally not convinced that addr and expose_addr need to match up symetrically with without_provenance and from_exposed_addr.

My mental model if a pointer is basically a pair (address, provenance), so having .addr() to get the address part seems fine to me.

ptr::without_provenance makes sense because a pointer carries a provenance, so having a null/empty provenance is a property of the pointer you're creating. In the other direction, however, something like addr_without_provenance() -> usize seems a bit weird to me, since a usize does not carry a provenance by definition. The alternative is not to create an address (usize) with provenance. The alternative is to expose the address before getting the address as usize (without any provenance attached, since it is a usize).

Having clear documentation for from_exposed_addr seems enough to me. Even the name already makes it clear something is up with the kind of address it takes. Just like how creating and offsetting pointers is safe and only dereferencing is unsafe, it makes sense to me for this to also be asymetrical: addr() with a simple name and from_exposed_addr() with a verbose name that serves as a warning.

Separately from all that, I don't think bare_addr makes things any clearer than just addr. It doesn't feel like a warning (like unchecked_ for example), and "bare" is probably not something that rings a bell for anyone who doesn't already know about provenance.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 12, 2024

We also weren't entirely sure what the target audience for this change is. (Those who don't already know about provenance, or those that do already know about provenance?)

Specifically, David wondered whether the goal of the rename is:

  • an intentional stumbling block for people who don't know about provenance to learn about provenance; or
  • a periodic reminder to people who already know about provenance that on this particular line of code, they need to have it in mind.

@m-ou-se m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Mar 12, 2024
@RalfJung
Copy link
Member Author

RalfJung commented Mar 12, 2024

I'm just channeling other people that were uncomfortable with addr. Personally i barely care about the name, but sadly I have to play messenger here or else it seems this will never get stabilized. :/

I gave the arguments against addr in the PR description, I am curious what your reply is to them?

Separately from all that, I don't think bare_addr makes things any clearer than just addr. It doesn't feel like a warning (like unchecked_ for example), and "bare" is probably not something that rings a bell for anyone who doesn't already know about provenance.

The idea is that people will see the "bare" an go "huh, why does it say bare? Let me read the docs...".

Specifically, David wondered whether the goal of the rename is:

  • an intentional stumbling block for people who don't know about provenance to learn about provenance; or

This.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 12, 2024

If we go for this we should probably rename ptr::without_provenance to ptr::from_bare_addr or ptr::from_bare or ptr::bare or something like that.

That sounds like a bad idea tot me. from_bare_addr or from_bare doesn't make it clear that the resulting pointer will have no provenance, only that the address used as input is "bare" (without provenance). But the input is a usize, so it doesn't have provenance by definition.

In other words: the way I see it, the "without provenance" in ptr::without_provenance is a property of the pointer (output), not of the usize (input).

@RalfJung
Copy link
Member Author

RalfJung commented Mar 12, 2024

That sounds like a bad idea tot me. from_bare_addr or from_bare doesn't make it clear that the resulting pointer will have no provenance, only that the address used as input is "bare" (without provenance). But the input is a usize, so it doesn't have provenance by definition.

In other words: the way I see it, the "without provenance" in ptr::without_provenance is a property of the pointer (output), not of the usize (input).

Indeed, that's why I proposed ptr::bare. We could use the term "bare pointer" for a "pointer without provenance", that could be a useful adjective to have.

@scottmcm
Copy link
Member

I rather like the direction of "bare" pointer. It's a word that doesn't have a common definition already, so we have have a section talking about it in the ptr docs or something. Using a different word with a more established meaning, like dangling, would I think risk conflating it with "thing that was freed before" and such.

And it's nice and short :)


Let me see if I can more directly reflect the addr concern.

I would love for it to be just addr, but I'm concerned by people just changing as to addr. One of the problems I see is that there isn't any way in LLVM to do addr differently from expose_addr (https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/Non-exposing.20.60ptrtoint.60.3F/near/425028219). And that means that it's really easy to say "well look, it always works", and the only counter-examples require running MIRI.

If we had a different way, like how ptr::invalid now is different from as *const _, so we could have a code example that actually ever fails when running -- like all the great provenance bug examples Ralf has in blog posts -- to put in stabilization release notes or something, then I'd feel more comfortable.

Otherwise I just worry that it'll be de facto impossible to change to something different from exposing, because we'd likely see a ton of crater failing on it 🙁

If we have something for people to use that's a method instead of an as who still want the as semantics, that would potentially also help.

Even if it's just .as_usize() and .addr(), at least that means there's a way to say "no, you meant the other one" instead of people just going "I don't like as so I'll change to .addr()".

@RalfJung
Copy link
Member Author

I would love for it to be just addr, but I'm concerned by people just changing as to addr. One of the problems I see is that there isn't any way in LLVM to do addr differently from expose_addr

Well, we could store the pointer in memory and then load it back as an i64; or do a transmute via union. But I doubt either of these is a good idea.^^

If we have something for people to use that's a method instead of an as who still want the as semantics, that would potentially also help.

Even if it's just .as_usize() and .addr(), at least that means there's a way to say "no, you meant the other one" instead of people just going "I don't like as so I'll change to .addr()".

Okay so if we were to stabilize some form of expose_addr/from_exposed_addr together with addr, that would alleviate your concern? This was recently discussed again here. I remain firm that we can't stabilize from_exposed_addr with its current docs. But maybe we can write docs that essentially say "look, we'd prefer if you didn't use this, but if you must, go ahead and here is some advice". We could say that there must be some previously exposed provenance that works for this pointer, and there may be further requirements that we still have to figure out, like that provenance being the most recently exposed provenance for that address, or maybe it is at least one of the provenances previously exposed on the same address, or at least on an address in the same allocation (and the provenance must be associated with that allocation).

@scottmcm
Copy link
Member

Well, we could store the pointer in memory and then load it back as an i64; or do a transmute via union. But I doubt either of these is a good idea.^^

Unfortunately, even just at opt-level=1 both of those are ptrtoint to LLVM: https://rust.godbolt.org/z/88oEhhv6d

Right now it apparently just doesn't have a way to express this difference.

(Maybe ptrsub ptr %p, ptr null one day, as non-exposing?)

@scottmcm
Copy link
Member

Okay so if we were to stabilize some form of expose_addr/from_exposed_addr together with addr, that would alleviate your concern? This was recently discussed again here. I remain firm that we can't stabilize from_exposed_addr with its current docs.

Yes. Even if we just stabilize .ptr_as_usize() or something documented as "the same as as usize" to be non-commital about it, and then potentially still have .expose_addr() later once we figure that out better.

So long as there's something to help keep everyone who sees the release notes from going "oh, I'll replace all my ass with .addr() then!"

(Of course, I'm biased because I also want something like that which we can put in a lint suggestion to tell people to move off as casts for this, to help avoid the p as u64 vs *p as u64 mistakes.)

@workingjubilee
Copy link
Member

Honestly, I think pointer::addr is fine. I think people do in fact want it most of the time. I agree that we'd want something that is an exposing cast, and I think the issues go away if we simply stabilize a parallel exposing name, whatever it is.

@RalfJung
Copy link
Member Author

Okay, let's stick to addr() then.

@RalfJung RalfJung closed this Mar 23, 2024
@dtolnay dtolnay removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Mar 24, 2024
@RalfJung RalfJung deleted the addr-without-provenance branch August 28, 2024 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-SGX Target: SGX O-unix Operating system: Unix-like O-windows Operating system: Windows S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.