Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an env var to artificially limit the stack size #941

Merged
merged 3 commits into from
Jan 19, 2024

Conversation

konstin
Copy link
Member

@konstin konstin commented Jan 16, 2024

By default, windows has a stack size limit of 1MB which we run against in debug without any explicit culprit. A new environment variable PUFFIN_STACK_SIZE allows setting an artificially smaller stack size.

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create an issue for looking into this? The thing I'm worried about here is that we have 1MB+ stacks lying around, and that probably isn't a great thing. With that said, if this fixes things on Windows I'm fine with it as a temporary work-around I think.

tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.expect("Failed building the Runtime")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tokio docs don't make it clear under what conditions this routine fails, but it returns an io::Error on failure. I do wonder under what conditions this can fail, and perhaps it might be better to not panic? On the other hand, if unwrap() is what #[tokio::main] was already doing, then I guess it's probably fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[tokio::main]
async fn main() {
    println!("Hello, world!");
}

expands to

#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2021::*;
#[macro_use]
extern crate std;
fn main() {
    let body = async {
        {
            ::std::io::_print(format_args!("Hello, world!\n"));
        };
    };
    #[allow(clippy::expect_used, clippy::diverging_sub_expression)]
    {
        return tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .expect("Failed building the Runtime")
            .block_on(body);
    }
}

so i think that's what you're expected to do.

@konstin konstin marked this pull request as draft January 17, 2024 12:49
@konstin konstin force-pushed the konsti/set-explicit-thread-size branch from 5587b79 to 56ac1a8 Compare January 18, 2024 10:19
@konstin konstin changed the title Set explicit stack size to avoid stack overflows on windows Add a feature to simulate a smaller stack size Jan 18, 2024
@konstin konstin force-pushed the konsti/set-explicit-thread-size branch from 56ac1a8 to c3242bb Compare January 18, 2024 11:12
@konstin konstin marked this pull request as ready for review January 18, 2024 11:13
konstin added a commit that referenced this pull request Jan 18, 2024
Features can change behaviour, e.g. the pyo3 feature setting linker options and #941 setting the stack size. Instead of running tests with all features, we list the features required for tests.

The old command
```
cargo nextest run --all --all-features --status-level skip --failure-output immediate-final --no-fail-fast -j 12
```
and the new command
```
cargo nextest run --all --features all-tests --status-level skip --failure-output immediate-final --no-fail-fast -j 12
```
run the same tests.

This includes a fix for the pep508_rs `evaluate_extras_and_python_version` doc test, which is only run by `cargo test` but not by `cargo nextest`.

This is required for #941.
@konstin konstin mentioned this pull request Jan 18, 2024
@konstin konstin force-pushed the konsti/set-explicit-thread-size branch from c3242bb to 7b80943 Compare January 18, 2024 11:45
@konstin konstin changed the base branch from main to konsti/test-features January 18, 2024 11:46
@charliermarsh
Copy link
Member

@konstin - Do you only see these failures in debug, or release too?

@konstin
Copy link
Member Author

konstin commented Jan 18, 2024

Only in debug mode

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice I like it.

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, given #963, I now realize I'm less a fan of this approach. It'd be really nice to make --all-features continue to work. Otherwise, it breaks things like this. Requiring a non-standard way to enable all features just so we can test with a smaller stack size seems non-ideal to me.

What do you think about configuring this with an environment variable instead?

Base automatically changed from konsti/test-features to main January 18, 2024 14:24
@konstin konstin force-pushed the konsti/set-explicit-thread-size branch from 7b80943 to 5a63635 Compare January 18, 2024 14:28
@konstin konstin changed the title Add a feature to simulate a smaller stack size Add an env var to artificially limit the stack size Jan 18, 2024
.enable_all()
.build()
.expect("Failed building the Runtime")
.block_on(inner())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@konstin konstin force-pushed the konsti/set-explicit-thread-size branch 5 times, most recently from 52878f7 to 1bb4eb7 Compare January 18, 2024 16:39
Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

@konstin konstin force-pushed the konsti/set-explicit-thread-size branch from 1bb4eb7 to 1dd47b9 Compare January 19, 2024 09:28
@konstin konstin enabled auto-merge (squash) January 19, 2024 09:28
…e futures (#947)

Windows has a default stack size of 1MB, which makes puffin often fail
with stack overflows. The PR reduces stack size by three changes:

* Boxing `File` in `Dist`, reducing the size from 496 to 240.
* Boxing the largest futures.
* Boxing `CachePolicy`

## Method

Debugging happened on linux using
#941 to limit the stack size to
1MB. Used ran the command below.

```
RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt
```

The main drawback is top-type-sizes not saying what the `__awaitee` is,
so it requires manually looking up with a future with matching size.

When the `brotli` features on `reqwest` is active, a lot of brotli types
show up. Toggling this feature however seems to have no effect. I assume
they are false positives since the `brotli` crate has elaborate control
about allocation. The sizes are therefore shown with the feature off.

## Results

The largest future goes from 12208B to 6416B, the largest type
(`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full
diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f

For the second commit, i iteratively boxed the largest file until the
tests passed, then with an 800KB stack limit looked through the
backtrace of a failing test and added some more boxing.

Quick benchmarking showed no difference:

```console
$ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" 
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      49.2 ms ±   3.0 ms    [User: 39.8 ms, System: 24.0 ms]
  Range (min … max):    46.6 ms …  63.0 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
  Time (mean ± σ):      47.4 ms ±   3.2 ms    [User: 41.3 ms, System: 20.6 ms]
  Range (min … max):    44.6 ms …  60.5 ms    62 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/profiling/puffin-dev resolve meine_stadt_transparent ran
    1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```
@konstin konstin disabled auto-merge January 19, 2024 09:29
@konstin konstin enabled auto-merge (squash) January 19, 2024 09:30
@konstin konstin merged commit 66e6519 into main Jan 19, 2024
3 checks passed
@konstin konstin deleted the konsti/set-explicit-thread-size branch January 19, 2024 09:34
konstin added a commit that referenced this pull request Jan 19, 2024
…e futures (#1004)

This is #947 again but this time
merging into main instead of downstack, sorry for the noise.

---

Windows has a default stack size of 1MB, which makes puffin often fail
with stack overflows. The PR reduces stack size by three changes:

* Boxing `File` in `Dist`, reducing the size from 496 to 240.
* Boxing the largest futures.
* Boxing `CachePolicy`

## Method

Debugging happened on linux using
#941 to limit the stack size to
1MB. Used ran the command below.

```
RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt
```

The main drawback is top-type-sizes not saying what the `__awaitee` is,
so it requires manually looking up with a future with matching size.

When the `brotli` features on `reqwest` is active, a lot of brotli types
show up. Toggling this feature however seems to have no effect. I assume
they are false positives since the `brotli` crate has elaborate control
about allocation. The sizes are therefore shown with the feature off.

## Results

The largest future goes from 12208B to 6416B, the largest type
(`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full
diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f

For the second commit, i iteratively boxed the largest file until the
tests passed, then with an 800KB stack limit looked through the
backtrace of a failing test and added some more boxing.

Quick benchmarking showed no difference:

```console
$ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" 
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      49.2 ms ±   3.0 ms    [User: 39.8 ms, System: 24.0 ms]
  Range (min … max):    46.6 ms …  63.0 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
  Time (mean ± σ):      47.4 ms ±   3.2 ms    [User: 41.3 ms, System: 20.6 ms]
  Range (min … max):    44.6 ms …  60.5 ms    62 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/profiling/puffin-dev resolve meine_stadt_transparent ran
    1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants