Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We are std. #67

Conversation

workingjubilee
Copy link
Contributor

With #66 this will complete the steps in this crate necessary for std to be able to assimilate it.

In doing so, simplify bit_reader_reverse::refill_slow a tad.
@workingjubilee workingjubilee force-pushed the your-crate-will-adapt-to-service-us branch from 74b0634 to 7c4211c Compare August 26, 2024 23:38
@workingjubilee
Copy link
Contributor Author

Stacked the commits since both are necessary anyways and I noticed merge conflicts without them rebased on each other, annoyingly.

@KillingSpark
Copy link
Owner

Hi sorry for the late response, I was on travels. This is very exciting!

Did you benchmark the change in the bitreader? This is one of the hottest code paths in this library. I am not opposed to removing the byteorder library, it's an artifact from way back where from_le_bytes wasn't available yet. But changes in this part of the library need to be benchmarked before they can be merged.

I'll be rather busy catching up the next few days but I am very motivated to get this PR moving forward.

@workingjubilee
Copy link
Contributor Author

workingjubilee commented Sep 1, 2024

Honestly, I was more concerned about correctness, given the nature of the change. That said, having done so, I noticed something interesting.

Nightly

$ rustup default nightly
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.82.0-nightly (0d634185d 2024-08-29)

$ rustup update
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: syncing channel updates for 'beta-x86_64-unknown-linux-gnu'
info: latest update on 2024-08-29, rust version 1.81.0-beta.7 (4a765c0b8 2024-08-28)
#
# snipped
#
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
info: latest update on 2024-09-01, rust version 1.82.0-nightly (a7399ba69 2024-08-31)
#
# snipped
#
info: syncing channel updates for '1.75-x86_64-unknown-linux-gnu'
info: syncing channel updates for '1.76-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu unchanged - rustc 1.80.1 (3f5fd8dd4 2024-08-06)
      beta-x86_64-unknown-linux-gnu updated - rustc 1.81.0-beta.7 (4a765c0b8 2024-08-28) (from rustc 1.81.0-beta.5 (6c199c89c 2024-08-15))
   nightly-x86_64-unknown-linux-gnu updated - rustc 1.82.0-nightly (a7399ba69 2024-08-31) (from rustc 1.82.0-nightly (0d634185d 2024-08-29))
    1.75-x86_64-unknown-linux-gnu unchanged - rustc 1.75.0 (82e1608df 2023-12-21)
    1.76-x86_64-unknown-linux-gnu unchanged - rustc 1.76.0 (07dca489a 2024-02-04)

info: cleaning up downloads & tmp directories
info: self-update is disabled for this build of rustup
info: any updates to rustup will need to be fetched with your system package manager

$ cargo clean
     Removed 2502 files, 872.1MiB total
$ cargo bench
   Compiling proc-macro2 v1.0.86
#
# snipped
#
   Compiling criterion v0.5.1
    Finished `bench` profile [optimized] target(s) in 7.04s
     Running unittests src/lib.rs (target/release/deps/ruzstd-a67070fb02525630)

running 19 tests
test decoding::decodebuffer::tests::short_writer ... ignored
# snipped, all ignored
test tests::test_streaming ... ignored

test result: ok. 0 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/zstd.rs (target/release/deps/zstd-d2ffb1a8ff286e95)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/zstd_stream.rs (target/release/deps/zstd_stream-03f9974b2b34743b)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/reversedbitreader_bench.rs (target/release/deps/reversedbitreader_bench-34f9df705d959984)
Gnuplot not found, using plotters backend
reversed bitreader      time:   [2.4965 ms 2.4991 ms 2.5021 ms]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

$ git branch
* master
  remove-byteorder-dep
  your-crate-will-adapt-to-service-us
$ git switch your-crate-will-adapt-to-service-us
Switched to branch 'your-crate-will-adapt-to-service-us'
Your branch is up to date with 'origin/your-crate-will-adapt-to-service-us'.
$ cargo bench
    Updating crates.io index
     Locking 3 packages to latest compatible versions
      Adding compiler_builtins v0.1.124
      Adding rustc-std-workspace-alloc v1.0.0
      Adding rustc-std-workspace-core v1.0.0
   Compiling ruzstd v0.7.1 (/home/jubilee/rust/ruzstd)
    Finished `bench` profile [optimized] target(s) in 1.45s
     Running unittests src/lib.rs (target/release/deps/ruzstd-7ff6417e4589357f)

running 19 tests
test decoding::decodebuffer::tests::short_writer ... ignored
# snipped, all ignored
test tests::test_streaming ... ignored

test result: ok. 0 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/zstd.rs (target/release/deps/zstd-e735dd04ebb08a91)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/zstd_stream.rs (target/release/deps/zstd_stream-cddb555eb1629f42)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/reversedbitreader_bench.rs (target/release/deps/reversedbitreader_bench-2e0563546441c387)
Gnuplot not found, using plotters backend
reversed bitreader      time:   [2.3927 ms 2.3950 ms 2.3979 ms]
                        change: [-4.3128% -4.1639% -4.0105%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

$ 

Stable

$ git switch master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
$ cargo clean
     Removed 596 files, 138.9MiB total
$ rustup default stable
info: using existing install for 'stable-x86_64-unknown-linux-gnu'
info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu unchanged - rustc 1.80.1 (3f5fd8dd4 2024-08-06)

$ cargo bench
   Compiling proc-macro2 v1.0.86
#
# snipped
#
   Compiling criterion v0.5.1
    Finished `bench` profile [optimized] target(s) in 7.30s
     Running unittests src/lib.rs (target/release/deps/ruzstd-03704bf2cd47f473)

running 19 tests
test decoding::decodebuffer::tests::short_writer ... ignored
# snipped, all ignored
test tests::test_streaming ... ignored

test result: ok. 0 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/zstd.rs (target/release/deps/zstd-3bb7de58025fde57)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/zstd_stream.rs (target/release/deps/zstd_stream-795faa6cc7116ab9)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/reversedbitreader_bench.rs (target/release/deps/reversedbitreader_bench-7685c61267242384)
Gnuplot not found, using plotters backend
reversed bitreader      time:   [2.4838 ms 2.4878 ms 2.4916 ms]

$ git switch your-crate-will-adapt-to-service-us
Switched to branch 'your-crate-will-adapt-to-service-us'
Your branch is up to date with 'origin/your-crate-will-adapt-to-service-us'.
$ cargo bench
    Updating crates.io index
     Locking 3 packages to latest compatible versions
      Adding compiler_builtins v0.1.124
      Adding rustc-std-workspace-alloc v1.0.0
      Adding rustc-std-workspace-core v1.0.0
   Compiling ruzstd v0.7.1 (/home/jubilee/rust/ruzstd)
    Finished `bench` profile [optimized] target(s) in 1.57s
     Running unittests src/lib.rs (target/release/deps/ruzstd-7bbc3b5a0fd4b443)

running 19 tests
test decoding::decodebuffer::tests::short_writer ... ignored
# snipped, all ignored
test tests::test_streaming ... ignored

test result: ok. 0 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/zstd.rs (target/release/deps/zstd-4e597d9467301f4a)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/bin/zstd_stream.rs (target/release/deps/zstd_stream-2914c4f1bf4a1210)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/reversedbitreader_bench.rs (target/release/deps/reversedbitreader_bench-6f83a5878c30f49b)
Gnuplot not found, using plotters backend
reversed bitreader      time:   [2.6718 ms 2.6734 ms 2.6753 ms]
                        change: [+7.2807% +7.4595% +7.6427%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

$

Conclusion

I did run cargo bench a few more times on both and got similar-ish results, though every now and then criterion would say performance has improved despite it being a no-change. Either this benchmark is too subject to variance for a quick run of criterion to tell us anything, or this is liable to be an improvement on nightly and a regression on stble.

I'd be happy to investigate further performance improvements, but I am inherently reluctant to make strong, general conclusions based on a single bench target, and I don't want to compare my results against e.g. a generic "decode enwik9.zstd" directive. I either need to be able to download a precise file, or have a set of reconstructible steps that can be referred back to later, so that e.g. things like the encoding compression level can also be controlled for.

With everything precisely reconstructible, I'd be able to actually examine improvements with metrics that aren't likely to be strongly affected by things like e.g. I ran the bench on a CPU with a 100 MB cache... or just be confident that I'm actually running the same bench if I boot a different CPU with a different OS.

@KillingSpark
Copy link
Owner

Obviously I'll have a very good look at the changes for correctness as well, but even if correct the changes need to not regress the performance further.

As for the benchmarks: I should have been more precise, the single benchmark included in this repo is admittedly not very good. I am mostly using enwik9 because it is highly compressible. The actual compression level does not really matter though, zstd will generate a lot of offset/matchlength/literallength triplets on every level, precisely because enwik9 is so easly compressable. Decoding these triplets is what stresses the bitreader.

What I'm usually doing to benchmark changes like this is to build the binaries in the repo with --release and then call something along the lines of

time ./target/release/zstd -c -d path/to/archive.zstd > /dev/null

I am aware that that also measures a lot of stuff that isn't the bitreader but writing a benchmark that simulates a realistic usage pattern isn't trivial (and I also don't want to include a copy of enwik9 in this repo because that's pretty big). The bitreader does a significant (~40%) amount of work in this measurement and changes in performance are noticable even if there is some noise in there.

@workingjubilee
Copy link
Contributor Author

As for the benchmarks: I should have been more precise, the single benchmark included in this repo is admittedly not very good. I am mostly using enwik9 because it is highly compressible. The actual compression level does not really matter though, zstd will generate a lot of offset/matchlength/literallength triplets on every level, precisely because enwik9 is so easly compressable. Decoding these triplets is what stresses the bitreader.

to be clear, I don't really care if something "doesn't matter" to you, because that means I have to guess how exactly you assembled the enwik9.zstd candidate text file.

@workingjubilee
Copy link
Contributor Author

workingjubilee commented Sep 2, 2024

Multiple sources can claim to offer a download of the "enwik9" dataset, and yet offer slightly varying datasets. It doesn't have to be in the repo: a URL would be sufficient to disambiguate.

Multiple commands issued to a zstd encoder can produce different .zstd files. These are variables. They should be controlled. To do otherwise is to not perform an experiment.

@workingjubilee
Copy link
Contributor Author

The compressor cannot be this crate as it does not implement compression yet, so it has to be a binary, and that binary has a version. If I install a version from my distro, that version may actually have arbitrary patches that severely alter its behavior. I know this because distros regularly patch the Rust compiler to alter its behavior in ways that make it unsound. So I also would prefer an unambiguous version of the compressor.

@workingjubilee
Copy link
Contributor Author

compression

$ zstd -9 enwik9 -o enwik9.zstd
enwik9               : 27.23%   (   954 MiB =>    260 MiB, enwik9.zstd)        

decompression

master

$ time ./target/release/zstd -c -d enwik9.zstd > /dev/null
File: enwik9.zstd
100 % done
Decoded frames: 1  bytes: 1000000000
1 of 1 checksums are ok!

real    0m2.365s
user    0m2.320s
sys     0m0.033s

we are std.

$ time ./target/release/zstd -c -d enwik9.zstd > /dev/null
File: enwik9.zstd
100 % done
Decoded frames: 1  bytes: 1000000000
1 of 1 checksums are ok!

real    0m2.364s
user    0m2.324s
sys     0m0.030s

@KillingSpark
Copy link
Owner

These are variables. They should be controlled. To do otherwise is to not perform an experiment.

I'm not sure I understand the concerns you have. The biggest variable in benchmarking this crate is the hardware used to run the benchmark. The CPU, compressor/compression level and underlying data don't really matter as long as they represent a somewhat realistic scenario that stresses the bitreader. Since you are not comparing against any benchmarks I have made (because you obviously do not have the hardware I use for benchmarking, because that hardware is sitting under my desk) but comparing runs of different implementations on your machine with these variables controlled by yourself, this is a perfectly valid experiment. Which choices you make for these variables doesn't really matter, not just to me, it literally just don't matter, as long as you keep them consistent for the runs you are comparing. Otherwise, yes, that wouldn't be an experiment but it would also be just nonsense...

Thanks for running the benchmark, this looks promising.

@KillingSpark
Copy link
Owner

Hi I checked out the code, the changes to the bitreader look good, not doing the evening out is actually an improvement. This was just done because the byteorder library did not offer odd sized reads.

On my machine the change in the hot path is a slight performance regression but it's acceptable for the purposes of moving this PR along. If you could alter the github CI to do this, that would be great. The rustc-dep-of-std feature breaks the build on stable.

cargo hack check --feature-powerset --exclude-features rustc-dep-of-std

@KillingSpark
Copy link
Owner

KillingSpark commented Sep 6, 2024

Nevermind, I just saw that clippy is also complaining in unrelated code, I'll have to touch this anyways to make the CI green...

@KillingSpark KillingSpark merged commit 3b77750 into KillingSpark:master Sep 6, 2024
0 of 2 checks passed
@workingjubilee
Copy link
Contributor Author

Thank you!

@workingjubilee workingjubilee deleted the your-crate-will-adapt-to-service-us branch September 6, 2024 18:17
@KillingSpark
Copy link
Owner

Ah one more question, do I need to keep anything in mind maintaining the special dependencies this crate has now with the rustc-dep-of-std feature? Or is this a "don't touch and it will be fine" situation?

@workingjubilee
Copy link
Contributor Author

@KillingSpark As long as nothing happens that introduces a direct dependency on std (which breaks things via recursive dependence), things should be good.

@khuey
Copy link

khuey commented Sep 10, 2024

Apologies if this has already been requested but could you cut an ruzstd release with this PR? This is the last thing blocking supporting zstd-compressed debug sections in backtrace (rust-lang/backtrace-rs#626).

@KillingSpark
Copy link
Owner

Oh sorry I wasn't aware that was time critical. I'll make a release this evening

@khuey
Copy link

khuey commented Sep 13, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants