
shuf: Fix OOM crash for huge number ranges #5980

Merged: 3 commits into uutils:main on Feb 25, 2024

Conversation

@BenWiederhake (Collaborator)

`gnu_shuf -n10 -r -i0-4294967295` runs perfectly fine on virtually all platforms and allocates only 2 MiB.
`uushuf -n10 -r -i0-4294967295` immediately OOMs with `memory allocation of 103079215104 bytes failed`, and would probably need at least 137 GiB of RAM and several hours, if not weeks, to actually finish.

Differences with GNU are treated as bugs.

This PR fixes this crash by treating number ranges specially:

  • The first commit extracts a common interface over the existing `Vec<&[u8]>` backend and the new "number range" concept (a rough sketch of the idea follows below).
  • The second commit implements and tests a special implementation for the number range case.
  • The third commit adds some documentation.

The "special implementation" is a really stupid approach. I'm sure this can be optimized further (e.g. bitsets?), but first let's get this working at all. Also note that this will fix a failure in one of the GNU shuf tests.

Furthermore, observe how much faster it becomes even for "small" numbers, here running `{shuf} -n5 -i1-{upper_limit_like_50k}`:

[Graph: `{shuf} -n5` runtimes over the range 1k to 100k]

Raw `data.csv` of this graph, also zoomed out:

[Graph: the same data, zoomed out to the range 1k to 500k]
1000,0.272365,0.24319744253057665,0.3176925,0.3875586935583776,0.500758,0.41346400000000005,0.3957279915053639,0.46854100000000004,0.5375989723907768,0.7125100000000001,0.363566,0.3392808250610892,0.41423400000000005,0.48483559284437167,0.6130350000000001
2000,0.274465,0.24164537174005973,0.315623,0.39188568086606146,0.6250110000000001,0.4675770000000001,0.44664010661395964,0.5237430000000001,0.60115189977922,0.8442350000000001,0.361715,0.3403948256359187,0.41513100000000003,0.48389609310371773,0.7241820000000001
3000,0.27756,0.2398612868472548,0.31538050000000006,0.39781841803303636,0.8601080000000001,0.5285050000000001,0.5056215689137734,0.5889995,0.6770268038418855,0.883656,0.361756,0.3414323438563207,0.41557900000000003,0.48274476525250787,0.684496
5000,0.28029200000000004,0.25181470775376663,0.322508,0.39413277810505476,0.48398900000000006,0.6439750000000001,0.6084104062845769,0.704621,0.8091718792852965,1.077248,0.360732,0.3387828931146601,0.41482050000000004,0.48498237903048314,0.622845
7000,0.285061,0.25047515420396677,0.323239,0.40093344754021704,0.532949,0.733653,0.6934248857596604,0.802231,0.9204511001022219,1.255543,0.360808,0.3384478480958004,0.415194,0.4856945580770359,0.6306080000000001
10000,0.28941300000000003,0.2582474180281979,0.33082600000000006,0.404531346627291,0.561779,0.893462,0.8591481155467865,0.9734495000000001,1.0955883501694386,1.3769660000000001,0.363684,0.339623540666853,0.415703,0.4864489770967697,0.6700119999999999
20000,0.316033,0.28897496695874875,0.3671335,0.4374600755738735,0.637032,1.423888,1.352478624317217,1.5442500000000001,1.740090401431598,2.426091,0.363032,0.3405303143312062,0.41538800000000003,0.48456184060799,0.664407
35000,0.34459400000000007,0.3193493122923487,0.39736350000000004,0.46934273386961556,0.706652,2.218735,2.106246998341902,2.4087350000000005,2.7136653438507876,3.5900980000000002,0.365456,0.34029239587544385,0.415158,0.4838918042691672,0.7706120000000001
50000,0.370033,0.34865466677679763,0.426575,0.5018694340583403,0.6953720000000001,3.0320990000000005,2.942933326334442,3.2855260000000004,3.6137415348480757,3.8530480000000003,0.36377500000000007,0.33565653314627875,0.41349100000000005,0.485211331225776,0.644379
75000,0.41248100000000004,0.3944846848028254,0.47548,0.5560068682098708,0.713574,4.394617,4.348587073998447,4.7427775,5.152615535525365,6.484376,0.362299,0.3390086442958716,0.4145005,0.48324987085564297,0.63847
100000,0.456133,0.44344579486233043,0.5276230000000001,0.611925572744935,0.766928,5.607863,5.610209494044138,6.242407,6.855858013324285,7.413429000000001,0.36204300000000006,0.33778749998800733,0.41627250000000005,0.4891356671006007,0.809884
500000,0.27492200000000006,0.2422284242211924,0.311558,0.38640319996425837,0.6600710000000001,26.534524,22.792991627340815,27.124005000000004,31.79883963871423,42.366907000000005,0.364303,0.3402487165903319,0.41508500000000004,0.48411377608567835,0.6842830000000001

All times were measured with `hyperfine -N -w2 -L upperk 1,2,3,5,7,10,20,35,50,75,100,500 -L shuf "./gnu_shuf_9_4,./target/release_shuf_main_420dfe8a,./target/release_shuf_number_speed_g44877f96" --export-json shuf_hyperfine.json -- "{shuf} -i1-{upperk}000 -n5"`.

I believe the column order is:

  • upper limit of the range
  • for each implementation (GNU, uutils main branch, uutils with this PR, in that order):
    • minimum time
    • avg time minus three stddev
    • median time
    • avg time plus three stddev
    • maximum time

I can tidy up and share the script I wrote which generated the above CSV from the JSON, if you'd like.

This affects one benchmark negatively, and two strongly positively:

Benchmark -i 0-10000000 > /dev/null becomes 9-11 % slower
$ hyperfine --warmup 10 "target/release_shuf_main_420dfe8a -i 0-10000000 > /dev/null" "target/release_shuf_number_speed_g44877f96 -i 0-10000000 > /dev/null"
Benchmark 1: target/release_shuf_main_420dfe8a -i 0-10000000 > /dev/null
  Time (mean ± σ):     946.0 ms ±   6.8 ms    [User: 839.5 ms, System: 106.4 ms]
  Range (min … max):   939.5 ms … 962.7 ms    10 runs
 
Benchmark 2: target/release_shuf_number_speed_g44877f96 -i 0-10000000 > /dev/null
  Time (mean ± σ):      1.039 s ±  0.009 s    [User: 1.020 s, System: 0.019 s]
  Range (min … max):    1.028 s …  1.058 s    10 runs
 
Summary
  target/release_shuf_main_420dfe8a -i 0-10000000 > /dev/null ran
    1.10 ± 0.01 times faster than target/release_shuf_number_speed_g44877f96 -i 0-10000000 > /dev/null

I believe this is an unavoidable trade-off between memory and speed. We could add another branch to just populate a `Vec<&[u8]>` like in the old days if the range is small enough, but then we would bikeshed about where to draw the line. I'll leave that work to someone else.
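For completeness, here is a rough sketch of the branch described above, under the assumption that small ranges get materialized exactly like before and only large ranges use the on-demand path; the threshold constant is precisely the part that would need bikeshedding (none of this is in the PR):

```rust
// Illustrative only; this extra branch is NOT part of this PR.
use std::ops::RangeInclusive;

/// Arbitrary cut-off; picking this value is exactly the bikeshed mentioned above.
const MATERIALIZE_LIMIT: u64 = 1 << 20;

enum RangeBackend {
    /// Old behavior: every number pre-formatted; fastest for heavy repeated sampling.
    Materialized(Vec<Vec<u8>>),
    /// New behavior: format numbers only when they are actually sampled.
    OnDemand(RangeInclusive<u64>),
}

fn make_backend(range: RangeInclusive<u64>) -> RangeBackend {
    let size = range
        .end()
        .saturating_sub(*range.start())
        .saturating_add(1);
    if size <= MATERIALIZE_LIMIT {
        RangeBackend::Materialized(range.map(|n| n.to_string().into_bytes()).collect())
    } else {
        RangeBackend::OnDemand(range)
    }
}
```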

Benchmark input.txt > /dev/null didn't change (as expected)
$ hyperfine --warmup 10 "target/release_shuf_main_420dfe8a target/input.txt > /dev/null" "target/release_shuf_number_speed_g44877f96 target/input.txt > /dev/null"
Benchmark 1: target/release_shuf_main_420dfe8a target/input.txt > /dev/null
  Time (mean ± σ):       8.4 ms ±   0.5 ms    [User: 5.9 ms, System: 2.5 ms]
  Range (min … max):     7.4 ms …  10.0 ms    364 runs
 
Benchmark 2: target/release_shuf_number_speed_g44877f96 target/input.txt > /dev/null
  Time (mean ± σ):       8.4 ms ±   0.6 ms    [User: 5.8 ms, System: 2.5 ms]
  Range (min … max):     7.3 ms …  13.5 ms    359 runs
 
Summary
  target/release_shuf_number_speed_g44877f96 target/input.txt > /dev/null ran
    1.00 ± 0.09 times faster than target/release_shuf_main_420dfe8a target/input.txt > /dev/null
Benchmark -r -n 10000000 -i 0-1000 > /dev/null becomes 438 % faster
$ hyperfine --warmup 10 "target/release_shuf_main_420dfe8a -r -n 10000000 -i 0-1000 > /dev/null" "target/release_shuf_number_speed_g44877f96 -r -n 10000000 -i 0-1000 > /dev/null"
Benchmark 1: target/release_shuf_main_420dfe8a -r -n 10000000 -i 0-1000 > /dev/null
  Time (mean ± σ):      70.5 ms ±   0.5 ms    [User: 68.8 ms, System: 1.6 ms]
  Range (min … max):    69.6 ms …  72.8 ms    42 runs
 
Benchmark 2: target/release_shuf_number_speed_g44877f96 -r -n 10000000 -i 0-1000 > /dev/null
  Time (mean ± σ):     382.6 ms ±   1.7 ms    [User: 382.1 ms, System: 0.4 ms]
  Range (min … max):   381.0 ms … 387.2 ms    10 runs
 
Summary
  target/release_shuf_main_420dfe8a -r -n 10000000 -i 0-1000 > /dev/null ran
    5.43 ± 0.05 times faster than target/release_shuf_number_speed_g44877f96 -r -n 10000000 -i 0-1000 > /dev/null
New benchmark -n 100 -i 1000-2000000000 > /dev/null becomes +Inf % faster
$ ./target/release_shuf_main_420dfe8a -n 100 -i 1000-2000000000 > /dev/null
memory allocation of 47999976024 bytes failed
Aborted
[$? = 134]
$ hyperfine --warmup 10 "./target/release_shuf_number_speed_g44877f96 -n 100 -i 1000-2000000000 > /dev/null"
Benchmark 1: ./target/release_shuf_number_speed_g44877f96 -n 100 -i 1000-2000000000 > /dev/null
  Time (mean ± σ):     416.1 µs ±  41.9 µs    [User: 390.8 µs, System: 72.2 µs]
  Range (min … max):   311.8 µs … 677.4 µs    3181 runs
 
  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

@BenWiederhake (Collaborator, Author)

Fixes #1420.

I can't believe I forgot to link that issue, that was the reason I was looking into shuf in the first place :D

@BenWiederhake (Collaborator, Author)

Changes since last push:

  • Avoid `Default::default`, as suggested by clippy
  • Change spelling: `nocrash` → `no_crash`, `numberset` → `number_set`
  • Add some words to the spellcheck ignore list (`nonrepeating`, `shufable`)
  • Don't test numbers beyond 2^31, because Android is 32-bit apparently?!
  • Code coverage flaked; no action taken
  • Added support for MSRV 1.70, which requires avoiding the nice `-> impl Iterator` feature and instead spelling out a near-Voldemort type very precisely (see the sketch after this list). Oh my goodness, please delete that abomination once the MSRV is raised.
  • Marked `test_range_repeat` as a test – dunno how I made that mistake :D
  • Rebased on new main
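Since the MSRV item above is hard to picture without code, here is a hedged illustration of the general shape of the problem (not the PR's actual code): return-position `impl Trait` in trait methods only became stable in Rust 1.75, so with MSRV 1.70 a trait method cannot return `impl Iterator`, and the concrete adapter type has to be spelled out instead.

```rust
use std::iter::Map;
use std::ops::RangeInclusive;

// Rejected on Rust 1.70 (return-position `impl Trait` in traits is stable only since 1.75):
//     fn numbers(&self) -> impl Iterator<Item = Vec<u8>>;
// The workaround is to name the exact adapter chain, function-pointer type and all:
trait NumberSource {
    fn numbers(&self) -> Map<RangeInclusive<u64>, fn(u64) -> Vec<u8>>;
}

fn format_number(n: u64) -> Vec<u8> {
    n.to_string().into_bytes()
}

struct NumberRange(RangeInclusive<u64>);

impl NumberSource for NumberRange {
    fn numbers(&self) -> Map<RangeInclusive<u64>, fn(u64) -> Vec<u8>> {
        // Cast the fn item to a plain fn pointer so the type matches the trait signature.
        self.0.clone().map(format_number as fn(u64) -> Vec<u8>)
    }
}
```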

@BenWiederhake (Collaborator, Author)

Changes since last push:

  • Change "hashset" and "maxsize" to "hash set" and "max size", because the cspell CI failed
  • Nothing else

(From a commit message:) This requires special handling, because we cannot always generate all possible strings beforehand, e.g. in the case of `-n 2 -i 0-2147483647`.
@BenWiederhake (Collaborator, Author) commented Feb 24, 2024

Changes since last push:

  • Rebased on current main, to demonstrate compatibility with already-accepted shuf-PRs
  • Also, to demonstrate that we now pass the shuf.log GNU test
  • Nothing else

EDIT: Nope, this definitely doesn't pass the shuf.log GNU test yet, and I think I know why. That issue needs to be fixed in a new PR, however.

@BenWiederhake (Collaborator, Author)

Changes since last push:

  • Nothing. I just want a CI rebuild, because:
  • `test_with_pr_core_utils_tests` flaked three times in a row. Someone should fix that.
Seemingly-relevant output of `test_with_pr_core_utils_tests`

The timestamp is pretty much the exact time at which the CI action ran, so it seems entirely plausible that, by pure chance, the left and right sides ran a fraction of a second apart and straddled a minute boundary. Given how busy the CI machines are, that would not surprise me.

(many empty lines removed)

--- TRY 3 STDERR:        coreutils::tests test_pr::test_with_pr_core_utils_tests ---
thread 'main' panicked at tests/by-util/test_pr.rs:429:39:
assertion failed: `(left == right)`
Diff < left / right > :
 
<Feb 24 17:41 2024 0Ft Page 3
>Feb 24 17:42 2024 0Ft Page 3
 
<Feb 24 17:41 2024 0Ft Page 4
>Feb 24 17:42 2024 0Ft Page 4
 15 xyzxyzxyz XYZXYZXYZ abcabcab
 16 456789 123456789 xyzxyzxyz XYZXYZXYZ
 7                                       12345678
 8                                       12345678
 9 3456789 ab
 20 DEFGHI 123
 1                                       12345678
 2                                       12345678
 3                                       12345678
 4                                       12345678
 5                                       12345678
 6                                       12345678
 27 no truncation before FF; (r_l-test):
 28 no trunc
 
<Feb 24 17:41 2024 0Ft Page 5
>Feb 24 17:42 2024 0Ft Page 5
 29 xyzxyzxyz XYZXYZXYZ abcabcab
 30 456789 123456789 xyzxyzxyz XYZXYZXYZ
 1                                       12345678
 2 3456789 abcdefghi
 3                                       12345678

@sylvestre (Contributor)

excellent, thanks

@sylvestre merged commit 8301a8e into uutils:main on Feb 25, 2024
62 checks passed
@BenWiederhake deleted the `dev-shuf-number-speed` branch on February 25, 2024 at 10:45
Merging this pull request may close the following issue: shuf: panic due to capacity overflow when allocating a vector