
shuf: Fix OOM crash for huge number ranges #5980

Merged: 3 commits into uutils:main on Feb 25, 2024

Conversation

@BenWiederhake (Collaborator)

`gnu_shuf -n10 -r -i0-4294967295` runs perfectly fine on virtually all platforms and allocates only 2 MiB.
`uushuf -n10 -r -i0-4294967295` immediately OOMs with `memory allocation of 103079215104 bytes failed`, and would probably need at least 137 GiB of RAM and several hours, if not weeks, to actually finish.

Differences with GNU are treated as bugs.

This PR fixes this crash by treating number ranges specially:

  • The first commit extracts a common interface over the existing `Vec<&[u8]>` backend and the new "number range" concept (a rough sketch of the idea follows below).
  • The second commit implements and tests a special implementation for the number range case.
  • The third commit adds some documentation.

The "special implementation" is a really stupid approach. I'm sure this can be optimized further (e.g. bitsets?), but first let's get this working at all. Also note that this will fix a failure in one of the GNU shuf tests.

Furthermore, observe how much faster it becomes even for "small" numbers, here running `{shuf} -n5 -i1-{upper_limit_like_50k}`:

[Graph: `{shuf} -n5` runtimes over the range 1k to 100k]

Raw `data.csv` of this graph, also zoomed out:

[Graph: the same data, zoomed out to the range 1k to 500k]
1000,0.272365,0.24319744253057665,0.3176925,0.3875586935583776,0.500758,0.41346400000000005,0.3957279915053639,0.46854100000000004,0.5375989723907768,0.7125100000000001,0.363566,0.3392808250610892,0.41423400000000005,0.48483559284437167,0.6130350000000001
2000,0.274465,0.24164537174005973,0.315623,0.39188568086606146,0.6250110000000001,0.4675770000000001,0.44664010661395964,0.5237430000000001,0.60115189977922,0.8442350000000001,0.361715,0.3403948256359187,0.41513100000000003,0.48389609310371773,0.7241820000000001
3000,0.27756,0.2398612868472548,0.31538050000000006,0.39781841803303636,0.8601080000000001,0.5285050000000001,0.5056215689137734,0.5889995,0.6770268038418855,0.883656,0.361756,0.3414323438563207,0.41557900000000003,0.48274476525250787,0.684496
5000,0.28029200000000004,0.25181470775376663,0.322508,0.39413277810505476,0.48398900000000006,0.6439750000000001,0.6084104062845769,0.704621,0.8091718792852965,1.077248,0.360732,0.3387828931146601,0.41482050000000004,0.48498237903048314,0.622845
7000,0.285061,0.25047515420396677,0.323239,0.40093344754021704,0.532949,0.733653,0.6934248857596604,0.802231,0.9204511001022219,1.255543,0.360808,0.3384478480958004,0.415194,0.4856945580770359,0.6306080000000001
10000,0.28941300000000003,0.2582474180281979,0.33082600000000006,0.404531346627291,0.561779,0.893462,0.8591481155467865,0.9734495000000001,1.0955883501694386,1.3769660000000001,0.363684,0.339623540666853,0.415703,0.4864489770967697,0.6700119999999999
20000,0.316033,0.28897496695874875,0.3671335,0.4374600755738735,0.637032,1.423888,1.352478624317217,1.5442500000000001,1.740090401431598,2.426091,0.363032,0.3405303143312062,0.41538800000000003,0.48456184060799,0.664407
35000,0.34459400000000007,0.3193493122923487,0.39736350000000004,0.46934273386961556,0.706652,2.218735,2.106246998341902,2.4087350000000005,2.7136653438507876,3.5900980000000002,0.365456,0.34029239587544385,0.415158,0.4838918042691672,0.7706120000000001
50000,0.370033,0.34865466677679763,0.426575,0.5018694340583403,0.6953720000000001,3.0320990000000005,2.942933326334442,3.2855260000000004,3.6137415348480757,3.8530480000000003,0.36377500000000007,0.33565653314627875,0.41349100000000005,0.485211331225776,0.644379
75000,0.41248100000000004,0.3944846848028254,0.47548,0.5560068682098708,0.713574,4.394617,4.348587073998447,4.7427775,5.152615535525365,6.484376,0.362299,0.3390086442958716,0.4145005,0.48324987085564297,0.63847
100000,0.456133,0.44344579486233043,0.5276230000000001,0.611925572744935,0.766928,5.607863,5.610209494044138,6.242407,6.855858013324285,7.413429000000001,0.36204300000000006,0.33778749998800733,0.41627250000000005,0.4891356671006007,0.809884
500000,0.27492200000000006,0.2422284242211924,0.311558,0.38640319996425837,0.6600710000000001,26.534524,22.792991627340815,27.124005000000004,31.79883963871423,42.366907000000005,0.364303,0.3402487165903319,0.41508500000000004,0.48411377608567835,0.6842830000000001

All times were measured with `hyperfine -N -w2 -L upperk 1,2,3,5,7,10,20,35,50,75,100,500 -L shuf "./gnu_shuf_9_4,./target/release_shuf_main_420dfe8a,./target/release_shuf_number_speed_g44877f96" --export-json shuf_hyperfine.json -- "{shuf} -i1-{upperk}000 -n5"`.

I believe the column order is:

  • upper limit of the range
  • for each implementation (GNU, uutils main branch, uutils with this PR, in that order):
    • minimum time
    • avg time minus three stddev
    • median time
    • avg time plus three stddev
    • maximum time

I can tidy up and share the script I wrote which generated the above CSV from the JSON, if you'd like.

This affects one benchmark negatively, and two strongly positively:

Benchmark -i 0-10000000 > /dev/null becomes 9-11 % slower
$ hyperfine --warmup 10 "target/release_shuf_main_420dfe8a -i 0-10000000 > /dev/null" "target/release_shuf_number_speed_g44877f96 -i 0-10000000 > /dev/null"
Benchmark 1: target/release_shuf_main_420dfe8a -i 0-10000000 > /dev/null
  Time (mean ± σ):     946.0 ms ±   6.8 ms    [User: 839.5 ms, System: 106.4 ms]
  Range (min … max):   939.5 ms … 962.7 ms    10 runs
 
Benchmark 2: target/release_shuf_number_speed_g44877f96 -i 0-10000000 > /dev/null
  Time (mean ± σ):      1.039 s ±  0.009 s    [User: 1.020 s, System: 0.019 s]
  Range (min … max):    1.028 s …  1.058 s    10 runs
 
Summary
  target/release_shuf_main_420dfe8a -i 0-10000000 > /dev/null ran
    1.10 ± 0.01 times faster than target/release_shuf_number_speed_g44877f96 -i 0-10000000 > /dev/null

I believe this is an unavoidable trade-off between memory and speed. We could add another branch to just populate a `Vec<&[u8]>` like in the old days if the range is small enough, but then we would bikeshed about where to draw the line. I'll leave that work to someone else.
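For completeness, here is a rough sketch of the branch described above, under the assumption that small ranges get materialized exactly like before and only large ranges use the on-demand path; the threshold constant is precisely the part that would need bikeshedding (none of this is in the PR):

```rust
// Illustrative only; this extra branch is NOT part of this PR.
use std::ops::RangeInclusive;

/// Arbitrary cut-off; picking this value is exactly the bikeshed mentioned above.
const MATERIALIZE_LIMIT: u64 = 1 << 20;

enum RangeBackend {
    /// Old behavior: every number pre-formatted; fastest for heavy repeated sampling.
    Materialized(Vec<Vec<u8>>),
    /// New behavior: format numbers only when they are actually sampled.
    OnDemand(RangeInclusive<u64>),
}

fn make_backend(range: RangeInclusive<u64>) -> RangeBackend {
    let size = range
        .end()
        .saturating_sub(*range.start())
        .saturating_add(1);
    if size <= MATERIALIZE_LIMIT {
        RangeBackend::Materialized(range.map(|n| n.to_string().into_bytes()).collect())
    } else {
        RangeBackend::OnDemand(range)
    }
}
```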

Benchmark input.txt > /dev/null didn't change (as expected)
$ hyperfine --warmup 10 "target/release_shuf_main_420dfe8a target/input.txt > /dev/null" "target/release_shuf_number_speed_g44877f96 target/input.txt > /dev/null"
Benchmark 1: target/release_shuf_main_420dfe8a target/input.txt > /dev/null
  Time (mean ± σ):       8.4 ms ±   0.5 ms    [User: 5.9 ms, System: 2.5 ms]
  Range (min … max):     7.4 ms …  10.0 ms    364 runs
 
Benchmark 2: target/release_shuf_number_speed_g44877f96 target/input.txt > /dev/null
  Time (mean ± σ):       8.4 ms ±   0.6 ms    [User: 5.8 ms, System: 2.5 ms]
  Range (min … max):     7.3 ms …  13.5 ms    359 runs
 
Summary
  target/release_shuf_number_speed_g44877f96 target/input.txt > /dev/null ran
    1.00 ± 0.09 times faster than target/release_shuf_main_420dfe8a target/input.txt > /dev/null
Benchmark -r -n 10000000 -i 0-1000 > /dev/null becomes 438 % faster
$ hyperfine --warmup 10 "target/release_shuf_main_420dfe8a -r -n 10000000 -i 0-1000 > /dev/null" "target/release_shuf_number_speed_g44877f96 -r -n 10000000 -i 0-1000 > /dev/null"
Benchmark 1: target/release_shuf_main_420dfe8a -r -n 10000000 -i 0-1000 > /dev/null
  Time (mean ± σ):      70.5 ms ±   0.5 ms    [User: 68.8 ms, System: 1.6 ms]
  Range (min … max):    69.6 ms …  72.8 ms    42 runs
 
Benchmark 2: target/release_shuf_number_speed_g44877f96 -r -n 10000000 -i 0-1000 > /dev/null
  Time (mean ± σ):     382.6 ms ±   1.7 ms    [User: 382.1 ms, System: 0.4 ms]
  Range (min … max):   381.0 ms … 387.2 ms    10 runs
 
Summary
  target/release_shuf_main_420dfe8a -r -n 10000000 -i 0-1000 > /dev/null ran
    5.43 ± 0.05 times faster than target/release_shuf_number_speed_g44877f96 -r -n 10000000 -i 0-1000 > /dev/null
New benchmark -n 100 -i 1000-2000000000 > /dev/null becomes +Inf % faster
$ ./target/release_shuf_main_420dfe8a -n 100 -i 1000-2000000000 > /dev/null
memory allocation of 47999976024 bytes failed
Aborted
[$? = 134]
$ hyperfine --warmup 10 "./target/release_shuf_number_speed_g44877f96 -n 100 -i 1000-2000000000 > /dev/null"
Benchmark 1: ./target/release_shuf_number_speed_g44877f96 -n 100 -i 1000-2000000000 > /dev/null
  Time (mean ± σ):     416.1 µs ±  41.9 µs    [User: 390.8 µs, System: 72.2 µs]
  Range (min … max):   311.8 µs … 677.4 µs    3181 runs
 
  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

@BenWiederhake (Collaborator, Author)

Fixes #1420.

I can't believe I forgot to link that issue, that was the reason I was looking into shuf in the first place :D

@BenWiederhake (Collaborator, Author)

Changes since last push:

  • Avoid `Default::default`, as suggested by clippy
  • Change spelling: `nocrash` → `no_crash`, `numberset` → `number_set`
  • Add some words to the spellcheck ignore list (`nonrepeating`, `shufable`)
  • Don't test numbers beyond 2^31, because Android is 32-bit apparently?!
  • Code coverage flaked; no action taken
  • Added support for MSRV 1.70, which requires avoiding the nice `-> impl Iterator` feature and instead spelling out a near-Voldemort type very precisely (see the sketch after this list). Oh my goodness, please delete that abomination once the MSRV is raised.
  • Marked `test_range_repeat` as a test – dunno how I made that mistake :D
  • Rebased on new main
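Since the MSRV item above is hard to picture without code, here is a hedged illustration of the general shape of the problem (not the PR's actual code): return-position `impl Trait` in trait methods only became stable in Rust 1.75, so with MSRV 1.70 a trait method cannot return `impl Iterator`, and the concrete adapter type has to be spelled out instead.

```rust
use std::iter::Map;
use std::ops::RangeInclusive;

// Rejected on Rust 1.70 (return-position `impl Trait` in traits is stable only since 1.75):
//     fn numbers(&self) -> impl Iterator<Item = Vec<u8>>;
// The workaround is to name the exact adapter chain, function-pointer type and all:
trait NumberSource {
    fn numbers(&self) -> Map<RangeInclusive<u64>, fn(u64) -> Vec<u8>>;
}

fn format_number(n: u64) -> Vec<u8> {
    n.to_string().into_bytes()
}

struct NumberRange(RangeInclusive<u64>);

impl NumberSource for NumberRange {
    fn numbers(&self) -> Map<RangeInclusive<u64>, fn(u64) -> Vec<u8>> {
        // Cast the fn item to a plain fn pointer so the type matches the trait signature.
        self.0.clone().map(format_number as fn(u64) -> Vec<u8>)
    }
}
```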

@BenWiederhake (Collaborator, Author)

Changes since last push:

  • Change "hashset" and "maxsize" to "hash set" and "max size", because the cspell CI failed
  • Nothing else

(From a commit message:) This requires special handling, because we cannot always generate all possible strings beforehand, e.g. in the case of `-n 2 -i 0-2147483647`.
@BenWiederhake (Collaborator, Author) commented Feb 24, 2024

Changes since last push:

  • Rebased on current main, to demonstrate compatibility with already-accepted shuf-PRs
  • Also, to demonstrate that we now pass the shuf.log GNU test
  • Nothing else

EDIT: Nope, this definitely doesn't pass the shuf.log GNU test yet, and I think I know why. That issue needs to be fixed in a new PR, however.

@BenWiederhake (Collaborator, Author)

Changes since last push:

  • Nothing. I just want a CI rebuild, because:
  • `test_with_pr_core_utils_tests` flaked three times in a row. Someone should fix that.
Seemingly-relevant output of `test_with_pr_core_utils_tests`

The timestamp is pretty much the exact time at which the CI action ran, so it seems entirely plausible that, by pure chance, the left and right sides ran a fraction of a second apart and straddled a minute boundary. Given how busy the CI machines are, that would not surprise me.

(many empty lines removed)

--- TRY 3 STDERR:        coreutils::tests test_pr::test_with_pr_core_utils_tests ---
thread 'main' panicked at tests/by-util/test_pr.rs:429:39:
assertion failed: `(left == right)`
Diff < left / right > :
 
<Feb 24 17:41 2024 0Ft Page 3
>Feb 24 17:42 2024 0Ft Page 3
 
<Feb 24 17:41 2024 0Ft Page 4
>Feb 24 17:42 2024 0Ft Page 4
 15 xyzxyzxyz XYZXYZXYZ abcabcab
 16 456789 123456789 xyzxyzxyz XYZXYZXYZ
 7                                       12345678
 8                                       12345678
 9 3456789 ab
 20 DEFGHI 123
 1                                       12345678
 2                                       12345678
 3                                       12345678
 4                                       12345678
 5                                       12345678
 6                                       12345678
 27 no truncation before FF; (r_l-test):
 28 no trunc
 
<Feb 24 17:41 2024 0Ft Page 5
>Feb 24 17:42 2024 0Ft Page 5
 29 xyzxyzxyz XYZXYZXYZ abcabcab
 30 456789 123456789 xyzxyzxyz XYZXYZXYZ
 1                                       12345678
 2 3456789 abcdefghi
 3                                       12345678

@sylvestre (Contributor)

excellent, thanks

@sylvestre merged commit 8301a8e into uutils:main on Feb 25, 2024
62 checks passed
@BenWiederhake deleted the `dev-shuf-number-speed` branch on February 25, 2024 at 10:45
Merging this pull request may close the following issue: shuf: panic due to capacity overflow when allocating a vector