huff0: Use static decompression buffer up to 30% faster #499

Merged: klauspost merged 2 commits into master from huff0-static-buffer on Feb 20, 2022

klauspost
Owner

Skip zeroing stack and add a bigger reusable buffer:

```
benchmark                                            old ns/op     new ns/op     delta
BenchmarkDecompress4XNoTable/digits-32               200409        151974        -24.17%
BenchmarkDecompress4XNoTable/gettysburg-32           2610          2565          -1.72%
BenchmarkDecompress4XNoTable/twain-32                558870        532480        -4.72%
BenchmarkDecompress4XNoTable/low-ent.10k-32          57291         53948         -5.84%
BenchmarkDecompress4XNoTable/superlow-ent-10k-32     15556         14443         -7.15%
BenchmarkDecompress4XNoTable/case1-32                296           257           -13.28%
BenchmarkDecompress4XNoTable/case2-32                250           206           -17.27%
BenchmarkDecompress4XNoTable/case3-32                257           217           -15.46%
BenchmarkDecompress4XNoTable/pngdata.001-32          75101         73473         -2.17%
BenchmarkDecompress4XNoTable/normcount2-32           414           319           -22.89%
BenchmarkDecompress4XNoTableTableLog8/digits-32      200054        153302        -23.37%
BenchmarkDecompress4XTable/digits-32                 200226        152865        -23.65%
BenchmarkDecompress4XTable/gettysburg-32             3815          3908          +2.44%
BenchmarkDecompress4XTable/twain-32                  557677        533928        -4.26%
BenchmarkDecompress4XTable/low-ent.10k-32            57732         54737         -5.19%
BenchmarkDecompress4XTable/superlow-ent-10k-32       16101         14918         -7.35%
BenchmarkDecompress4XTable/case1-32                  2035          2003          -1.57%
BenchmarkDecompress4XTable/case2-32                  2023          1962          -3.02%
BenchmarkDecompress4XTable/case3-32                  2036          1986          -2.46%
BenchmarkDecompress4XTable/pngdata.001-32            78191         76109         -2.66%
BenchmarkDecompress4XTable/normcount2-32             1454          1371          -5.71%

benchmark                                            old MB/s     new MB/s     speedup
BenchmarkDecompress4XNoTable/digits-32               498.99       658.03       1.32x
BenchmarkDecompress4XNoTable/gettysburg-32           593.10       603.62       1.02x
BenchmarkDecompress4XNoTable/twain-32                469.06       492.31       1.05x
BenchmarkDecompress4XNoTable/low-ent.10k-32          698.18       741.46       1.06x
BenchmarkDecompress4XNoTable/superlow-ent-10k-32     674.99       726.98       1.08x
BenchmarkDecompress4XNoTable/case1-32                185.82       214.27       1.15x
BenchmarkDecompress4XNoTable/case2-32                180.33       218.02       1.21x
BenchmarkDecompress4XNoTable/case3-32                186.94       221.15       1.18x
BenchmarkDecompress4XNoTable/pngdata.001-32          681.75       696.85       1.02x
BenchmarkDecompress4XNoTable/normcount2-32           210.23       272.66       1.30x
BenchmarkDecompress4XNoTableTableLog8/digits-32      499.88       652.33       1.30x
BenchmarkDecompress4XTable/digits-32                 499.45       654.19       1.31x
BenchmarkDecompress4XTable/gettysburg-32             405.81       396.10       0.98x
BenchmarkDecompress4XTable/twain-32                  470.06       490.97       1.04x
BenchmarkDecompress4XTable/low-ent.10k-32            692.86       730.77       1.05x
BenchmarkDecompress4XTable/superlow-ent-10k-32       652.15       703.85       1.08x
BenchmarkDecompress4XTable/case1-32                  27.02        27.46        1.02x
BenchmarkDecompress4XTable/case2-32                  22.24        22.94        1.03x
BenchmarkDecompress4XTable/case3-32                  23.58        24.17        1.03x
BenchmarkDecompress4XTable/pngdata.001-32            654.81       672.72       1.03x
BenchmarkDecompress4XTable/normcount2-32             59.83        63.47        1.06x
```
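
To illustrate the idea in the description, here is a minimal sketch of the two buffer strategies. It is not the code from this PR: `decoder`, `decodeStack`, `decodeReuse`, and `chunkSize` are hypothetical names, and the `copy` call only stands in for real symbol decoding. The assumption being shown is that a local array is typically zeroed by Go on every call, while a buffer kept on a long-lived struct is allocated once and reused.

```go
package main

import "fmt"

// chunkSize is a hypothetical staging-buffer size chosen for illustration.
const chunkSize = 1024

// decoder is a hypothetical type; it is not the huff0 API.
type decoder struct {
	buf *[chunkSize]byte // allocated lazily, then reused across calls
}

// decodeStack declares its staging buffer as a local array, so the
// runtime typically zeroes all chunkSize bytes on every call, even
// though every byte read back is overwritten first.
func decodeStack(dst, src []byte) []byte {
	var buf [chunkSize]byte
	for len(src) > 0 {
		n := copy(buf[:], src) // stand-in for real symbol decoding
		dst = append(dst, buf[:n]...)
		src = src[n:]
	}
	return dst
}

// decodeReuse keeps the staging buffer on the decoder, so the cost of
// allocating and zeroing it is paid once rather than per call.
func (d *decoder) decodeReuse(dst, src []byte) []byte {
	if d.buf == nil {
		d.buf = new([chunkSize]byte)
	}
	buf := d.buf
	for len(src) > 0 {
		n := copy(buf[:], src) // stale bytes beyond n are never read
		dst = append(dst, buf[:n]...)
		src = src[n:]
	}
	return dst
}

func main() {
	src := []byte("hello, huff0")
	var d decoder
	a := decodeStack(nil, src)
	b := d.decodeReuse(nil, src)
	fmt.Println(string(a) == string(b))
}
```

The reused buffer is only safe because each pass writes the bytes it later reads; a decoder that read uninitialized parts of the buffer would need the zeroing back. A shared buffer also means the hypothetical `decoder` must not be used from several goroutines at once.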
@klauspost changed the title from "huff0: Use static decompression buffer up 25% faster" to "huff0: Use static decompression buffer up to 25% faster" on Feb 19, 2022
@klauspost changed the title from "huff0: Use static decompression buffer up to 25% faster" to "huff0: Use static decompression buffer up to 30% faster" on Feb 20, 2022
@klauspost merged commit 8949d94 into master on Feb 20, 2022
@klauspost deleted the huff0-static-buffer branch on February 21, 2022 07:13
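
For readers comparing the two benchmark tables: the delta column (change in ns/op) and the speedup column (ratio of MB/s) describe the same measurement in two ways, which is why a drop of about 24% in time appears as a speedup of about 1.32x. The sketch below reproduces that arithmetic for the `BenchmarkDecompress4XNoTable/digits-32` row. The helper names `deltaPercent` and `speedup` are made up for this example, and the speedup is derived here from ns/op on the assumption that throughput is inversely proportional to time per operation.

```go
package main

import "fmt"

// deltaPercent returns the relative change in time per operation,
// in percent. Negative values mean the new code is faster.
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// speedup returns the throughput ratio implied by the two timings,
// assuming throughput is inversely proportional to ns/op.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// ns/op values from the digits-32 row of the first table.
	oldNs, newNs := 200409.0, 151974.0

	fmt.Printf("delta:   %.2f%%\n", deltaPercent(oldNs, newNs)) // delta:   -24.17%
	fmt.Printf("speedup: %.2fx\n", speedup(oldNs, newNs))       // speedup: 1.32x
}
```

The same ratio falls out of the MB/s columns directly: 658.03 / 498.99 is also about 1.32.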