
add DecompressInto for decompression of payloads with known sizes #130

Merged · 1 commit merged into DataDog:1.x on Jun 22, 2023

Conversation

@jbowens (Contributor) commented Jun 16, 2023

In some systems, compressed payloads are prefixed or tagged with the size of the original payload. This allows allocation of a perfectly-sized buffer to hold the decompressed payload. The current Decompress interface prohibits this type of usage, because Decompress allocates if the destination buffer is smaller than the worst-case bound. In practice, a perfectly-sized buffer will always be smaller than what's required for the worst-case payload.

This commit introduces an additional DecompressInto entrypoint. DecompressInto accepts a destination buffer and requires that the decompressed payload fit into the buffer. If it does not, it returns the original error returned by zstd.
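
For illustration, a caller with a size-prefixed payload might look roughly like the sketch below. The package name, uvarint framing, and helper name are hypothetical, and the example uses the (int, error) signature settled on later in this review.

```go
package example

import (
	"encoding/binary"
	"fmt"

	"github.com/DataDog/zstd"
)

// decodeSizePrefixed is a hypothetical helper: it assumes the compressed
// payload is prefixed with its original (uncompressed) size as a uvarint.
// The framing is illustrative; the PR itself only adds DecompressInto.
func decodeSizePrefixed(payload []byte) ([]byte, error) {
	size, n := binary.Uvarint(payload)
	if n <= 0 {
		return nil, fmt.Errorf("invalid size prefix")
	}
	// Allocate a perfectly-sized destination buffer. Decompress would treat
	// this buffer as too small (it is below the worst-case bound) and
	// allocate a new one; DecompressInto uses it as-is.
	dst := make([]byte, size)
	if _, err := zstd.DecompressInto(dst, payload[n:]); err != nil {
		return nil, err
	}
	return dst, nil
}
```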

@jbowens (Contributor, Author) commented Jun 16, 2023

@Viq111 — Please let me know your thoughts. Happy to rework the interface as needed.

@jbowens (Contributor, Author) commented Jun 21, 2023

@Viq111 — Is this something that you might be willing to merge upstream, or should I fork?

@Viq111 (Collaborator) left a comment

Thanks for your contribution!
Sorry for the delay, the end of quarter is always a bit busier.
Once the comments are resolved, the PR LGTM!

@@ -161,3 +161,18 @@ func Decompress(dst, src []byte) ([]byte, error) {
defer r.Close()
return ioutil.ReadAll(r)
}

// DecompressInto decompresses src into dst. Unlike Decompress, DecompressInto

@Viq111 (Collaborator):

Could you have Decompress call DecompressInto at l.146 so we don't have to repeat the call to C.ZSTD_decompress?

@jbowens (Contributor, Author):

Done.
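
For context, a rough sketch (inside package zstd) of the delegation being suggested here; the decompressSizeHint helper name and the exact fallback structure are assumptions, not the merged code:

```go
// Sketch only: route Decompress through DecompressInto so that
// C.ZSTD_decompress is called in exactly one place.
func Decompress(dst, src []byte) ([]byte, error) {
	// Hypothetical helper that guesses a decompressed-size bound from src.
	bound := decompressSizeHint(src)
	if cap(dst) >= bound {
		dst = dst[:bound]
	} else {
		dst = make([]byte, bound)
	}
	n, err := DecompressInto(dst, src)
	if err == nil {
		return dst[:n], nil
	}
	// If the guessed bound was too small, fall back to streaming
	// decompression, mirroring the existing tail of Decompress shown in
	// the diff hunk above.
	r := NewReader(bytes.NewReader(src))
	defer r.Close()
	return ioutil.ReadAll(r)
}
```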

zstd.go Outdated
// payload before attempting decompression.
//
// If dst is too small, DecompressInto returns an error.
func DecompressInto(dst, src []byte) error {

@Viq111 (Collaborator):

To make it a bit closer to the Go stdlib, I'd have the method signature also return the number of bytes written (DecompressInto(dst, src []byte) (int, error), like https://pkg.go.dev/io#Copy). You can always ignore it for your use case, but I think it will make usage a bit wider at no extra cost.

@jbowens (Contributor, Author):

Makes sense, done.
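
With the (int, error) signature, a caller that knows only an upper bound on the decompressed size can trim to the bytes actually written, in the spirit of io.Copy. Illustrative usage, not code from this PR; the function and parameter names are made up:

```go
package example

import "github.com/DataDog/zstd"

// decompressBounded assumes the caller knows an upper bound on the
// decompressed size rather than the exact size.
func decompressBounded(src []byte, maxExpectedSize int) ([]byte, error) {
	dst := make([]byte, maxExpectedSize)
	n, err := zstd.DecompressInto(dst, src)
	if err != nil {
		return nil, err
	}
	// n reports how many bytes were written, so the over-sized buffer
	// can be trimmed before returning it.
	return dst[:n], nil
}
```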

@jbowens (Contributor, Author) commented Jun 22, 2023

> Sorry for the delay, the end of quarter is always a bit busier.

No worries! Thanks for reviewing :)

@Viq111 (Collaborator) left a comment

No change on benchmarks:

goos: darwin
goarch: arm64
pkg: github.com/DataDog/zstd
                       │     old      │                 new                 │
                       │    sec/op    │    sec/op      vs base              │
Compression-10           66.37m ± ∞ ¹    67.24m ± ∞ ¹       ~ (p=0.421 n=5)
Decompression-10         9.424m ± ∞ ¹    9.336m ± ∞ ¹       ~ (p=0.222 n=5)
geomean                  1.901m          1.916m        +0.84%
¹ need >= 6 samples for confidence interval at level 0.95

                       │      old       │                 new                  │
                       │      B/s       │      B/s        vs base              │
Compression-10            143.3Mi ± ∞ ¹    141.4Mi ± ∞ ¹       ~ (p=0.421 n=5)
Decompression-10         1008.9Mi ± ∞ ¹   1018.5Mi ± ∞ ¹       ~ (p=0.222 n=5)
geomean                   259.7Mi          257.6Mi        -0.83%
¹ need >= 6 samples for confidence interval at level 0.95

@Viq111 merged commit ea68dca into DataDog:1.x on Jun 22, 2023
@jbowens deleted the decompress-into branch on June 22, 2023 at 17:37

@ribasushi commented

@Viq111 @jbowens it seems that the new method has not been added to the zstd.NewCtx() object. Is this intentional or an oversight?

@jbowens (Contributor, Author) commented Apr 22, 2024

@ribasushi — It wasn't intentional on my side; my project does not use the zstd context API and I only had passing awareness of its existence.
