std.compress: API cleanup for decompression across different formats #14739

Open
dweiller opened this issue Feb 27, 2023 · 1 comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.

@dweiller
Contributor

Currently the different formats in std.compress have inconsistent APIs, which complicates writing code that is generic over the format. For example, the package manager relies on several formats sharing an identical API to make the unpackTarball() function generic over the format.

This issue aims to gather ideas on how a common decompression API could be exposed for the different formats, or to decide that doing so is not worthwhile or not possible in a reasonable way. The status quo is:

  • gzip, lzma and xz expose a Decompress(ReaderType: anytype) type, generic over an underlying reader, which has public methods deinit(*@This()) void, read(*@This(), []u8) Error!usize, and reader(*@This()) Reader, where Reader = std.io.Reader(*@This(), Error, read). To create a Decompress(ReaderType) there is a function decompress(std.mem.Allocator, source: anytype) !Decompress(@TypeOf(source)).
  • zlib provides essentially the same interface, except zlib calls these functions ZlibStream/zlibStream.
  • lzma has an additional decompressWithOptions(std.mem.Allocator, source: anytype, std.compress.lzma.decode.Options) !Decompress(@TypeOf(source)) creation function.
  • zstandard has a similar interface: the analogous entry points are called DecompressStream/decompressStream, and DecompressStream takes an additional comptime options argument (though the options need not be comptime and could be made runtime parameters to decompressStreamOptions). The decompressStream and decompressStreamOptions functions are infallible and do not return an error union, unlike the analogues for the other formats.
  • lzma2 provides decompress(std.mem.Allocator, reader: anytype, writer: anytype) !void which decompresses directly into a writer rather than providing a std.io.Reader interface.
  • lzma and lzma2 provide access to lower-level decode namespaces.
  • zstandard exposes a decompress namespace which contains functions for decoding from a slice with different destinations (slice, std.ArrayList, or std.RingBuffer) as well as lower-level details that can be used to build other Zstandard decompressors.
  • deflate provides Decompressor()/decompressor() which are equivalent to Decompress()/decompress() of gzip and xz except that deflate.decompressor takes an additional dictionary parameter.

I think that the names Decompress()/decompress() are not particularly clear on what they do, especially if non-streaming APIs are available (hence the choice to use DecompressStream/decompressStream for zstandard). I think including 'stream' in the name makes sense to make it clear that these are streaming APIs.
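To make the discussion concrete, here is a minimal sketch of what code that is generic over the format could look like if every format namespace exposed the same DecompressStream/decompressStream pair. Everything in it is hypothetical: identity is a stand-in "format" that copies bytes through unchanged, SliceReader is a toy reader used to avoid depending on std.io, and decompressAll is an illustrative helper, not an existing std function. It assumes the two-argument @memcpy builtin (Zig 0.11 or later).

```zig
const std = @import("std");

/// Toy in-memory reader (hypothetical), used instead of std.io types.
const SliceReader = struct {
    bytes: []const u8,
    pos: usize = 0,

    pub const Error = error{};

    pub fn read(self: *SliceReader, buffer: []u8) Error!usize {
        const n = @min(buffer.len, self.bytes.len - self.pos);
        @memcpy(buffer[0..n], self.bytes[self.pos..][0..n]);
        self.pos += n;
        return n;
    }
};

/// Stand-in "format" namespace (hypothetical): passes bytes through unchanged.
const identity = struct {
    pub fn DecompressStream(comptime ReaderType: type) type {
        return struct {
            source: ReaderType,

            const Self = @This();
            pub const Error = ReaderType.Error;

            pub fn deinit(self: *Self) void {
                _ = self; // nothing to free in this toy format
            }

            pub fn read(self: *Self, buffer: []u8) Error!usize {
                return self.source.read(buffer);
            }
        };
    }

    /// Infallible constructor, mirroring the zstandard naming.
    pub fn decompressStream(
        allocator: std.mem.Allocator,
        source: anytype,
    ) DecompressStream(@TypeOf(source)) {
        _ = allocator; // a real format might keep this for window buffers
        return .{ .source = source };
    }
};

/// Generic over any namespace that follows the convention above.
fn decompressAll(
    comptime format: type,
    allocator: std.mem.Allocator,
    source: anytype,
    out: []u8,
) !usize {
    var stream = format.decompressStream(allocator, source);
    defer stream.deinit();

    var total: usize = 0;
    while (total < out.len) {
        const n = try stream.read(out[total..]);
        if (n == 0) break; // end of stream
        total += n;
    }
    return total;
}

test "decompressAll is generic over the format namespace" {
    var buf: [16]u8 = undefined;
    const source = SliceReader{ .bytes = "hello" };
    const n = try decompressAll(identity, std.testing.allocator, source, &buf);
    try std.testing.expectEqualStrings("hello", buf[0..n]);
}
```

The point of the sketch is only that a caller such as unpackTarball() needs nothing beyond a shared name and signature; which names and signatures to settle on is exactly the open question.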

I'm not familiar with the details of the formats other than Zstandard, but does it make sense to move the initial fallible operations performed by initialisation into the read function for the others so all decompress() analogues can be infallible? I don't believe there is any fallible initialisation that makes sense for Zstandard and it seems silly to make the return type error{}!DecompressStream(...) just to match the others.
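As an illustration of moving fallible initialisation into read, here is a minimal sketch for a made-up format whose stream starts with a single magic byte. The constructor only stores the source and cannot fail; the header is validated on the first read call, so a bad header surfaces as a read error. The format, the magic value 0x5A, and the names LazyHeaderStream, lazyHeaderStream and SliceReader are all hypothetical, and whether the real gzip, xz or lzma headers can be deferred this way is not something this sketch establishes. It assumes the two-argument @memcpy builtin (Zig 0.11 or later).

```zig
const std = @import("std");

/// Toy in-memory reader (hypothetical), used instead of std.io types.
const SliceReader = struct {
    bytes: []const u8,
    pos: usize = 0,

    pub const Error = error{};

    pub fn read(self: *SliceReader, buffer: []u8) Error!usize {
        const n = @min(buffer.len, self.bytes.len - self.pos);
        @memcpy(buffer[0..n], self.bytes[self.pos..][0..n]);
        self.pos += n;
        return n;
    }
};

/// Streaming decoder for a made-up format: one magic byte, then raw payload.
fn LazyHeaderStream(comptime ReaderType: type) type {
    return struct {
        source: ReaderType,
        state: enum { new_stream, in_body } = .new_stream,

        const Self = @This();
        pub const Error = ReaderType.Error || error{BadHeader};

        pub fn read(self: *Self, buffer: []u8) Error!usize {
            if (self.state == .new_stream) {
                // Header parsing happens here rather than in the constructor.
                var magic: [1]u8 = undefined;
                const n = try self.source.read(&magic);
                if (n != 1 or magic[0] != 0x5A) return error.BadHeader;
                self.state = .in_body;
            }
            return self.source.read(buffer);
        }
    };
}

/// Infallible constructor: no error union in the return type.
fn lazyHeaderStream(source: anytype) LazyHeaderStream(@TypeOf(source)) {
    return .{ .source = source };
}

test "header errors surface on the first read" {
    var buf: [8]u8 = undefined;

    var good = lazyHeaderStream(SliceReader{ .bytes = "\x5Ahi" });
    const n = try good.read(&buf);
    try std.testing.expectEqualStrings("hi", buf[0..n]);

    var bad = lazyHeaderStream(SliceReader{ .bytes = "\x00hi" });
    try std.testing.expectError(error.BadHeader, bad.read(&buf));
}
```

The trade-off is that callers no longer learn about a malformed header at construction time, and the read error set grows to include header errors.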

What level of detail is desired? zstandard exposes significantly more lower-level details than the others, enough that I think all the higher-level entry points are written purely in terms of lower-level functions/types that are exposed publicly. I like exposing these building blocks so that people who have a use case where they want more control over decompression can get it while reusing the code that std needs anyway, but maybe it's not worth the API complexity?

@andrewrk andrewrk added standard library This issue involves writing Zig code for the standard library. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Apr 10, 2023
@andrewrk andrewrk added this to the 0.13.0 milestone Apr 10, 2023
@andrewrk
Member

I think you've accurately identified the problem to be solved. Do you have a proposed solution? I think if you went through the trouble to open a PR to resolve this, it would likely be a satisfactory solution.
