std.compress: API cleanup for decompression across different formats #14739

dweiller · 2023-02-27T06:06:42Z

Currently the different formats in `std.compress` have inconsistent APIs, which complicates writing code that is generic over the format. For example, the package manager relies on the formats having identical APIs to make its `unpackTarball()` function generic over the compression format.

This issue aims to gather ideas on how a common decompression API could be exposed across the different formats, or to decide that this is not worth doing or not possible in a reasonable way. The status quo is:

- `gzip`, `lzma` and `xz` expose `Decompress(comptime ReaderType: type) type`, a type generic over an underlying reader with the public methods `deinit(*@This()) void`, `read(*@This(), []u8) Error!usize` and `reader(*@This()) Reader`, where `Reader = std.io.Reader(*@This(), Error, read)`. A `Decompress(ReaderType)` is created with `decompress(std.mem.Allocator, source: anytype) !Decompress(@TypeOf(source))`.
- `zlib` provides essentially the same interface, except that it names these `ZlibStream`/`zlibStream`.
- `lzma` additionally has a `decompressWithOptions(std.mem.Allocator, source: anytype, std.compress.lzma.decode.Options) !Decompress(@TypeOf(source))` creation function.
- `zstandard` has a similar interface, but the analogous entry points are called `DecompressStream`/`decompressStream`, and `DecompressStream` takes an additional comptime options argument (these don't need to be comptime; they could be made runtime parameters to `decompressStreamOptions`). Unlike the analogues for the other formats, `decompressStream` and `decompressStreamOptions` are infallible and do not return an error union.
- `lzma2` provides `decompress(std.mem.Allocator, reader: anytype, writer: anytype) !void`, which decompresses directly into a writer rather than providing a `std.io.Reader` interface.
- `lzma` and `lzma2` provide access to lower-level `decode` namespaces.
- `zstandard` exposes a `decompress` namespace containing functions for decoding from a slice into different destinations (a slice, a `std.ArrayList`, or a `std.RingBuffer`), as well as lower-level pieces that can be used to build other Zstandard decompressors.
- `deflate` provides `Decompressor()`/`decompressor()`, which are equivalent to `Decompress()`/`decompress()` of `gzip` and `xz`, except that `deflate.decompressor` takes an additional dictionary parameter.
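To make the goal concrete, here is a minimal sketch of what a shared streaming API would let generic code (such as `unpackTarball()`) look like. It does not use the real `std.compress` types: the `stored` and `rle` formats, `SliceSource`, `decompressAll` and the `DecompressStream`/`decompressStream` names (borrowed from `zstandard`) are all hypothetical, the allocator parameter is omitted, and every source type is assumed to declare its own `Error` set. Because both formats expose the same infallible constructor and the same `read` signature, `decompressAll` needs no per-format special cases.

```zig
const std = @import("std");

/// Hypothetical in-memory source; any type with an `Error` set and a
/// `read([]u8) Error!usize` method would do.
const SliceSource = struct {
    bytes: []const u8,
    pos: usize = 0,

    pub const Error = error{};

    pub fn read(self: *SliceSource, buf: []u8) Error!usize {
        var n: usize = 0;
        while (n < buf.len and self.pos < self.bytes.len) : (n += 1) {
            buf[n] = self.bytes[self.pos];
            self.pos += 1;
        }
        return n;
    }
};

/// Hypothetical "stored" format: the payload is the input, unchanged.
const stored = struct {
    pub fn DecompressStream(comptime Source: type) type {
        return struct {
            source: Source,

            const Self = @This();
            pub const Error = Source.Error;

            pub fn read(self: *Self, buf: []u8) Error!usize {
                return self.source.read(buf);
            }
        };
    }

    pub fn decompressStream(source: anytype) DecompressStream(@TypeOf(source)) {
        return .{ .source = source };
    }
};

/// Hypothetical run-length format: a sequence of (count, byte) pairs.
const rle = struct {
    pub fn DecompressStream(comptime Source: type) type {
        return struct {
            source: Source,
            remaining: usize = 0,
            byte: u8 = 0,

            const Self = @This();
            pub const Error = Source.Error || error{TruncatedInput};

            /// Loads the next pair; returns false at a clean end of input.
            fn nextPair(self: *Self) Error!bool {
                var pair: [2]u8 = undefined;
                var got: usize = 0;
                while (got < pair.len) {
                    const k = try self.source.read(pair[got..]);
                    if (k == 0) break;
                    got += k;
                }
                if (got == 0) return false;
                if (got < pair.len) return error.TruncatedInput;
                self.remaining = pair[0];
                self.byte = pair[1];
                return true;
            }

            pub fn read(self: *Self, buf: []u8) Error!usize {
                var n: usize = 0;
                while (n < buf.len) {
                    if (self.remaining == 0) {
                        const more = try self.nextPair();
                        if (!more) break;
                        continue; // a zero-count pair produces no output
                    }
                    buf[n] = self.byte;
                    self.remaining -= 1;
                    n += 1;
                }
                return n;
            }
        };
    }

    pub fn decompressStream(source: anytype) DecompressStream(@TypeOf(source)) {
        return .{ .source = source };
    }
};

/// Written once against the shared interface; works for any conforming format.
/// Stops when the input ends or `out` is full and returns the bytes written.
fn decompressAll(comptime format: type, source: anytype, out: []u8) !usize {
    var stream = format.decompressStream(source);
    var total: usize = 0;
    while (total < out.len) {
        const n = try stream.read(out[total..]);
        if (n == 0) break;
        total += n;
    }
    return total;
}
```

Under the same assumptions, `decompressAll(rle, SliceSource{ .bytes = &[_]u8{ 3, 'x', 2, 'y' } }, &out)` writes `xxxyy` into `out` and returns 5, while the identical call shape with `stored` simply copies its input.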

I don't think the names `Decompress()`/`decompress()` are particularly clear about what they do, especially when non-streaming APIs are also available (hence the choice of `DecompressStream`/`decompressStream` for `zstandard`). Including "stream" in the name makes it clear that these are streaming APIs.

I'm not familiar with the details of formats other than Zstandard, but would it make sense for the other formats to move the fallible operations currently performed during initialisation into the `read` function, so that all `decompress()` analogues can be infallible? I don't believe there is any fallible initialisation that makes sense for Zstandard, and it seems silly to make its return type `error{}!DecompressStream(...)` just to match the others.
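A minimal sketch of the deferred-validation idea, using a made-up format (the 4-byte magic `TOY1` followed by stored bytes); the format, `SliceSource` and both functions are hypothetical, not `std` API. The constructor only stores the source, so it has nothing that can fail, and the header check moves into the first `read()`, which already returns an error union. This assumes the constructor does not allocate: a constructor that takes an `std.mem.Allocator` and allocates would still be fallible because of `error.OutOfMemory`.

```zig
const std = @import("std");

/// Hypothetical in-memory source with an `Error` set and a `read` method.
const SliceSource = struct {
    bytes: []const u8,
    pos: usize = 0,

    pub const Error = error{};

    pub fn read(self: *SliceSource, buf: []u8) Error!usize {
        var n: usize = 0;
        while (n < buf.len and self.pos < self.bytes.len) : (n += 1) {
            buf[n] = self.bytes[self.pos];
            self.pos += 1;
        }
        return n;
    }
};

/// Hypothetical format: the 4-byte magic "TOY1" followed by stored bytes.
fn DecompressStream(comptime Source: type) type {
    return struct {
        source: Source,
        // Header validation is deferred to the first read() call.
        state: enum { unchecked, valid, invalid } = .unchecked,

        const Self = @This();
        const magic = "TOY1";
        pub const Error = Source.Error || error{BadHeader};

        pub fn read(self: *Self, buf: []u8) Error!usize {
            switch (self.state) {
                .valid => {},
                // A failed check is remembered, so later reads fail the same way.
                .invalid => return error.BadHeader,
                .unchecked => {
                    var header: [magic.len]u8 = undefined;
                    var got: usize = 0;
                    while (got < header.len) {
                        const k = try self.source.read(header[got..]);
                        if (k == 0) {
                            self.state = .invalid; // input ended inside the header
                            return error.BadHeader;
                        }
                        got += k;
                    }
                    if (!std.mem.eql(u8, &header, magic)) {
                        self.state = .invalid;
                        return error.BadHeader;
                    }
                    self.state = .valid;
                },
            }
            return self.source.read(buf);
        }
    };
}

/// Infallible constructor: it only stores `source` and reads nothing.
fn decompressStream(source: anytype) DecompressStream(@TypeOf(source)) {
    return .{ .source = source };
}
```

The trade-off is that a corrupt header is reported by the first `read()` rather than at construction, which is why the sketch records a failed check in `state`: later reads keep returning `error.BadHeader` instead of consuming more input.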

What level of detail is desired? `zstandard` exposes significantly more low-level details than the others, enough that I think all of its higher-level entry points are written purely in terms of lower-level functions and types that are exposed publicly. I like exposing these building blocks so that people whose use case needs more control over decompression can get it while reusing code that `std` needs anyway, but maybe it's not worth the added API complexity?


andrewrk · 2023-10-17T04:47:34Z

I think you've accurately identified the problem to be solved. Do you have a proposed solution? I think if you went through the trouble to open a PR to resolve this, it would likely be a satisfactory solution.

dweiller mentioned this issue on Feb 27, 2023 in #14299, "support fetching dependencies bundled with .tar.zst extension" (closed)

andrewrk added the standard library and proposal labels on Apr 10, 2023

andrewrk added this to the 0.13.0 milestone Apr 10, 2023
