std.compress: API cleanup for decompression across different formats #14739
Labels: proposal, standard library
Currently the different formats in `std.compress` have inconsistent APIs, which complicates writing code that is generic over different formats. The package manager relies on identical APIs to make the `unpackTarball()` function generic over the format. This issue is looking to gather ideas about how a common decompression API can be exposed for the different formats, or to decide that it's not worth it or not possible in a reasonable way. The status quo is:
- `gzip`, `lzma` and `xz` expose `Decompress(ReaderType: anytype) type`, generic over an underlying reader, which has public methods `deinit(*@This()) void`, `read(*@This(), []u8) Error!usize`, and `reader(*@This()) Reader`, where `Reader = std.io.Reader(*@This(), Error, read)`. To create a `Decompress(ReaderType)` there is a function `decompress(std.mem.Allocator, source: anytype) !Decompress(@TypeOf(source))`.
- `zlib` provides essentially the same interface, except that `zlib` calls these functions `ZlibStream`/`zlibStream`.
- `lzma` has an additional `decompressWithOptions(std.mem.Allocator, source: anytype, std.compress.lzma.decode.Options) !Decompress(@TypeOf(source))` creation function.
- `zstandard` has a similar interface: the analogous entry points are called `DecompressStream`/`decompressStream`, and it adds a comptime options argument to `DecompressStream` (though the options need not be comptime; they could instead be runtime parameters to `decompressStreamOptions`). The `decompressStream` and `decompressStreamOptions` functions are infallible and do not return an error union, unlike the analogues for the other formats.
- `lzma2` provides `decompress(std.mem.Allocator, reader: anytype, writer: anytype) !void`, which decompresses directly into a writer rather than providing a `std.io.Reader` interface.
- `lzma` and `lzma2` provide access to lower-level `decode` namespaces.
- `zstandard` exposes a `decompress` namespace containing functions for decoding from a slice into different destinations (a slice, `std.ArrayList`, or `std.RingBuffer`), as well as lower-level details that can be used to build other Zstandard decompressors.
- `deflate` provides `Decompressor()`/`decompressor()`, which are equivalent to `Decompress()`/`decompress()` of `gzip` and `xz`, except that `deflate.decompressor` takes an additional dictionary parameter.

I think the names `Decompress()`/`decompress()` are not particularly clear about what they do, especially when non-streaming APIs are also available (hence the choice to use `DecompressStream`/`decompressStream` for `zstandard`). I think including "stream" in the name makes it clear that these are streaming APIs.

I'm not familiar with the details of the formats other than Zstandard, but does it make sense to move the fallible operations currently performed during initialisation into the `read` function for the other formats, so that all `decompress()` analogues can be infallible? I don't believe there is any fallible initialisation that makes sense for Zstandard, and it seems silly to make the return type `error{}!DecompressStream(...)` just to match the others.

What level of detail is desired?
`zstandard` exposes significantly more lower-level details than the others, enough that I think all the higher-level entry points are written purely in terms of lower-level functions/types that are exposed publicly. I like exposing these building blocks so that people whose use case needs more control over decompression can get it while reusing the code that `std` needs anyway, but maybe it's not worth the API complexity?
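To make the proposal concrete, the sketch below shows one possible shape combining the ideas above: a `DecompressStream(ReaderType)` type exposing `read` and `deinit`, an infallible `decompressStream()` constructor that defers fallible work (such as header validation) to the first `read`, and a consumer that is generic over any format with that shape, in the spirit of `unpackTarball()`. Everything here is hypothetical: `SliceSource`, `decompressAll` and `error.BadHeader` are illustrative names, the "decompression" is just the identity transform, and `std.io` readers are deliberately avoided so the sketch doesn't depend on a particular version of that API. It assumes Zig 0.11 or later (two-argument `@memcpy`, `@min`).

```zig
const std = @import("std");

/// Hypothetical stand-in for the underlying compressed source. Any type with
/// `read(*Self, []u8) Error!usize` and a public `Error` set would do.
const SliceSource = struct {
    bytes: []const u8,
    pos: usize = 0,

    pub const Error = error{};

    pub fn read(self: *SliceSource, buf: []u8) Error!usize {
        const n = @min(buf.len, self.bytes.len - self.pos);
        @memcpy(buf[0..n], self.bytes[self.pos..][0..n]);
        self.pos += n;
        return n;
    }
};

/// Hypothetical format following the naming proposed in this issue.
/// Its "decompression" is the identity transform.
fn DecompressStream(comptime ReaderType: type) type {
    return struct {
        source: ReaderType,
        header_checked: bool = false,

        const Self = @This();
        /// Errors from the source plus format errors, which now surface
        /// from `read` instead of from construction.
        pub const Error = ReaderType.Error || error{BadHeader};

        pub fn read(self: *Self, buf: []u8) Error!usize {
            if (!self.header_checked) {
                // A real format would parse and validate its header here and
                // return error.BadHeader on failure; this sketch has none.
                self.header_checked = true;
            }
            return try self.source.read(buf);
        }

        pub fn deinit(self: *Self) void {
            // Nothing to free in this sketch.
            _ = self;
        }
    };
}

/// Infallible constructor: no error union in the return type.
fn decompressStream(source: anytype) DecompressStream(@TypeOf(source)) {
    return .{ .source = source };
}

/// Generic consumer: works with any stream exposing `read`. Fills `out`
/// until a read returns 0 (end of data) or `out` is full.
fn decompressAll(stream: anytype, out: []u8) !usize {
    var total: usize = 0;
    while (total < out.len) {
        const n = try stream.read(out[total..]);
        if (n == 0) break;
        total += n;
    }
    return total;
}
```

With this shape, a caller like `decompressAll` depends only on `read` (and `deinit` for cleanup), so switching formats would amount to switching which `decompressStream` is called. The trade-off is that a malformed header is only reported on the first `read` rather than at construction time.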