add .sha256sum checksums to binary distribution tarballs #3605

Merged
merged 2 commits into from
Sep 14, 2023

cormacrelf
Contributor

Hiya. This adds a *.tar.gz.sha256 file alongside each tarball pushed to GitHub Releases. Having a hash available does not magically add much in terms of supply chain security; it just enables tools like Bazel and Buck to pull binaries and ensure they're the same ones as last time. The concrete need is to generate a JSON manifest of a GitHub release, including sha256 hashes, without actually downloading the tarballs and hashing them at generation time. I have not tested the pipeline, but note that:

  1. GitHub's ubuntu-latest builder includes coreutils, so the sha256sum command should be available.
  2. I checked that the * at the end of the glob will match those .sha256 suffixes as well as the tarballs, against the glob package in use. npm i -g glob; sha256sum CHANGELOG.md > CHANGELOG.md.sha256; glob '*.md*'

So I think that will work. This is not a blocker because scripts for generating manifests can always fall back on downloading the tarballs to /tmp and hashing them there. So no need to push a release just for this.
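
For readers who want to see what the sidecar files contain, here is a minimal Python sketch of the same idea. It is not what the workflow in this PR runs (that uses the sha256sum command); the function names `sha256_hex` and `write_sidecars` and the default `*.tar.gz` pattern are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecars(directory: Path, pattern: str = "*.tar.gz") -> list[Path]:
    """Write <tarball>.sha256 next to every file matching the pattern (hypothetical helper)."""
    written = []
    for tarball in sorted(directory.glob(pattern)):
        sidecar = tarball.with_name(tarball.name + ".sha256")
        # Same layout as sha256sum's text-mode output: hash, two spaces, file name.
        sidecar.write_text(f"{sha256_hex(tarball)}  {tarball.name}\n")
        written.append(sidecar)
    return written
```

Because the pattern ends in `.tar.gz`, a second run does not hash the sidecars themselves.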

@daxpedda
Collaborator

I've never used any of these tools, could you link some documentation that points to this being used?
But LGTM otherwise.

@cormacrelf
Contributor Author

cormacrelf commented Sep 13, 2023

Here you go:

Note that to generate a full archive of all the releases, you have to download N*M files, N releases over M platforms, each of which weighs however many MB. It's more tolerable for wasm-bindgen than most, seeing as each weighs only a few MB.

@daxpedda
Collaborator

I don't think these contain what I was looking for.

Specifically, I was trying to find some standard for how to deploy these hashes in the first place. Your PR suggests deploying them as part of the release artifacts. Is that documented somewhere? Does a tool actually make use of this path, e.g. just adding .sha256 at the end of the downloaded artifact's path to get its hash?

@cormacrelf
Contributor Author

cormacrelf commented Sep 13, 2023

No, but you have baited me into writing apparently the first such document anywhere on the internet.

(As for tools that generate a manifest from a GH Release by fetching checksums, I am writing one at the moment. The actual scripting looks more like this: GET https://api.github.com/repos/WebAssembly/binaryen/releases/tags/version_115; if --checksum-algo sha256 --checksums-file FILE is passed, fetch that asset and parse it; iterate over all asset files and hash-join filenames against name + suffix for suffix in ['.sha256', '.sha256sum', '.sha1', '.sha1sum'] etc., fetching any matching sidecar files; for any asset without a matching hash asset or an entry in the --checksums-file, fetch the asset itself and hash it. Because the tarball filenames themselves are radically different between projects, automation stops right there. So no, no convention, just try and cover the bases. The extension itself doesn't matter because you can always configure it. These scripts are for running manually from time to time.)
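
The filename-matching step described above can be sketched in a few lines of Python. This is only an illustration of the "join on name + suffix" idea, not the actual tool: the function name `pair_assets_with_sidecars` is hypothetical, the input is assumed to be a plain list of asset names already read from the release JSON, and every suffix is assumed to start with a dot.

```python
SUFFIXES = (".sha256", ".sha256sum", ".sha1", ".sha1sum")


def pair_assets_with_sidecars(asset_names, suffixes=SUFFIXES):
    """Map each non-checksum asset to the name of its sidecar, or None if it has none."""
    names = set(asset_names)
    # An asset counts as a sidecar if stripping a known suffix yields another asset.
    sidecars = {
        n
        for n in names
        for s in suffixes
        if n.endswith(s) and n[: -len(s)] in names
    }
    pairs = {}
    for name in sorted(names - sidecars):
        # First configured suffix that exists wins; None means "download and hash it".
        pairs[name] = next((name + s for s in suffixes if name + s in names), None)
    return pairs
```

Assets that map to `None` are the ones a manifest script would have to download and hash itself.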


There is no documented standard for how to store checksums, nor any tooling that assumes it that I can see. It would be silly to attempt one without trying to make it comprehensive and solving other goals, so it's either "a bunch of tarballs with ad hoc checksums on GH releases" or "deploy a bulletproof, CNCF-supported standard called TUF" or "TUF but for cars". Just checksums are obviously fine when you only care about integrity checking in transit and what amounts to perfect caching in build tools. But moreover, whatever checksum file location you choose is also fine. As long as there's a hash there somewhere other than freeform markdown, it can be automated in a few lines of any scripting language, so it doesn't really matter that much.

If you want details, let's compare the two main options, neither of which has any kind of standardisation for the filenames. I have listed the ability to parse with Python as a factor.

1. sha256sum *.tar.gz > checksums-sha256.txt or any other filename with all the tarballs listed in it

Examples: Redpanda puts all checksums in a rpk_23.2.8_checksums.txt file (very annoying they put the version in there) | Ubuntu puts all checksums in a SHA256SUMS file, and signs it with SHA256SUMS.gpg

Points in favour:

  • 5 tarballs don't make for 10 asset files listed. That's about it.

Points against:

  • Nobody is ever downloading all the tarballs, just one, but sha256sum --check fails if there are files listed but not present. You can run it with --ignore-missing but with non-interactive use in scripts where you want your mistakes to be revealed as errors, that may as well be "ignore failures and just go home".
  • Python one-liner is gnarly. The text format has asterisks in it depending on binary vs text mode, which is pretty vestigial and irrelevant on unix systems (something about FIPS-180-2?). So { f.removeprefix("*").removeprefix(" "): h for h, _, f in [line.strip().partition(" ") for line in checksums_file_content] }["tarball-name.tar.gz"]. But it is still simple enough.
  • Not the fault of the file format, but apparently most projects doing this call it checksums.txt. This is terrible! It doesn't say what algorithm to use. goreleaser almost had a standard, but then you get issues like this where people don't know how to verify the file and it isn't documented anywhere.
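
For reference, the one-liner above can be unrolled into a small function, which may be easier to read. This is a sketch assuming Python 3.9+ (for `str.removeprefix`) and the plain two-column sha256sum layout; the name `parse_checksums` is illustrative, and escaped file names are not handled.

```python
def parse_checksums(text: str) -> dict[str, str]:
    """Parse sha256sum-style output into {file name: hex digest}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, _, name = line.partition(" ")
        # After the first space comes either a second space (text mode)
        # or an asterisk (binary mode), and then the file name.
        name = name.removeprefix("*").removeprefix(" ")
        result[name] = digest.lower()
    return result
```

Looking up a single tarball is then `parse_checksums(text)["tarball-name.tar.gz"]`, which raises `KeyError` if the file is not listed rather than silently passing.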

2. Sidecar files, i.e. $filename.sha256, $filename.sha256sum, etc

Examples: Neovim uses .sha256sum sidecars and also lists the hashes in the release text | About 500 Rust projects using the rust-build/rust-build.action GitHub action use .sha256sum, e.g. elfshaker

Relevantly for wasm-bindgen, Trunk uses $filename.sha256 and so does Binaryen.

Points in favour:

  • No need for --ignore-missing, so well-suited to scripted use.
  • requests.get(url + ".sha256").text.split()[0] and get on with your day.

Points against:

  • Double the number of asset files.
  • Takes longer to request all the .sha256 files to generate a manifest. If you want a manifest of all releases of binaryen, even given they have hashes available to download, it still takes a while because it's N * M requests, M being the number of assets per release. checksums-sha256.txt is one asset to download per release.
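
To make the sidecar case concrete, here is a minimal verification sketch. It assumes the sidecar's first whitespace-separated token is the hex digest, as in the `.split()[0]` snippet above; the function name `verify_against_sidecar` is hypothetical, and fetching is left out so the check itself stays self-contained.

```python
import hashlib


def verify_against_sidecar(data: bytes, sidecar_text: str) -> bool:
    """Return True if the SHA-256 of `data` equals the first token of the sidecar."""
    fields = sidecar_text.split()
    if not fields:
        # An empty sidecar is treated as a failure, not as "nothing to check".
        return False
    expected = fields[0].lower()
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected
```

With requests, the two arguments would be the body fetched from url and the text fetched from url + ".sha256".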

Conclusion

I recommend sidecar files with $filename.sha256 or .sha256sum, on the basis that:

  • Easier and less error-prone for average folk writing a bash script to download a binary in a docker image for a CI pipeline, which is TBH the main use case.
  • Easier to script in Python, which is a bonus, though not a very important one.
  • The manifest script downloading all releases => N * M checksums problem is IMO an anti-pattern. If you want a version of wasm-bindgen, you should run the script yourself with --version 0.2.87 or whatever for your own buck2 repository, not use a buck2/prelude/wasm_bindgen/releases.bzl that has to be updated for everyone.

If you go with a single file, call it checksums-sha256.txt so that you can open it with a text editor but the algorithm is also self-documenting.

.github/workflows/main.yml (outdated, resolved)
Co-authored-by: daxpedda <[email protected]>
Collaborator

@daxpedda daxpedda left a comment

Again, thanks for going through all the effort!

@daxpedda daxpedda merged commit 26e8377 into rustwasm:main Sep 14, 2023
25 checks passed
@cormacrelf cormacrelf changed the title add .sha256 checksums to binary distribution tarballs add .sha256sum checksums to binary distribution tarballs Sep 14, 2023
@cormacrelf cormacrelf deleted the sha256-checksums branch September 14, 2023 14:18