proposal: cmd/go: cache link output binaries in the build cache #69290

Open
matloob opened this issue Sep 5, 2024 · 11 comments

@matloob
Contributor

matloob commented Sep 5, 2024

Proposal Details

This proposal is for cmd/go to cache the binary outputs of link actions in the build cache. Binary outputs would be trimmed from the cache earlier than package outputs.

cmd/go currently caches the outputs of compiling a package (build actions) in the build cache, but does not cache the outputs of linking a binary (link actions) in the build cache:

// Cache package builds, but not binaries (link steps).

Binaries are not cached today mainly because they are much larger than individual package object files and are reused less often. We would mitigate that by trimming binaries with a shorter trimLimit than the one used for package objects in the cache: a package output is currently removed if it hasn't been used for five days, and we would perhaps choose two days for binaries. To make binaries easy to identify for trimming, we would store them in a different location than package objects: perhaps instead of $GOCACHE// they would be stored in $GOCACHE/exe//.

We would also need to decide how to handle potential ETXTBSY errors when executing the built binaries: see #22220. If the go command writes to a binary and then executes it, the execution can fail. This matters here because, for go run, we would need to write the build ID into the binary and then execute it.

cc @rsc @samthanawalla

@gopherbot gopherbot added this to the Proposal milestone Sep 5, 2024
@ianlancetaylor
Contributor

How often do we think the cache would be used for a binary? Pretty much any change to the code of any of the inputs will require a relink. When would this save time in practice?

@ianthehat

I think the main use case is go run package@version which will always produce the same binary.

Could we consider a total space ejection policy rather than a time based one? (drop oldest from cache when we try to add something to cache that pushes it over the limit)

@matloob
Contributor Author

matloob commented Sep 6, 2024

Yes, I think the main use would be for go run package@version, but it could also be useful for go run package. That would help with tools used by projects, such as those listed in tools.go, and with the new tool feature being implemented for Go 1.24 (#48429).

@ianlancetaylor
Contributor

Thanks. That at least raises the possibility of using the build cache for go run but not for go build or go install.

@ConradIrwin
Contributor

One thing I'd like for this is if the package name showed up in the output of ps.

I think that means we should store binaries like stringer on disk in a directory, $GOCACHE/exe/<ha>/<hash>/stringer, so that if a tool is misbehaving I can ps ax | grep stringer to find it. It may also be OK to use something like $GOCACHE/exe/<hash>-stringer if there's a lot of overhead per directory.

@gopherbot
Contributor

Change https://go.dev/cl/613095 mentions this issue: cmd/go: prototype of binary caching

@matloob
Contributor Author

matloob commented Sep 13, 2024

I've put together a basic prototype (caches all binaries, doesn't use package name or ExeName as file name) at golang.org/cl/613095

@adonovan
Member

Does this need a proposal? It seems to be a mere implementation detail that shouldn't affect the observable behavior, except quantitatively.

(I suppose @ConradIrwin's point that "go run" processes might notice their argv[0] name has changed is a counterargument but @matloob has an implementation that avoids that by using a subdirectory <hash>/stringer.)

@mvdan
Member

mvdan commented Sep 19, 2024

ETXTBSY might not be an issue on Linux for too long: #22315 (comment)

@ianlancetaylor
Contributor

There are already people who complain about the size of the build cache (e.g., #68872), so I do think this is more than an implementation detail.

@ConradIrwin
Contributor

The original go tool proposal had us caching the outputs not based on the build graph, but on the current module path + tool package path.

For the go tool use-case, it doesn't make much difference. If the key is based on build graph then you'll end up with multiple copies in the cache when the dependencies change (which is rare), but you reduce the number of copies if you are actively working on multiple modules that use the same tool at the same version (which is rare).

That said, I would ideally like a longer cache expiry time for tools than 2 days. It's very common for people to take weekends off (and even long weekends happen occasionally). 5 days is actually quite good for the use-case of "stop working on Thursday afternoon and pick it up again on Tuesday morning".

If we're changing to cache go run too as a part of this change, it's more of an issue. It's very common to make changes to a module and then continually go run it. If we use a build graph based cache key, we'll accumulate builds that will almost never be re-used. If we choose a cache key based on current module + package path, then we only ever have the latest version, which would significantly reduce the cache size and probably doesn't affect the cache hit rate very much (only if someone "undoes" to a previous state of the code, which, while nice to be fast, is not a very common case).

To be clear, if we're not changing the caching behavior of go run, then it doesn't matter either way; if we are changing that behavior, then I think we should deliberately choose our cache keys to ensure that only the latest copy of "the same" binary is kept. (And I think current module path + package path is a reasonable definition of "the same").
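
The difference between the two keying schemes can be sketched as follows. The functions stableKey and graphKey, the inputsHash parameter, and the example module path are all hypothetical; the real build cache derives its keys differently.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// stableKey depends only on the current module path and the package path,
// so every rebuild of "the same" binary maps to one cache slot.
func stableKey(modulePath, pkgPath string) string {
	h := sha256.New()
	// The NUL separator keeps ("a", "bc") distinct from ("ab", "c").
	fmt.Fprintf(h, "%s\x00%s", modulePath, pkgPath)
	return fmt.Sprintf("%x", h.Sum(nil))
}

// graphKey additionally mixes in a hash of the build inputs, so every
// source change produces a new cache slot. inputsHash is a stand-in for
// whatever summarizes the build graph.
func graphKey(modulePath, pkgPath, inputsHash string) string {
	h := sha256.New()
	fmt.Fprintf(h, "%s\x00%s\x00%s", modulePath, pkgPath, inputsHash)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	stable := map[string]string{}
	graph := map[string]string{}

	// Simulate three edit-and-run cycles on the same package.
	for _, rev := range []string{"rev1", "rev2", "rev3"} {
		stable[stableKey("example.com/m", "example.com/m/cmd/x")] = rev
		graph[graphKey("example.com/m", "example.com/m/cmd/x", rev)] = rev
	}

	// The stable key keeps only the latest build; the graph key keeps all.
	fmt.Println(len(stable), len(graph))
}
```

With the stable key, reverting to an earlier state of the code misses the cache, which is the trade-off described above.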
