Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't clone submodules for tools that aren't being built #76653

Closed
jyn514 opened this issue Sep 12, 2020 · 13 comments
Closed

Don't clone submodules for tools that aren't being built #76653

jyn514 opened this issue Sep 12, 2020 · 13 comments
Labels
A-contributor-roadblock Area: Makes things more difficult for new contributors to rust itself C-feature-request Category: A feature request, i.e: not implemented / a PR. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@jyn514
Copy link
Member

jyn514 commented Sep 12, 2020

Right now, this is the first thing you see on a fresh clone:

$ python x.py check
Updating only changed submodules
Updating submodule src/tools/rust-installer
Submodule 'src/rust-installer' (https:/rust-lang/rust-installer.git) registered for path 'src/tools/rust-installer'
Cloning into '/home/joshua/src/rust/rust/src/tools/rust-installer'...
remote: Enumerating objects: 30, done.        
remote: Counting objects: 100% (30/30), done.        
remote: Compressing objects: 100% (26/26), done.        
remote: Total 799 (delta 9), reused 11 (delta 3), pack-reused 769        
Receiving objects: 100% (799/799), 246.22 KiB | 4.83 MiB/s, done.
Resolving deltas: 100% (486/486), done.
Submodule path 'src/tools/rust-installer': checked out 'd66f476b4d5e7fdf1ec215c9ac16c923dc292324'
Updating submodule src/doc/nomicon
...

This can go on for many minutes, especially on a slow connection. Instead, x.py should only clone the submodules when they're actually needed.

I think cargo build requires having the Cargo.toml of the submodules to work, which might be tricky ... maybe x.py could default to a shallow clone? The main thing I'd love to have is for x.py check to not have to download LLVM.

Thanks to @Lokathor and @thomcc for encouraging me to throw away my clone and start from scratch ;)

@jyn514 jyn514 added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Sep 12, 2020
@Mark-Simulacrum
Copy link
Member

Many of these must be cloned with the current global workspace, because otherwise cargo will fail to resolve the workspace. I think that may be true for all of them - or at least all cargo projects. Once downloadable LLVM lands we can skip that though.

@Mark-Simulacrum
Copy link
Member

@ehuss @Eh2406 -- I think you've both been working on Cargo's resolver recently. How feasible would it be to explicitly tell Cargo "this crate, and its dependencies, should be presumed unchanged" when re-resolving a lockfile? My guess is it's quite hard.

Currently we get something like this whenever a submodule is just not yet initialized:

error: failed to read `/home/mark/Build/rust/src/tools/cargo/Cargo.toml`

Caused by:
  No such file or directory (os error 2)

One "solution" to this perhaps is to duplicate submodule's Cargo.toml's in-tree but stubbed out and pointing at empty crates (basically just so resolver works). But I'd really prefer to avoid doing that, and it does seem like Cargo should have all the information already available. We could plausibly arrange for x.py to fetch just the Cargo.toml's for these submodules, I guess, though that also seems painful.

@jyn514 jyn514 added the A-contributor-roadblock Area: Makes things more difficult for new contributors to rust itself label Sep 13, 2020
@ehuss
Copy link
Contributor

ehuss commented Sep 13, 2020

I think it will be quite difficult, since there are a few other things in Cargo that expect the manifests and files to be there.

Do the cargo-based projects really matter performance-wise? It seems like submodule stuff is completely dominated by llvm. On my system, it looks like it is 3.5 times larger than all the others added together.

Also, rust-by-example is unnecessarily huge. I think squashing the gh-pages branch in that repo would shrink it dramatically.

The next largest is rust-analyzer. That's more feasible to ignore, since it is in its own little world. Unfortunately it has node-modules checked in, which is quite huge.

What is the status of using git subtree?

@mati865
Copy link
Contributor

mati865 commented Sep 13, 2020

What is the status of using git subtree?

AFAIK patch to subtree is still not upstream so you need to patch and build it locally to create subtree. cc @flip1995

@jyn514
Copy link
Member Author

jyn514 commented Sep 13, 2020

I agree LLVM is the main issue - it would be nice to have the rest downloaded on-demand, but it's not dramatically hurting setup times I think.

Also, rust-by-example is unnecessarily huge. I think squashing the gh-pages branch in that repo would shrink it dramatically.

👍 sounds easy enough to do

@flip1995
Copy link
Member

git-subtree is only really feasible, when the git PR (gitgitgadget/git#493) gets merged and released. Also Clippy is waiting for this rustup feature to get released: rust-lang/rustup#2438

@jyn514
Copy link
Member Author

jyn514 commented Jan 29, 2021

Also, rust-by-example is unnecessarily huge. I think squashing the gh-pages branch in that repo would shrink it dramatically.

Do you have suggestions for how to do this? I tried https://stackoverflow.com/a/1661283/7669110 locally, which worked in the sense that the history is now one commit, but didn't decrease the size of .git at all even after running git gc --aggressive.

@bjorn3
Copy link
Member

bjorn3 commented Jan 29, 2021

You probably still have the remote branch. Try removing .git/refs/remotes/origin/gh-pages and then running git gc --aggressive.

@ehuss
Copy link
Contributor

ehuss commented Jan 29, 2021

I think the issue you created looks correct. It needs someone with write permissions to that repo to do it directly. As with anything git-related, there are many ways to accomplish the same task. If it is wrong, worst case we can rebuild the branch. I don't think there is any history there that is important, and at least I have a local backup copy.

I did this before with some repo (I can't remember which). I think GitHub will gc pretty quick (probably within 24 hours), though I imagine they have heuristics on how often that happens.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Jan 30, 2021
Don't clone LLVM submodule when download-ci-llvm is set

Previously, `downloading_llvm` would check `self.build` while it was
still an empty string, and think it was always false. This fixes the
check.

This addresses the worst part of rust-lang#76653. There are still some large submodules being downloaded (in particular, `rustc-by-example` is 146 MB, and all the submodules combined are 311 MB), but this is a lot better than the whopping 1.4 GB before.
@jyn514
Copy link
Member Author

jyn514 commented Jan 30, 2021

Hmm, #81520 doesn't help with this as much as I thought because you have to make a config.toml before ever running a command with x.py. I wonder if we could clone LLVM lazily instead (only when it needs to be built).

@jyn514
Copy link
Member Author

jyn514 commented Jan 30, 2021

@Mark-Simulacrum what do you think about doing this in impl Step for native::Llvm instead? That has a couple benefits:

  • Things that don't need LLVM at all don't need complicated logic to detect that (e.g. x.py check or x.py setup)
  • download-ci-llvm no longer gets special cased, native::Llvm isn't run in the first place when it's set
  • Running x.py check out of the gate no longer downloads LLVM

@Mark-Simulacrum
Copy link
Member

That seems largely fine to me, presuming it doesn't duplicate too much code or so (I wouldn't expect it to).

bors added a commit to rust-lang-ci/rust that referenced this issue May 23, 2021
Move llvm submodule updates to rustbuild

This enables better caching, since LLVM is only updated when needed, not
whenever x.py is run. Before, bootstrap.py had to use heuristics to
guess if LLVM would be needed, and updated the module more often than
necessary as a result.

This syncs the LLVM submodule only just before building the compiler, so
people working on the standard library never have to worry about it.
Example output:

```
Copying stage0 std from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Updating submodule src/llvm-project
Submodule 'src/llvm-project' (https:/rust-lang/llvm-project.git) registered for path 'src/llvm-project'
Submodule path 'src/llvm-project': checked out 'f9a8d70b6e0365ac2172ca6b7f1de0341297458d'
```

Implements rust-lang#76653 (comment). This could be easily extended to other submodules, like `rust-by-example` and `rustc-dev-guide`, which aren't needed for cargo's workspace resolution.
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 21, 2021
…imulacrum

Update all submodules that rustbuild doesn't depend on lazily

This only updates the submodules the first time they're needed, instead
of unconditionally the first time you run x.py.

Ideally, this would move *all* submodules to rustbuild and not exclude some tools and
backtrace. Unfortunately, cargo requires all `Cargo.toml` files in the
whole workspace to be present to build any crate.

On my machine, this takes the time for an initial submodule clone (for
`x.py --help`) from 55.70 to 15.87 seconds.

Helps with rust-lang#76653. Builds on rust-lang#86015 and should not be merged before (only the last commit is relevant).
@jyn514
Copy link
Member Author

jyn514 commented Oct 11, 2021

Update: since #82653, only the following submodules are cloned:

library/backtrace
library/stdarch
src/tools/cargo
src/tools/miri
src/tools/rls
src/tools/rust-installer

Those take up a total of 67 M, most of them in miri, RLS, and cargo:

$ du -d2 -h .git/modules/ | sort -h
472K	.git/modules/src/rust-installer
1.5M	.git/modules/library/backtrace
13M	.git/modules/library/stdarch
14M	.git/modules/library
52M	.git/modules/src/tools
53M	.git/modules/src
67M	.git/modules/
$ du -d1 -h .git/modules/src/tools/ | sort -h
8.7M	.git/modules/src/tools/miri
9.5M	.git/modules/src/tools/rls
34M	.git/modules/src/tools/cargo

Given that .git is 795 MB overall I'm going to go ahead and close this issue in favor of #63978.

@jyn514 jyn514 closed this as completed Oct 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-contributor-roadblock Area: Makes things more difficult for new contributors to rust itself C-feature-request Category: A feature request, i.e: not implemented / a PR. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

No branches or pull requests

6 participants