Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading of a git source is slow due to always checking if the submodule is updated #14603

Open
epage opened this issue Sep 26, 2024 · 5 comments · May be fixed by #14605
Open

Loading of a git source is slow due to always checking if the submodule is updated #14603

epage opened this issue Sep 26, 2024 · 5 comments · May be fixed by #14605
Labels
A-git Area: anything dealing with git C-enhancement Category: enhancement Performance Gotta go fast! S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review

Comments

@epage
Copy link
Contributor

epage commented Sep 26, 2024

When you have some git dependencies or patches, loading them takes a long time

For the traces from #14238, notice how slow the first resolve is compared to the second. A lot of the time is taken up checking each git dependencies submodules
image

Note: this slow down is most relevant for tooling like rust-analyzer or rust-defined completions when it may be called in interactive contexts without any compilation happening or "cargo script" where cargo build is performed every time you run your script.

Related: #14395

@epage epage added A-git Area: anything dealing with git C-enhancement Category: enhancement Performance Gotta go fast! S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review labels Sep 26, 2024
@epage
Copy link
Contributor Author

epage commented Sep 26, 2024

@epage
Copy link
Contributor Author

epage commented Sep 26, 2024

When we GitDatabase::copy_into a checkout, we skip it if the checkout is "fresh"

// If the existing checkout exists, and it is fresh, use it.
// A non-fresh checkout can happen if the checkout operation was
// interrupted. In that case, the checkout gets deleted and a new
// clone is created.
let checkout = match git2::Repository::open(dest)
.ok()
.map(|repo| GitCheckout::new(self, rev, repo))
.filter(|co| co.is_fresh())
{
Some(co) => co,
None => GitCheckout::clone_into(dest, self, rev, gctx)?,
};

We then unconditionally check for submodules
checkout.update_submodules(gctx)?;

submoduels should be as immutable as the checkout and we should be able to guard the whole process with the freshness check.

The main risk in doing that is if you switch back and forth between old cargo and Cargo with the submodules covered by the freshness check and the submodule update is interrupted. New cargo will see that the freshness check passxes and assumes the submodule is valid and goes on its way. We could do extra fancy things with the file used for this like in #14395 but it doesn't seem worth it.

@epage
Copy link
Contributor Author

epage commented Sep 26, 2024

CC @osiewicz

@osiewicz
Copy link
Contributor

osiewicz commented Sep 27, 2024

It seems like old cargo will "heal" itself from the corrupted checkout, as it'll remove it and refetch everything. Given that, I would agree that we could just proceed as is? Alternatively, could we bundle submodule freshness check with the checkout freshness check?

osiewicz added a commit to osiewicz/cargo that referenced this issue Sep 27, 2024
@osiewicz osiewicz linked a pull request Sep 27, 2024 that will close this issue
@epage
Copy link
Contributor Author

epage commented Sep 27, 2024

It seems like old cargo will "heal" itself from the corrupted checkout, as it'll remove it and refetch everything. Given that, I would agree that we could just proceed as is? Alternatively, could we bundle submodule freshness check with the checkout freshness check?

The corner case is

  1. With old cargo
  2. See lack of freshness file
  3. Fetch commit
  4. Checkout commit
  5. Write freshness file
  6. Perform submodule update and get interrupted
  7. With new cargo
  8. See freshness file and assume commit is fetched, checked out, and submodules are updated, bypassing all other steps

This kind of quick back and forth is likely when working directly with Cargo and an IDE running Cargo in the background.

That said, I think we would be fine with this. This will be limited to a specific set of cargo versions run in a specific order, with an interruption. Over time, the combination of versions will become less likely.

If we did something else about this, some potential mitigations

  • For a transition period, always do a submodule update but write the file after it
  • Store in the file whether the submodule update happened
  • Use a second file this would cause problems with old Cargo's reading the check out iiuc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git C-enhancement Category: enhancement Performance Gotta go fast! S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants