Explore using Nerdbank.GitVersioning's `ManagedGit` implementation for reading Git repositories #343

mscottford · 2021-07-06T14:22:29Z

As documented in #243, there are going to be some benefits to moving away from libgit2sharp and towards a different solution.

The Nerdbank.GitVersioning project has experimented with this in the past as well. Their discussion in dotnet/Nerdbank.GitVersioning#505 is a very interesting read. Pretty much all of the objections that I have to us continuing to use libgit2sharp are well stated in that discussion. While reading through what they implemented as part of dotnet/Nerdbank.GitVersioning#521, I realized that their implementation is marked public, which means we can try using it for our needs.

Their intended use for the git repository is very similar to ours: just reading through the history, with no interest in making changes or commits to the repository. Because of that, I think there's a really good chance that we could use their code to pull out the contents of dependency manifest files along with their history.

Our current usage of git requires us to make a clone of the repository if it doesn't exist already. The Nerdbank.GitVersioning ManagedGit code does not implement the git clone command because, for its use case, it makes sense to assume that a clone is already available. So that's something that we'd have to build ourselves.

Our current usage also assumes that the history is stored directly on the filesystem, and that's something that I'd love to break our dependency on. We could try to avoid the need to rely on the filesystem by instead implementing our own object/pack storage mechanism in memory (similar to what the C library libgit2 and Go library go-git support).
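To make the in-memory idea a bit more concrete, here is a minimal sketch of a content-addressed object store held in a dictionary. It relies only on the standard loose-object layout (`<type> <size>\0<content>`, hashed with SHA-1 and zlib-compressed). The class name `InMemoryObjectStore` and its methods are made up for illustration; this is not how libgit2, go-git, or ManagedGit structure their storage, and it ignores pack files entirely.

```python
import hashlib
import zlib


class InMemoryObjectStore:
    """Toy store for loose Git objects kept in a dict (hypothetical name)."""

    def __init__(self):
        # Maps hex object id -> zlib-compressed "<type> <size>\0<content>".
        self._objects = {}

    def put(self, obj_type: str, content: bytes) -> str:
        """Store an object and return its SHA-1 object id."""
        header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
        raw = header + content
        oid = hashlib.sha1(raw).hexdigest()
        self._objects[oid] = zlib.compress(raw)
        return oid

    def get(self, oid: str):
        """Return (type, content) for a stored object id."""
        raw = zlib.decompress(self._objects[oid])
        # The header ends at the first NUL byte; content may contain more NULs.
        header, _, content = raw.partition(b"\x00")
        obj_type, size = header.decode("ascii").split(" ")
        if int(size) != len(content):
            raise ValueError(f"corrupt object {oid}")
        return obj_type, content
```

Storing `b"hello world\n"` as a `blob` yields the same id that `git hash-object` reports for that content, which is a cheap way to check a store like this against real Git.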

Since we're going to have to build our own equivalent of the clone command and the git data transfer protocols, starting out with support for performing a clone operation directly into an in-memory store is something that we should consider including.

It's worth noting that since libgit2 has functionality for using custom Git object storage mechanisms, it might be exposed via libgit2sharp as well. Even if this is the case, I'm still not a fan of us depending on libgit2sharp, if we can avoid it. I'd rather keep our dependency on the filesystem and remove our dependency on libgit2sharp than the other way around.

Implementing our own clone command does open other possibilities that are worth considering. We could make our implementation smart enough to only grab objects, packs, commits, and trees that contain the files that we're interested in reading. We don't need all of the source code, just the dependency manifest information. This would potentially save us a bunch of time that we're currently spending waiting for a full git clone command to complete. If we could walk the commits on the remote to find ones that reference dependency manifest files, then we could just request the objects/packs that contain the versions of those files that we need. This would result in a much smaller data transfer in terms of the raw number of bytes. It is possible that the extra processing that we'd have to do would negate any performance benefit as measured in seconds. We could do some profiling to assess that, though. And I suspect that transferring less data would be a big win for really large repositories.
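For comparison, stock Git can already approximate this with a partial clone, which is a different technique from writing our own transfer protocol. The sketch below only builds the command lines; the function names are hypothetical, and it assumes the remote supports partial clone filters. `--filter=blob:none` transfers commits and trees but defers file contents, and `git show <commit>:<path>` then fetches just the requested blob on demand.

```python
from typing import List


def blobless_clone_command(url: str, target_dir: str) -> List[str]:
    """Clone commits and trees only; file contents are fetched lazily."""
    return [
        "git", "clone",
        "--filter=blob:none",  # skip blobs during the initial transfer
        "--no-checkout",       # do not populate a working tree
        url, target_dir,
    ]


def manifest_content_command(repo_dir: str, commit: str, path: str) -> List[str]:
    """Print one file at one commit; a missing blob is fetched on demand."""
    return ["git", "-C", repo_dir, "show", f"{commit}:{path}"]
```

Whether this beats a full clone in wall-clock time would still need the profiling mentioned above, since every lazily fetched blob can cost a round trip to the remote.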

There is a potential risk that needs to be noted if we go forward with this idea. The Nerdbank.GitVersioning team might not be excited to learn that we plan on using their ManagedGit code directly. They might react by marking those classes internal. The discussion in dotnet/Nerdbank.GitVersioning#505 included some back-and-forth about where the ManagedGit implementation should live, with one of the options being to move it into a separate package. Perhaps that's an extraction effort that we could assist them with in the event that they object to us using Nerdbank.GitVersioning as a dependency just for the purpose of consuming the ManagedGit code that it contains.


mscottford · 2021-07-06T14:26:06Z

I've also discovered that JGit has support for performing an in-memory clone operation. https:/centic9/jgit-cookbook/blob/master/src/main/java/org/dstadler/jgit/porcelain/CloneRemoteRepositoryIntoMemoryAndReadFile.java

mscottford · 2021-10-07T12:27:33Z

I think that a good start in this direction would be to add an alternative to LibGit2Sharp instead of just replacing our current implementation. It could be behavior that gets turned on via an environment variable, or we could even leverage https:/scientistproject/Scientist.net to run both libraries in parallel.
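A rough, language-neutral sketch of both options follows. The environment variable name `FRESHLI_GIT_BACKEND` and the function names are invented for illustration, and `run_experiment` is only loosely modeled on the idea behind Scientist (run both code paths, compare, always return the existing behavior's result); it is not Scientist.net's API.

```python
import os


def use_alternative_backend(environ=None) -> bool:
    """Opt in to the new Git reader via a (hypothetical) environment variable."""
    env = os.environ if environ is None else environ
    return env.get("FRESHLI_GIT_BACKEND", "libgit2sharp").lower() == "managed"


def run_experiment(control, candidate, on_mismatch):
    """Run both implementations and report differences.

    The control result is always returned, so the candidate can never
    change observable behavior, even if it raises.
    """
    control_result = control()
    try:
        candidate_result = candidate()
    except Exception as error:  # a candidate failure is just another mismatch
        on_mismatch(control_result, error)
        return control_result
    if candidate_result != control_result:
        on_mismatch(control_result, candidate_result)
    return control_result
```

The point of the second function is that mismatches get logged while callers keep seeing the LibGit2Sharp result, so the alternative can be exercised on real repositories before anything depends on it.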

mscottford · 2022-09-08T12:14:45Z

Freshli-CLI is wrapping the Git executable directly.
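As a generic illustration of what wrapping the executable can look like (this is not Freshli-CLI's actual code, and the class name `GitCli` is made up), the sketch below shells out to `git` and parses its output. It assumes a `git` binary is on the `PATH` and that a clone already exists at `repo_dir`.

```python
import shutil
import subprocess


class GitCli:
    """Thin wrapper around the git executable (hypothetical helper)."""

    def __init__(self, repo_dir, runner=subprocess.run):
        self.repo_dir = repo_dir
        self._runner = runner  # injectable so tests do not need a real repo

    @staticmethod
    def is_available() -> bool:
        """True when a git executable can be found on the PATH."""
        return shutil.which("git") is not None

    def _git(self, *args) -> str:
        # check=True raises CalledProcessError on a non-zero exit code.
        completed = self._runner(
            ["git", "-C", self.repo_dir, *args],
            capture_output=True, text=True, check=True,
        )
        return completed.stdout

    def commits_touching(self, path):
        """Commit ids (newest first) whose changes include the given path."""
        return self._git("log", "--format=%H", "--", path).split()

    def file_at(self, commit, path) -> str:
        """Contents of a file as it existed at the given commit."""
        return self._git("show", f"{commit}:{path}")
```

The trade-off with this approach is an external dependency on an installed Git in exchange for not maintaining any repository-reading code at all.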

rcdailey · 2022-11-13T16:39:08Z

@mscottford Can you explain a bit further? Do you mean that you execute shell commands from C# code? Do you rely on Git being available on the system already?

mrbiggred added the `design` (For design discussion issues) label Jul 14, 2021

mscottford closed this as completed Sep 8, 2022
