Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a managed Git implementation #521

Merged
merged 125 commits into from
Nov 24, 2020
Merged

Add a managed Git implementation #521

merged 125 commits into from
Nov 24, 2020

Conversation

qmfrederik
Copy link
Contributor

@qmfrederik qmfrederik commented Oct 19, 2020

Adds a managed Git implementation to NBGV. The managed implementation is used by default but libgit2sharp is used under a switch or when writing to git is required (e.g. nbgv prepare-release).

All tests run against both backends.

This implementation already contains a couple of performance optimizations which I worked on with the help of @filipnavara ; but the performance may have regressed as I focused on matching the NBGV features.

Perhaps worth mentioning is that:

  • Opening files (i.e. File.OpenRead, but also File.Exists and just constructing the file name) turned out to be an expensive operation. On Windows, there's for example a P/Invoke call to CreateFile to implement some kind of TryOpen, to avoid having to call File.Exists followed by File.OpenRead (which is slow and racy)
  • I did not know about git pack files before starting this work, and a fair amount of the code is related to parsing git pack files in a performant manner (e.g. pooling FileStream objects which access git pack files, and caching objects retrieved from a git pack in a cache, because this can be an expensive operation).
  • For git pack files, Streams which point to the pack file are pooled instead of opening/closing them repeatedly (which is expensive).

Notable steps completed:

  • Add unit tests for the managed Git client
  • Review exceptions being thrown, make them more descriptive
  • GitCommit: Since most commits only have one/two parents, investigate whether it is worth storing the first (two) parents as a field on the GitCommit struct, and additional parents in a list. This would avoid allocating many List<GitObjectId> objects with only a single item.
  • GitCommit: Investigate whether it's worth parsing the Author value for the current HEAD only.
  • Add documentation for the managed Git client
  • Add support for Git alternates
  • Investigate whether walking the Git tree can be parallelized (deferred)
  • Test against very large pack files.
  • Test with shallow clone and fail with a better exception message than libgit2sharp fails with (an NRE)
  • Test that .ps1 scripts in nuget package still work (these scripts were removed as they are incompatible now)
  • Implement ManagedGit.GitRepository.ShortenObjectId
  • Implement GitRepository.Lookup

Closes #505
Closes #487
Closes #433
Closes #92

Improves perf as tracked by #114

@qmfrederik
Copy link
Contributor Author

@AArnott let me know how you want to proceed with this - I'm opening the PR to get early feedback. I guess could first work to get the PR to a state where it's good to merge (e.g. code quality, unit tests,...), and then stabilize further (e.g. perf work,...) on a feature/version branch?

@AArnott AArnott marked this pull request as draft October 20, 2020 16:08
@AArnott AArnott changed the title [WIP] Add a managed Git implementation Add a managed Git implementation Oct 20, 2020
@qmfrederik
Copy link
Contributor Author

I ran benchmarks on this PR, and here are the results:

Method TestData Mean Error StdDev Ratio RatioSD
GetVersionLibGit2 Cuemon;version.json 6.318 ms 0.0821 ms 0.0641 ms 1.00 0.00
GetVersionManaged Cuemon;version.json 3.188 ms 0.0667 ms 0.0742 ms 0.51 0.01
GetVersionLibGit2 NerdB(...).json [35] 23.091 ms 0.3804 ms 0.3558 ms 1.00 0.00
GetVersionManaged NerdB(...).json [35] 8.400 ms 0.1615 ms 0.1728 ms 0.36 0.01
GetVersionLibGit2 Super(...).json [24] 25.678 ms 0.5559 ms 0.6827 ms 1.00 0.00
GetVersionManaged Super(...).json [24] 3.179 ms 0.0599 ms 0.0531 ms 0.12 0.00
GetVersionLibGit2 xunit;version.json 25.977 ms 0.5188 ms 0.6176 ms 1.00 0.00
GetVersionManaged xunit;version.json 16.945 ms 0.3271 ms 0.3636 ms 0.65 0.02

@qmfrederik
Copy link
Contributor Author

@AArnott I think this PR is close to ready for review:

  • VersionOracle and BuildIntegration tests pass
  • Implemented known missing features (e.g. Git alternates)
  • Added unit tests for newly added code

One item pending your feedback is the TargetFramework - using Span<byte>-based overloads in classes like Stream requires targetting netstandard2.1/netcoreapp3.0 or polyfill code. What's the lowest TargetFramework you want to support?

Could you have a first glance at this PR and let me know what you think?

@AArnott
Copy link
Collaborator

AArnott commented Oct 25, 2020

I would prefer to multitarget in order to continue to support netcoreapp2.1/net472 while optimizing perf on netcoreapp3.1 through overrides of Span overloads.

I'll take a look at this PR later today.

Have you reviewed the various issues where we've discussed this for corner cases in git storage patterns? I can't recall where I saw it, but the libgit2sharp maintainers were warning us that there were many different cases in how packs/objects were stored in the .git folder and we would need to support all of them or else nb.gv may fail at arbitrary times in a repo when git happened to pack in that particular way.

Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks exciting. Thanks so much for your work. The perf improvements look significant and the code quality looks really good too. I haven't reviewed everything, but I wanted to share what I have so far with you and get your thoughts before I continue.

src/Cake.GitVersioning/Cake.GitVersioning.csproj Outdated Show resolved Hide resolved
src/nbgv/nbgv.csproj Outdated Show resolved Hide resolved
src/NerdBank.GitVersioning/Managed/GitObjectId.cs Outdated Show resolved Hide resolved
src/NerdBank.GitVersioning/Managed/ManagedVersionOracle.cs Outdated Show resolved Hide resolved
src/NerdBank.GitVersioning/Managed/FileHelpers.cs Outdated Show resolved Hide resolved
src/NerdBank.GitVersioning/Managed/FileHelpers.cs Outdated Show resolved Hide resolved
src/NerdBank.GitVersioning/Managed/GitPackReader.cs Outdated Show resolved Hide resolved
@qmfrederik
Copy link
Contributor Author

@AArnott Thanks for the feedback! I'll try to get the framework targeting sorted out first, probably early next week.

@qmfrederik
Copy link
Contributor Author

@AArnott, re: Git Pack documentation, here are the documents I found useful:

@AArnott
Copy link
Collaborator

AArnott commented Nov 23, 2020

Thanks for that. I did write a test for the short commit id, but it's based on libgit2's (odd?) behavior of not returning anything less than 7 as a length. It's in the features/managed-git branch of this repo.
@qmfrederik If you don't want to spend the time filling in this last gap, I'll probably merge this PR and open an issue to track it and I'll try to fill it in before shipping a stable update. Just let me know, as you've been extremely helpful so far and I don't want to abuse that. :)

@qmfrederik
Copy link
Contributor Author

qmfrederik commented Nov 23, 2020

@AArnott Well, I did a quick implementation of ShortenObjectId which starts with minimum, checks whether Lookup(...) returns non-null (in which case it is well-defined). If it does, that's the short object Id. Otherwise, the length is incremented until the maximum length of 40 is reached.

I guess libgit2 does always returns 7 characters to make sure the values are somehow future-proof.

@AArnott
Copy link
Collaborator

AArnott commented Nov 23, 2020

Great.

However, in adding a couple tests I found that odd-number length partial commits produces an exception. I guess since you're converting the hex to bytes it makes searching by bytes hard when you only have half a byte prescribed. Any ideas? I wouldn't want to cause builds to fail when they specified 7 or 11 as their target commit length.

@AArnott
Copy link
Collaborator

AArnott commented Nov 23, 2020

I just added a test for odd length commit lookups by partial id and it threw the same exception.

@AArnott
Copy link
Collaborator

AArnott commented Nov 23, 2020

I'm trying my hand at a fix for this, unless you'd rather, @qmfrederik

@AArnott
Copy link
Collaborator

AArnott commented Nov 24, 2020

Got it fixed. :) Merging tonight...

Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your contribution @qmfrederik is amazing. Thank you so much!

@AArnott AArnott marked this pull request as ready for review November 24, 2020 03:58
@AArnott AArnott added this to the v3.4 milestone Nov 24, 2020
@AArnott AArnott linked an issue Nov 24, 2020 that may be closed by this pull request
@AArnott
Copy link
Collaborator

AArnott commented Nov 24, 2020

And thank you @filipnavara for helping with the awesome performance improvements too!

- Build with .NET 5.0.100 SDK
  This requires also updating the NB.GV LKG version we build with.
- Build benchmarks for net5.0 as well
- Install the new SDK on agents
- Add .NET Core 3.1 runtime for testing
- Run tests on net5.0 instead of netcoreapp3.1
  While we'd rather test *both*, microsoft/MSBuildLocator#95 is likely the cause for a failure of MSBuild to locate System.Collections.dll,v5.0 when running on the .NET Core 3.1 runtime but under the 5.0 SDK.
@AArnott AArnott merged commit 9803c23 into dotnet:master Nov 24, 2020
@AArnott
Copy link
Collaborator

AArnott commented Nov 24, 2020

If all goes well, this will push to nuget.org as 3.4.142-alpha in about 30 minutes.

@qmfrederik qmfrederik deleted the features/managed-git branch November 24, 2020 07:22
@qmfrederik
Copy link
Contributor Author

Thanks, @AArnott ! It looks like the package didn't make it to nuget.org, but it's available in the CI feed, so that works for me :).

@AArnott
Copy link
Collaborator

AArnott commented Nov 24, 2020

Ya, it got snagged due to a bad service connection. That has been rectified though so it's on nuget.org now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants