The documented .git_archival.txt causes git archives to change hash after first post-release commit is made #806

Closed
mgorny opened this issue Feb 23, 2023 · 16 comments · Fixed by #1033

@mgorny
Contributor

mgorny commented Feb 23, 2023

The documentation suggests using the following content for the .git_archival.txt file:

node: $Format:%H$
node-date: $Format:%cI$
describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
ref-names: $Format:%D$

This means that if a tag is created on top of the main branch, the file initially contains e.g.:

ref-names: HEAD -> main, tag: v1

but once another commit is added and main no longer corresponds to the tag, it changes to:

ref-names: tag: v1

This causes the git archive hash to change. This is a problem for distributions like Gentoo that are using git archives generated by GitHub. The initial archive that we fetch once the release is made changes once upstream makes new commits.

See e.g.: https://bugs.gentoo.org/895910, https://bugs.gentoo.org/895712
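
To illustrate the mechanism, here is a minimal Python sketch. It does not call git: it assumes that `git archive` substitutes `%D` with whatever refs point at the commit at the moment the archive is generated. The `render` helper is hypothetical, and a SHA-256 over the file content stands in for the tarball checksum.

```python
import hashlib

# Only the ref-names line matters for this demonstration.
TEMPLATE = "ref-names: $Format:%D$\n"


def render(template: str, ref_names: str) -> str:
    """Hypothetical stand-in for git's expansion of the %D placeholder."""
    return template.replace("$Format:%D$", ref_names)


def digest(content: str) -> str:
    """Checksum of the rendered file, standing in for the archive hash."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Archive generated right after tagging: main still points at the tag.
at_release = render(TEMPLATE, "HEAD -> main, tag: v1")
# Archive of the same commit, generated after main has moved on.
after_next_commit = render(TEMPLATE, "tag: v1")

print(digest(at_release) == digest(after_next_commit))  # False
```

The commit and the tag are unchanged in both cases; only the set of refs pointing at the commit differs, which is enough to alter the archived file and therefore the checksum.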

@RonnyPfannschmidt
Contributor

Thanks for the heads up

Is there a suggested way to get that data?

@mgorny
Contributor Author

mgorny commented Feb 23, 2023

I don't really know. I didn't even know such a thing is possible before setuptools_scm implemented it ;-).

@RonnyPfannschmidt
Contributor

Based on the git docs, it's impossible to opt out of the problematic details.

Is there any reason why you don't use the release artifacts from pypi?

@ionenwks

ionenwks commented Feb 23, 2023

> Is there any reason why you don't use the release artifacts from pypi?

In a lot of cases it's because they're missing something, e.g. tests (Edit: otherwise Gentoo does prefer pypi tarballs). Ideally everyone would include them but it's not always so easy.

@RonnyPfannschmidt
Contributor

That's unfortunate, in particular since setuptools_scm typically enforces that all files go into the sdist.

@mgorny
Contributor Author

mgorny commented Feb 23, 2023

Yes, it is. It was last debated in https://discuss.python.org/t/should-sdists-include-docs-and-tests/14578, with no definite conclusion. Unfortunately, some people believe that "sdist is only for pip" and so including files that pip (or a PEP517 build system, more generally) doesn't use is a waste of space.

@webknjaz
Member

> Unfortunately, some people believe that "sdist is only for pip" and so including files that pip (or a PEP517 build system, more generally) doesn't use is a waste of space.

@mgorny FYI, I've been pointing people at my action https://github.com/re-actors/checkout-python-sdist to try to bring a better standard into the ecosystem. Projects that use it would probably be more downstream-friendly as a result.

@eli-schwartz

Isn't the need for ref-names obsoleted by the availability of the describe-name I added to current versions of git (and which is specifically designed to be stable, at least if abbrev is used)?

Maybe the template recommendation can simply phase out the old and unreliable ref-names formatter?
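
A rough sketch of what phasing it out could look like for an existing project: the hypothetical helper below drops the `ref-names` line from a `.git_archival.txt` template and leaves the other fields untouched. The template text is the one quoted at the top of this issue; whether a given forge actually expands `describe-name` still has to be checked separately.

```python
def drop_ref_names(template: str) -> str:
    """Return the template without any `ref-names:` line (hypothetical helper)."""
    kept = [
        line
        for line in template.splitlines()
        if not line.strip().startswith("ref-names:")
    ]
    return "\n".join(kept) + "\n"


# The template quoted in the issue description.
DOCUMENTED_TEMPLATE = """\
node: $Format:%H$
node-date: $Format:%cI$
describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
ref-names: $Format:%D$
"""

print(drop_ref_names(DOCUMENTED_TEMPLATE), end="")
```

The remaining three fields depend only on the commit and its reachable tags, not on which branches currently point at it.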

@maresb

maresb commented Mar 29, 2023

I have the impression that describe-name is relatively new and not so widespread yet. Specifically, I've often noticed warnings about describe-name being unavailable. When I locally build the aesara package under Ubuntu 20.04 (git 2.39.2), I get:

* Building wheel...
/tmp/build-env-u3aklikk/lib/python3.11/site-packages/setuptools_scm/git.py:295: UserWarning: git archive did not support describe output

Thus I fear that in practice removing ref-names could break a lot of workflows.

@eli-schwartz

When locally building from a git repository the primary data source should be git itself, not the .git_archival.txt file... I wonder why this is triggering a warning.

@maresb

maresb commented Mar 29, 2023

Yes, that's a very good question, and I'd like to look into that. But in any case, I'm still concerned about the prevalence of Git versions which don't support describe-name.

@eli-schwartz

In theory it should only matter in cases where you are not building from a git checkout, and you're not building from a PyPI sdist, and you are only building from the output of running git archive.

The main situation where that happens is on servers such as github.com, gitlab.com, git.sr.ht, codeberg.org, etc. and those tend to roll out new git updates much more often than new Ubuntu LTSes.

It is a good question, how often that will cause issues for local workflows. But I'm hoping the answer might be "not very".
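
To make these situations concrete, here is a hedged sketch that classifies the content of a `.git_archival.txt`. It rests on two assumptions made for illustration: a file that never went through `git archive` still contains the literal `$Format:` placeholders, and a git too old to know `%(describe)` leaves that placeholder text in the expanded output. `parse_archival` and `classify_archival` are hypothetical helpers, not setuptools_scm's code.

```python
def parse_archival(text: str) -> dict[str, str]:
    """Parse `key: value` lines; only the first colon separates key and value."""
    data = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            data[key.strip()] = value.strip()
    return data


def classify_archival(text: str) -> str:
    """Classify the file as unexported, describe-unsupported or describe-available."""
    if "$Format:" in text:
        # Placeholders were never expanded: not the output of `git archive`.
        return "unexported"
    describe = parse_archival(text).get("describe-name", "")
    if not describe or "%(describe" in describe:
        # Assumed behaviour of a git that does not know %(describe).
        return "describe-unsupported"
    return "describe-available"


# Made-up sample contents for the three cases.
checkout = "describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$\n"
old_git = "describe-name: %(describe:tags=true,match=*[0-9]*)\nref-names: tag: v1\n"
new_git = "describe-name: v1\n"

for sample in (checkout, old_git, new_git):
    print(classify_archival(sample))
```

Only the second case would leave a tool dependent on `ref-names`, and it can only arise where the archive was produced by an older git.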

@mgorny
Contributor Author

mgorny commented Mar 29, 2023

I also find it hard to imagine a use case for using a git archive locally. I always thought the main purpose of this feature is to support people fetching archives generated by GitHub, etc.

@RonnyPfannschmidt
Contributor

It's definitely time to go down the road of slimmer data there.

Just recently I experimented with using the APIs better.

The new way is definitely the way to go.

It may still be helpful to have ref names for broken forges, if limiting them to tags is possible.

But those need to come forward.

@dvzrv

dvzrv commented Apr 14, 2024

By now we have seen this issue in quite a few upstreams when packaging for Arch Linux. It significantly increases the contributor time spent on figuring out what is going on and communicating with upstreams each time the reproducibility of a package is broken.

I suggest either removing the problematic .git_archival.txt documentation entirely or, at a minimum, adding a big warning that links to this ticket and points out that it will make auto-generated source tarballs (and seemingly also tags?) not reproducible.

> Is there any reason why you don't use the release artifacts from pypi?

  • VCS sources or auto-generated source tarballs are more transparent for auditing purposes.
  • sdist tarballs are not really well defined and often miss the stuff we need (tests, licenses, etc.)
  • sdist tarballs are created in various environments
  • backporting patches for setup.py/setup.cfg in sdist tarballs is a terrible experience
  • we are now specifically encouraging the use of upstream-provided source tarballs/VCS sources (https://rfc.archlinux.page/0020-sources-for-python-packaging/)

@LecrisUT
Contributor

LecrisUT commented Apr 16, 2024

It should be documented which git hosts support describe-name; where they do, simply remove ref-names. For GitHub projects, just get rid of it.

The downsides of ref-names are arguably major enough that a project should not use setuptools-scm if its git host does not support describe-name by now.
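
As a rough illustration of why describe-name alone can be enough, the sketch below splits a describe-style string into tag, distance, and abbreviated node. It assumes the usual `git describe` shape of `<tag>-<distance>-g<hash>`, or a bare tag when the commit is tagged exactly. `split_describe` is a hypothetical helper and the sample values are made up.

```python
import re
from typing import NamedTuple, Optional


class Described(NamedTuple):
    tag: str
    distance: int
    node: Optional[str]


# Greedy tag match so that tags containing hyphens stay intact.
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<node>[0-9a-f]+)$")


def split_describe(value: str) -> Described:
    """Split describe output; a bare tag means distance 0 and no node suffix."""
    match = _DESCRIBE_RE.match(value)
    if match is None:
        return Described(tag=value, distance=0, node=None)
    return Described(match["tag"], int(match["distance"]), match["node"])


print(split_describe("v1"))
print(split_describe("v1-3-gabc1234"))
```

Nothing in this string depends on which branches point at the commit, which is the property ref-names lacks.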

burgholzer added a commit to cda-tum/mqt-core that referenced this issue Jun 5, 2024
webknjaz added a commit to webknjaz/attrs that referenced this issue Aug 23, 2024
Some time ago, it was discovered that Git archives having
`ref-names: $Format:%D$` in `.git_archival.txt` may change when
references existing in the repository change over time [[1]]. This
means that downloading an archive for a commit from an immutable URL
may start yielding slightly different results. This hurts the ability
of downstreams to source projects from Git archive URLs.

With that in mind, modern `setuptools-scm` advises against having this
entry in the `.git_archival.txt` template [[2]]. And this patch
implements said recommendation.

[1]: pypa/setuptools-scm#806
[2]: https://setuptools-scm.readthedocs.io/en/latest/usage/#git-archives
github-merge-queue bot pushed a commit to python-attrs/attrs that referenced this issue Aug 24, 2024