Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various fixes #1374

Merged
merged 51 commits into from
May 22, 2024
Merged

Various fixes #1374

merged 51 commits into from
May 22, 2024

Conversation

Byron
Copy link
Member

@Byron Byron commented May 22, 2024

Tasks

Original Checklist from private PR

This PR lays down the infrastructure to further validate tree-names before they are used, and make sure validation happens in the right places.

That way, writes to forbidden locations inside or outside of the repository should be prevented.

Tasks

References

Byron and others added 30 commits May 17, 2024 17:32
----

Note that this commit also streamlines obtaininig a relative
path for a directory, which previously could panic.
That way it's easier to assure that forbidden names are never used
as part of path components.
…ding paths.

This way, everyone using the stack with the purpose of
altering the working tree will run additional checks to prevent callers
from sneaking in forbidden paths.

Note that these checks don't run otherwise, so one has to be careful
to not forget to run these checks whenever needed.
That way, detailed information about the path-to-be is available not
only for evaluating attributes or excludes, but also for validating
path components (in this case, relevant for `.gitmodules`).
…ctNTFS`.

This also adds `gitoxide.core.protectWindows` as a way to enforce
additional restrictions that are usually only available on Windows.

Note that `core.protectNFS` is always enabled by default, just like
[it is in Git](git/git@9102f95).
It's probably best not to try to protect against violations of constraints
in this free-to-mutate data-structure and instead suggest to validate entry
paths before using them on disk (or use the `gix_worktree::Stack`).
…otectWindows` is enabled.

Note that trailing `.` are forbidden for some reason, but trailing ` ` (space) is forbidden
as it's just ignored when creating directories or files, allowing them to be clobbered
and merged silently.
This should not be incorporated into automated tests in its current
form. It is a proof of concept to generate repositories that attempt
to install real executables in directories where they may be run,
whereas test fixtures should completely limit all effects to testing
directories, even in the event of regressions or unexpected failures.
Because some sed implementations, at least the one on macOS, detect
invalid text in the current locale's encoding and error out. See:

https://stackoverflow.com/questions/19242275/re-error-illegal-byte-sequence-on-mac-os-x

This makes the script work on macOS.
Because committing the staged paths creates the necessary Git tree
objects irrespective of what directories exist or are otherwise
represented.

In addition to simplifying the proof-of-concept repository, this
also makes it so its entries are properly ordered in its Git object
database, so `git fsck` does not report errors about that, and
exits reporting success (though of course still warns about the
presence of `..` components).
Because the output of `git commit` should show that information.
The repo this script makes attempts to check out entries traversing
the default `$INDEX_ALLOCATION` directory stream of the `.git`
directory, whose stream name is documented to be `$I30`.

However, although I am able to access directories under this naming
scheme through other applications, the repositories this script
currently creates do not appear to trigger the bug in gitoxide. The
next step is to try specifying the stream type explicitly.
This seems more effective at revealing such a vulnerability.

I don't know why, since both should in principle work fine.
The repo the script makes contains a filename with slash characters
in it that, if not rejected, will install a pre-commit hook.
This requires xxd now, but it honors its /bin/sh hashbang line, no
longer assuming printf understands \xNN in a format string.
This is needed on some Git versions. It seems it was not needed
on older versions, even though their git-fsck detected the unusual
filenames when run. It is supported even on those older versions,
so the script should still run on them.
The repo this script makes contains a filename with a slash character
in it that, if not rejected, will create a file above the working tree.

This is a modification of make_traverse_dotgit_slashes.sh. Both
require some further revision, and since most of their content is
duplicated, it may be worthwhile to combine them to avoid that.
- assure `con` is checked for, and that it's not overzealous.
- reduce code duplication
- improve documentation about more obscure parts of the code,
  based on the description in [this commit](git/git@e7cb0b4)
- upper-case device names in comparisons as this is their canonical form, which also is more recognizable for people who
  are looking for them.
- make clear why there is asymmetry between COM and LPT numbers.
- Don't make a partial control-character check, but a complete one (i.e. *b < 32|0x20)
- Add more variants for stream type tests (as regression protection, the code doesn't really care)
- various clarifications in path-related tests on Windows

Co-authored-by: Eliah Kagan <[email protected]>
Byron and others added 4 commits May 22, 2024 10:01
Otherwise it's possible to read or write to devices when interacting
with references of the 'right' name.

This behaviour can be controlled with the new `prohibit_windows_device_names` flag,
which is adjustable on the `Store` instance as field, and which now has to be
passed during instantiation as part of the new `store::init::Options` struct.
@Byron
Copy link
Member Author

Byron commented May 22, 2024

@EliahKagan If you take a look at the CI failure on Windows you will see that the error there is due to a lock file that couldn't be obtained. I think this is akin to what Git does. The problem here is that it shouldn't even get to that point - it should fail when trying to read the reference, which I think it might actually do here but without issue. Unfortunately, I couldn't reproduce it in the VM, the test succeeds, but I will keep trying with added debug information.

@Byron
Copy link
Member Author

Byron commented May 22, 2024

Turns out I was blind! CI is actually reproducing the correct issue, and the problem was that the error has handled too late.

@Byron Byron force-pushed the various-fixes branch 2 times, most recently from 7384d8b to 5f1c865 Compare May 22, 2024 10:43
@Byron
Copy link
Member Author

Byron commented May 22, 2024

Alright, I think CI will pass now, and once done, a new gitoxide release will be created.

@EliahKagan
Copy link
Member

EliahKagan commented May 22, 2024

When I run the tests locally on Windows, at 5f1c865, this still fails:

        FAIL [   0.846s] gix-archive::archive from_tree::basic_usage_zip

The same thing seems to be happening on CI for it, with these details.

This started working with the upgradde of the `zip` crate.
@Byron
Copy link
Member Author

Byron commented May 22, 2024

Oh, that's a new failure probably due to Windows not really having symlinks. Maybe there is a way to support that better, but for now it's not to be expected to be a symlink on Windows.
It's interesting as in an older version of the zip crate, symlinks didn't work at all, so that's an improvement.

Copy link
Member

@EliahKagan EliahKagan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I have installed on Windows from e955770 with cargo install --path . and repeated the manual tests of the refs-related features that had not been fully implemented yet when I had tested them yesterday. Those are working now.

Once this passes CI, I think it can be merged. It seems to me that the pull request description should eventually be updated with the other relevant checklist items, but that this could be done later (which might even be preferable).

@RivenSkaye
Copy link

RivenSkaye commented May 22, 2024

Symlinks on Windows do exist, but currently they're locked behind one of two options:

  1. Admin only to create and edit
  2. Developer Options allow you to toggle making them free for all

The latter is also what you need to have a hassle-free symlinks=native-strict experience in MSYS2/cygwin envs. More info on this 2016 windows developer blog article. For most intents and purposes, they're not reliably present.

Edit:
Current win32 docs and the corresponding page for the function in the headers seem to indicate Developer Mode is now only required for UWP and makes no mention of perms anymore

@Byron Byron merged commit d6cd449 into main May 22, 2024
19 checks passed
@Byron Byron deleted the various-fixes branch May 22, 2024 12:40
Copy link
Member

@EliahKagan EliahKagan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments about the semantics of joining Windows paths are mainly to identify areas of interest for forthcoming work.

I've posted them as review comments because that seemed like an easier way to identify context than posting a regular comment here or a separate discussion question.

Some potentially identify areas where the tests may be improved, but they do not represent worries about the correctness of the code under test that was introduced in this PR.

I have two main goals:

  • In some cases, there are clarifications that can be made in the custom messages, which I plan to do soon in a small PR.

    Edit: I've opened #1528 and added links to the comments below to corresponding comments in that PR about the related changes.

  • When working on this, I recall that we had discussed the possibility that there was at least one area where the behavior of the standard library should be improved, and you suggested that I might wish to contribute an improvement. I recall being interested in doing so, and also, I think, that you may have been willing to help review or otherwise provide guidance with that. But unfortunately I don't remember what that was!

    It may be possible to figure it out from comments we exported from the temporary private fork in which most of the work that made it into this PR occurred. But those exports, even taken together, were not quite complete, and I have not yet managed to figure out which behavior of the standard library we thought was wrong. I'm hoping you may recall that. These comments include details about behaviors that I am confident at least on re-examination are correct but that may seem wrong, as well as about things that I am not sure are correct.

I also hope these comments may be useful to refer back to later when working on other parts of the code where these concepts are relevant. So if you identify any technical mistakes in the comments themselves, please let me know.

Comment on lines +60 to +62
p("c:").join("relative"),
p("c:relative"),
"drive + relative = strange joined result with missing backslash, but it's a valid path that works just like `c:\relative`"
Copy link
Member

@EliahKagan EliahKagan Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c:relative doesn't work like c:\relative, except when the current directory on the C: drive is the root. Sorry I missed this before. I'll propose a fix to the wording shortly. Note that the asserted behavior here is correct, both in the sense of being what the standard library does and in the sense that it is not a bug for the standard library to do this. It is only the custom message argument that should be changed.

See also #1528 (comment).

Comment on lines +93 to +95
p("d:/").join("C:"),
p("C:"),
"d-drive + c-drive = c-drive - interesting, as C: is supposed to be relative"
Copy link
Member

@EliahKagan EliahKagan Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, because taking a path $P$ and concatenating a path $Q$ should ideally produce a path to the location specified by $Q$ when it is evaluated with $P$ as the current location, to the extent that this can be computed independently of filesystem contents and process environment.

If we are located at d:\ and we use C: as a path, the path C: refers to the current directory on the C: drive. But the only way to express that without resolving paths situationally is as C:.

The unintuitive aspect of this is that one might assume that taking an absolute path $P$ and joining a relative path $Q$ should produce an absolute path. Even if one is accustomed to Windows, it feels like resolving a relative path under an absolute path should be an absolute path. But this is not always what happens, due to the semantics of Windows drive letters.

See also #1528 (comment).

The weirder sub-case of this is when both the absolute path on the left and the relative path on the right have the same drive letter. Different libraries handle this differently.


Paths like C: or C:bar that have a drive letter but no root, and thus are relative to the drive-specific current directory on the drive they specify, are one of two major cases of non-UNC Windows paths that may feel absolute but are not. The other major case is a path that has a root but no drive letter, i.e. it starts with / or \ but not \\, and thus is relative to the current drive, but not relative to the current directory on that drive. (UNC paths are always absolute.)

Both these cases can lead to bugs in code written to assume more Unix-like semantics of the relative/absolute path distinction. The first case, where $Q$ has a drive letter but no root, can be surprising because the concatenation of $P$ and $Q$ can be relative, even when $P$ is absolute. In contrast, the second case, where $Q$ has a root but no drive letter, can be surprising because the concatenation of $P$ and $Q$ may be outside of $P$, even when $P$ is absolute and $Q$ is a relative path containing no .. component.

The other factor that makes this hard, at least for me, is that the terminology in this part of the official documentation is extremely strange, for example by characterizing paths like \file.txt as both relative and absolute, and by seeming to imply that a path with any .. component anywhere is relative regardless of how it starts. I think the more commonly used terminology is what this other article uses, even though that is (a Windows-specific) part of the .NET documentation, rather than being part of the Windows documentation. That is the terminology I have used here.

Comment on lines +114 to +116
p("\\\\.").join("C:"),
p("C:"),
"device-namespace-unc + win-drive-relative = win-drive-relative - c: was supposed to be relative, but it's not acting like it."
Copy link
Member

@EliahKagan EliahKagan Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The device-namespace-unc part of the custom message suggests that this may have been meant to start with p("\\\\.\\") rather than p("\\\\."). As currently written, I am not sure what the ideal behavior would be here, because I am not sure the trailing \ of the prefix \\?\ is strictly speaking functioning as a path separator. The Rust standard library does sometimes treat \\.\ and \\. differently as the left side of a path join, though both for the relative path C: and the absolute path C:\ on the right, the result is the same.

However, for the same reason as in the above comment, that C: is relative should not be a reason to expect anything other than this behavior. It's less clear what it means to be located at \\. or \\.\. But I don't think it should be an exception to the usual effect of C: on the right side.

Furthermore, although \\.\C: and \\.\C:\ are both valid paths, we cannot correctly form either one of them by concatenating \\. or \\.\ with C:, and the reason is because a path C: is a relative path:

  • \\.\C: is in the device namespace and ends in C: with no path separator, so that C: is a device name. Therefore \\.\C: refers to the C: drive as a device; the C: in \\.\C: is not a reference to the current directory, root directory, or any file or directory on the C: drive, but instead to the C: drive itself. (\\.\C: is an absolute path, but not to a file or directory.)
  • \\.\C:\, due to the trailing \, is the root directory on the C: drive, and thus generally equivalent to \\?\C:\ (though I don't know that all APIs that accept one will accept the other or that all APIs that accept both will necessarily treat them identically). But this is an absolute path to the root of the C: drive, which C: by itself does not designate because it is a relative path whose meaning depends on the current directory on the C: drive.

See also #1528 (comment).

Comment on lines +120 to +123
assert_eq!(
p("/").join("\\\\localhost"),
p("\\localhost"),
"unix-absolute + win-absolute-unc = win-absolute-unc"
Copy link
Member

@EliahKagan EliahKagan Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not raw strings, so this is indicating that / joined with \\localhost gives \localhost with one leading \, and I don't understand why this is what happens. It also does not appear to be what this meant to assert, since it characterizes the effect of the join as = win-absolute-unc, but a path that begins with only one \ is not a UNC path. So I don't know what's going on here.

See also #1528 (comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a mistake, as it indeed doesn't consider that "\…" is not a raw string.
I don't know what an alternative comment would be, but it definitely isn't = win-absolute-unc.

@Byron
Copy link
Member Author

Byron commented Aug 19, 2024

  • I recall being interested in doing so, and also, I think, that you may have been willing to help review or otherwise provide guidance with that. But unfortunately I don't remember what that was!

I also have faint memory of that, but not the specific thing we thought could be fixed. Hopefully that will come back once these cases are improved and/or fixed up.

@RivenSkaye
Copy link

Considering recent changes in std in the same domain, could you have been considering implementing/adding something along the lines of std::path::absolute perhaps? Came across that when Av1an had a clash with stuff in other langs and proposed that as a fix for the issues with extended path length syntax caused by not all Windows APIs supporting the syntax

@EliahKagan
Copy link
Member

Considering recent changes in std in the same domain, could you have been considering implementing/adding something along the lines of std::path::absolute perhaps?

Although that makes sense and would have been a good idea, I don't think so. If I remember correctly, I think some behavior of the standard library shown or commented on in those tests had seemed like it was wrong, at the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants