-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
contenthash: implement proper Linux symlink semantics #4896
Conversation
13c27ce
to
98280ea
Compare
For reference, here is a Dockerfile which reliably triggered this issue:
|
377fd17
to
95995f5
Compare
The |
7f3d018
to
4692cf0
Compare
Ref: moby/buildkit#4896 SUSE-Bugs: https://bugzilla.suse.com/show_bug.cgi?id=1221916 Signed-off-by: Aleksa Sarai <[email protected]>
Ref: moby/buildkit#4896 SUSE-Bugs: https://bugzilla.suse.com/show_bug.cgi?id=1221916 Signed-off-by: Aleksa Sarai <[email protected]>
Ref: moby/buildkit#4896 SUSE-Bugs: https://bugzilla.suse.com/show_bug.cgi?id=1221916 Signed-off-by: Aleksa Sarai <[email protected]>
Ref: moby/buildkit#4896 SUSE-Bugs: https://bugzilla.suse.com/show_bug.cgi?id=1221916 Signed-off-by: Aleksa Sarai <[email protected]>
Ref: moby/buildkit#4896 SUSE-Bugs: https://bugzilla.suse.com/show_bug.cgi?id=1221916 Signed-off-by: Aleksa Sarai <[email protected]>
4692cf0
to
93980ea
Compare
Maybe I should get a dummy PR merged so you don't need to re-trigger the CI 😅. |
93980ea
to
8ce3186
Compare
Not gotten a chance to dive into this yet, but just a quick comment - I wonder if we need any changes in https:/tonistiigi/fsutil as well? There's some logic there for following symlinks as well, and other path resolution-y things. |
Yeah, I looked at However, it wasn't clear to me how it's used by buildkit/docker/containerd. I can write some patches, but if they have subtle semantics (like (It's a little frustrating that I spent so much time getting |
8ce3186
to
f56db22
Compare
@tonistiigi wdyt? Also, did you want me to work on a PR for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this sgtm, some comments.
Also, did you want me to work on a PR for fsutil?
If there is an issue there then sure. There isn't similar "virtual-fs" logic there like in the contenthash. I guess maybe the initial resolving of the input path containing symlink could have something similar to this. We can start by creating an issue ticket if we can locate a bug.
// the root directory. This is a slightly modified version of SecureJoin from | ||
// github.com/cyphar/filepath-securejoin, with a callback which we call after | ||
// each symlink resolution. | ||
func rootPath(root, unsafePath string, followTrailing bool, cb onSymlinkFunc) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the test with the new followTrailing
parameter disabled and it did not fail any checks. Not a blocker for the PR, but could be something to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for having followTrailing
here is so that if you are instructed to scan /some/path/foo
where foo
is a symlink and you have opts.FollowLinks=false
, scanPath
will scan /some/path
rather than the parent directory of wherever foo
is pointing to. This was done to match the previous behaviour of scanPath
.
As for why the tests don't fail with followTrailing=false
, that is a little strange. If you are asked to checksum a path with a trailing symlink and opts.FollowLinks=true
, I would expect you to get a "not found" error because we don't scan the symlink target if followTrailing=false
. I guess the tests we have don't test this case properly... The need for followTrailing
in general is to match the old handling of opts.FollowLinks
.
f56db22
to
bfccd5a
Compare
857b135
to
3670362
Compare
Windows CI failure in new test
|
960520f
to
88f9fe8
Compare
Okay, it seems that Windows's |
This is based on the github.com/cyphar/filepath-securejoin implementation. Since we need to resolve all components when doing lookups, doing it recursively serves little purpose (other than complicating the implementation). There was also a bug in how symlinks containing "/../" in the target name were handled, though this is a more specific issue than the general issues with ".." that the rest of the contenthash code has. Ref: 9bf5431 ("contenthash: fix issues on symlinks on parent paths") Signed-off-by: Aleksa Sarai <[email protected]>
This patch is part of a series which fixes the symlink resolution semantics within BuildKit. You cannot implement symlink resolution on Linux naively using path.Join. A correct implementation requires tracking the current path and applying each new component individually. This implementation is loosely based on github.com/cyphar/filepath-securejoin. Things to note: * The previous implementation of getFollowLinks actually only resolved symlinks in parent components of the path, leading to some callers to have to implement resolution manually (and incorrectly) by calling getFollowLinks several times. In addition to being incorrect and somewhat difficult to follow, it also lead to the ELOOP limit being much higher than 255 (while some callers used getFollowLinksWalk, most used getFollowLinks which reset the limit for each iteration). So, add getFollowParentLinks to allow for callers to decide which behaviour they need. getFollowLinks now follows all links (correctly). * The trailing-slash-is-significant behaviour in the cache (dir vs dir header) needs to be handled specially because on Linux there is no distinction between "a/" and "a" (assuming a is not a symlink, that is) and so filepath-securejoin's implementation didn't care about trailing slashes. The previous implementation hid the trailing path behaviour purely in the splitKey() implementation, making the need for this quite subtle. * The previous implementation was recursive, which in theory would allow you to find some paths slightly more quickly (if you find a valid ancestor you don't need to check above it) at the cost of making some lookups more expensive (a path with an invalid ancestor very early on in the path). However, implementing the correct lookup algorithm recursively proved to be quite difficult. It is possible to implement a similar optimisation (try to find the first non-symlink parent component and iterate from there), this complicates the implementation a fair amount and it doesn't seem clear that the performance tradeoff is a benefit in general. Ultimately, cache lookups are quite fast and so there probably isn't a practical performance difference between approaches. Signed-off-by: Aleksa Sarai <[email protected]>
This patch is part of a series which fixes the symlink resolution semantics within BuildKit. Since we now have a working implementation of symlink resolution in getFollowLinks, we can add a callback after each component is resolved so that we can track which components were and were not found during the resolution. Because getFollowLinks resolves the components in-order, we can use this to tell needsScan whether the resolver found a real (non-symlink) ancestor of the requested path in the cache. Signed-off-by: Aleksa Sarai <[email protected]>
This patch is part of a series which fixes the symlink resolution semantics within BuildKit. Previously, the concept of the follow flag had different meanings in various parts of the checksum codepath. FollowLinks is effectively O_NOFOLLOW, but the implementation in getFollowLinks was actually more like RESOLVE_NO_SYMLINKS. This was masked by the fact that checksumFollow would implement the O_NOFOLLOW behaviour (incorrectly), but checksumFollow would call checksumNoFollow (which would follow symlinks in path components by setting follow=true for getFollowLinks). It is much easier to simply remove these layers of indirection and unify the meaning of FollowLinks across all of the code. This means that the old follow flag is no longer needed. This also means that we can now remove the incorrect symlink resolution logic in (*cacheContext).checksumFollow() and move the followTrailing logic to (*cacheContext).checksum(), as well as removing getFollowParentLinks(). Since this removes some redundant re-checksum loops, we need to add followTrailing logic to scanPath() so that final symlink components result in the correct directory being scanned properly. The only user of (*cacheContext).checksum(follow=false) was (*cacheContext).includedPaths() which appeared to be simply using this as an optimisation (since the path being walked already had its parent path resolved). Having two easily-confused boolean flags for an optimisation that is probably not necessary (getFollowLinks already does a fast check to see if the original path is in the cache) seemed unnecessary, so just keep followTrailing. Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
88f9fe8
to
ee43730
Compare
This series fixes the symlink following implementation of
cache/contenthash
. The primary issue with the previous implementation is that it assumed thatpath.Join
is a reasonable way of implementing trailing symlink following -- this is not correct on Linux.path.Clean
is a Plan9-ism, and while it is useful, on Linux symlinks are resolved left-to-right and thus lexically transforminga/b/../c
toa/c
is incorrect ifb
is a symlink. The paper describing Plan9'spath_clean
makes reference to this Unix-ism they fixed by removing symlinks from Plan9.As the trailing symlink logic was written in quite a few places, all of the implementations needed to be fixed. The headline changes are:
follow
flag now meansfollowTrailing
(O_NOFOLLOW
) everywhere. Previously,getFollowLinks
andcc.checksum
used it to mean "don't follow any links" (a-laRESOLVE_NO_SYMLINKS
) whilecc.Checksum
andcc.checksumFollow
used it to meanO_FOLLOW
. There was only one user of theRESOLVE_NO_SYMLINKS
implementation (cc.includedPaths
) and it appears it was only used as an optimisation (when in fact, since we are iterating over the keys in the cache, the key definitely exists,cc.checksum
will just do a simple lookup regardless ofRESOLVE_NO_SYMLINKS
).cc.checksumFollow
is gone (now thatchecksum
supportsfollowTrailing
directly), andcc.checksumNoFollow
andcc.lazyChecksum
have been renamed and reworked slightly. Removing the loop inchecksumFollow
also means we will detect symlink loops 255 times faster 😸, not to mention that we now don't do a bunch of unnecessary scanning and checksumming when dealing with trailing symlinks.needsScan
now just usesgetFollowLinksCallback
to track the status of a lookup to see if there is a non-symlink path component that has already been scanned in the cache. This removes one extra implementation of symlink lookups.scanPath
now does a scan ofpath.Dir
of the resolved path, rather than resolving the lexical directory of the requested path (this meansscanPath
,needsScan
, and thusrootPath
now also takefollowTrailing
). I suspect that always fully-resolving the path would be okay (because we fill in symlinks during resolution) but in the case offollowTrailing=false
it seems unclear to me which directory "should" be scanned. The old implementation would scan all of them (becausechecksumFollow
would re-apply the trailing link and re-checksum everything).Signed-off-by: Aleksa Sarai [email protected]