Commit
contenthash: implement proper Linux symlink semantics for getFollowLinks
This patch is part of a series which fixes the symlink resolution semantics within BuildKit.

You cannot implement symlink resolution on Linux naively using path.Join. A correct implementation requires tracking the current path and applying each new component individually. This implementation is loosely based on github.com/cyphar/filepath-securejoin.

Things to note:

 * The previous implementation of getFollowLinks actually only resolved symlinks in parent components of the path, forcing some callers to implement resolution manually (and incorrectly) by calling getFollowLinks several times. In addition to being incorrect and somewhat difficult to follow, this also led to the ELOOP limit being much higher than 255 (while some callers used getFollowLinksWalk, most used getFollowLinks, which reset the limit for each iteration).

   So, add getFollowParentLinks to allow callers to decide which behaviour they need. getFollowLinks now follows all links (correctly).

 * The trailing-slash-is-significant behaviour in the cache (dir vs dir header) needs to be handled specially because on Linux there is no distinction between "a/" and "a" (assuming a is not a symlink, that is) and so filepath-securejoin's implementation didn't care about trailing slashes. The previous implementation hid the trailing-slash behaviour purely in the splitKey() implementation, making the need for this quite subtle.

 * The previous implementation was recursive, which in theory would allow you to find some paths slightly more quickly (if you find a valid ancestor you don't need to check above it) at the cost of making some lookups more expensive (a path with an invalid ancestor very early on in the path). However, implementing the correct lookup algorithm recursively proved to be quite difficult.

   It is possible to implement a similar optimisation (try to find the first non-symlink parent component and iterate from there), but this complicates the implementation a fair amount and it isn't clear that the performance tradeoff is a benefit in general. Ultimately, cache lookups are quite fast and so there probably isn't a practical performance difference between approaches.

Signed-off-by: Aleksa Sarai <[email protected]>
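
For illustration only, the component-by-component approach described above can be sketched roughly as follows. This is not the code from this patch: resolve, links, maxLinks and errTooManyLinks are hypothetical names, and a plain in-memory map stands in for the content hash cache. The sketch only shows why each component has to be applied to the current path individually, and why a single link counter has to span the whole lookup.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// maxLinks mirrors the 255 limit mentioned in the commit message.
const maxLinks = 255

// errTooManyLinks plays the role of ELOOP in this sketch.
var errTooManyLinks = errors.New("too many levels of symbolic links")

// resolve follows every symlink in p, one component at a time.
// links is a hypothetical stand-in for the cache: it maps a cleaned
// absolute path to the target of the symlink stored at that path.
func resolve(links map[string]string, p string) (string, error) {
	current := "/" // fully resolved prefix, never contains a symlink
	remaining := p // components still to be applied
	followed := 0  // one counter for the whole lookup, never reset

	for remaining != "" {
		// Pop the next component off the front of the remaining path.
		part, rest, _ := strings.Cut(remaining, "/")
		remaining = rest

		switch part {
		case "", ".":
			continue
		case "..":
			// current is already resolved, so its lexical parent is
			// also its real parent.
			current = path.Dir(current)
			continue
		}

		next := path.Join(current, part)
		target, isLink := links[next]
		if !isLink {
			current = next
			continue
		}

		followed++
		if followed > maxLinks {
			return "", errTooManyLinks
		}
		// An absolute target restarts from the root; a relative one is
		// applied to the directory containing the link (current).
		if path.IsAbs(target) {
			current = "/"
		}
		// The target's components are resolved before the rest of p.
		remaining = target + "/" + remaining
	}
	return current, nil
}

func main() {
	links := map[string]string{
		"/link":  "/a/b",
		"/a/b/c": "../d",
		"/loop":  "loop",
	}
	for _, p := range []string{"/link/../x", "/link/c", "/loop/x"} {
		got, err := resolve(links, p)
		fmt.Printf("lexical=%q resolved=%q err=%v\n", path.Clean(p), got, err)
	}
}
```

In this sketch, "/link/../x" resolves to "/a/x", whereas a purely lexical cleanup collapses it to "/x" without ever looking at the link. The final component is followed as well, which corresponds to the "follows all links" behaviour described for getFollowLinks; the trailing-slash handling described above is deliberately left out.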