-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dockerfile2llb: filter unused paths for named contexts #4161
Conversation
Want some early feedback on this, but leaving it as a draft as it presently only works when |
6636a86
to
4d118fa
Compare
if err == nil { | ||
if len(cmd.sources) == 0 { | ||
for _, src := range c.SourcePaths { | ||
d.ctxPaths[path.Join("/", filepath.ToSlash(src))] = struct{}{} | ||
} | ||
} else if source := cmd.sources[0]; source.namedCtxState != nil { | ||
for _, src := range c.SourcePaths { | ||
source.namedCtxState.includePath(src) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic would need to be replicated to the instructions.AddCommand
branch as well.
Also, ideally we could somehow just have a single combined struct that wraps both the client main context, and a named context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be replicated to ADD
. ADD
doesn't seem to support the --from
option. The CopyCommand
has a From
option while AddCommand
doesn't seem to have that.
@@ -135,26 +135,86 @@ func ListTargets(ctx context.Context, dt []byte) (*targets.List, error) { | |||
return l, nil | |||
} | |||
|
|||
type namedContextState struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like filteredLocalState
or similar?
I'm not sure about including an Image
field in here, I get the logic, but it's not really connected with what paths get pulled from the client, it shouldn't affect the contents of the image data. If it gets split out, it feels like a separate refactor IMO, so if there's really a good reason to do this, it makes sense as a separate commit.
copt := nc.copt | ||
if len(nc.includePaths) > 0 { | ||
if includePatterns := normalizeContextPaths(nc.includePaths); includePatterns != nil { | ||
copt.LocalOpts = append(copt.LocalOpts, llb.FollowPaths(includePatterns)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copt.LocalOpts
will get modified on the original here, which I imagine isn't the intent (given the above comment)?
copt.LocalOpts = append(copt.LocalOpts, llb.FollowPaths(includePatterns)) | ||
} | ||
} | ||
st, img, err := client.NamedContext(ctx, nc.name, copt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to avoid multiple calls to NamedContext
. Some branches in NamedContext
do config resolving etc.
What if dockerui/client.NamedContext()
returns a struct that has State/Image
properties and SetLocalOpts([]llb.LocalOption)
method. CurrentContextOpt.LocalOpts
seems to be unused and can be replaced with the new method.
Calling SetLocalOpts
will change the definition of State
, either because the internal llb.Local
is under mutableOutput
or inside llb.Async()
. This is only for the llb.Local
definition though and doesn't rerun any other branches of NamedContext()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that could work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do that, would it make sense to also return that same struct from the MainContext
method as well? That way we can use the same way of updating the opts later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One difference is that MainContext
doesn't ever return an Image
(at least atm). Otherwise, I think if it is easy and unlikely to break something, we can do it now. Otherwise, we can do the namedcontext for now and look at the case for maincontext in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I was thinking about was what if we just returned it as part of the output similar to mutableOutput
? Something like:
type localOutput struct {
name string
opts []llb.LocalOpts
}
Then have it create the vertex in an async manner. It should then be possible to inspect the output for the mounted output and update the local options.
I think this could be coded similar to Async
or possibly coded the way async is coded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand where/how that localOutput
would be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Borrowing heavily from async:
type LocalContext struct {
name string
opts []llb.LocalOption
}
func (l *LocalContext) ToInput(ctx context.Context, c *llb.Constraints) (*pb.Input, error) {
out := l.Do(ctx, c)
return out.ToInput(ctx, c)
}
func (l *LocalContext) Vertex(ctx context.Context, c *llb.Constraints) llb.Vertex {
out := l.Do(ctx, c)
return out.Vertex(ctx, c)
}
func (l *LocalContext) Do(ctx context.Context, c *llb.Constraints) llb.Output {
st := llb.Local(l.name, l.opts...)
return st.Output()
}
func main() {
st1 := llb.NewState(&LocalContext{
name: "context",
})
st2 := st1.Dir("/src")
// We can modify the local options by inspecting the state output.
if lc, ok := st2.Output().(*LocalContext); ok {
lc.opts = append(lc.opts, llb.WithCustomName("internal"))
}
def, err := st2.Marshal(context.TODO())
if err != nil {
panic(err)
}
_ = llb.WriteTo(def, os.Stdout)
}
The intention is that the areas where we have to modify the local context, such as this one:
if err == nil && len(cmd.sources) == 0 {
for _, src := range c.SourcePaths {
d.ctxPaths[path.Join("/", filepath.ToSlash(src))] = struct{}{}
}
}
This could be modified to:
if err == nil {
if len(cmd.sources) == 0 {
for _, src := range c.SourcePaths {
d.ctxPaths[path.Join("/", filepath.ToSlash(src))] = struct{}{}
}
} else if lc, ok := cmd.sources[0].state.Output().(*LocalContext); ok {
// see above
}
}
The benefit of this is we could also end up consolidating how the local options are handled so the default context and named contexts aren't different from each other.
I'll play around with the code though. I do have concerns about this being a bit too complicated to follow. Modifying the client to return a struct may end up just being easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decently certain it's possible to do this using Async
as long as I can modify the values the client returns.
At the moment, the only way to use async for a source is to do llb.Scratch().Async(...)
and ignore the input. Are we potentially interested in adding an llb.Async
top-level function? I could use it and can add it as part of this change if that's helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llb.Scratch().Async(...)
and llb.Async
should be equivalent. The previous state (scratch in your case) is just what gets sent to the callback as the previous state value. The callback can just ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea they're equivalent. I was wondering if we wanted to formally add an llb.Async
that is a shortcut for the above.
ec3bba0
to
09c652b
Compare
I've updated this PR with the feedback. This now uses |
5548458
to
6f39032
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(follow-up) Maybe after tonistiigi/fsutil#167 we can write an integration test for this as well.
frontend/dockerui/namedcontext.go
Outdated
) | ||
return &st, nil, nil | ||
|
||
// This is a weird invocation and requires an explanation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip this first sentence.
The comment should more describe the behavior. Eg. "It is possible to modify the filter applied to local files later via SetLocalOpts method so the Local is wrapped in Async that only gets called during marshaling".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this entire area can be updated to something else, I don't think this comment would be helpful. The sentence was meant to explain why new([]llb.LocalOption)
but I'm going to remove that anyway.
@@ -778,6 +807,7 @@ type dispatchState struct { | |||
buildArgs []instructions.KeyValuePairOptional | |||
commands []command | |||
ctxPaths map[string]struct{} | |||
namedCtxState *namedContextState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can avoid the namedContextState
type. I would define includePaths map[string]struct{}
in here and explicitly call SetLocalOpts
like you had in the previous version to make it clear what is the order of gathering paths and marshaling. If you need a helper function then that is fine but I think normalizeContextPaths
already does the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea this makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember why I needed to add this. The reason is I needed a place to store the returned NamedContext
so I could call SetLocalOpts
. If I only have includePaths
in the dispatch state, it isn't able to use that information to invoke SetLocalOpts
on the original NamedContext
.
I could make this part of NamedContext
instead of creating a new type. But, I'd have to add a method like:
func (nc *NamedContext) FollowPath(src string)
SetLocalOpts
would end up being too low level. I'd probably need to then update the API if/when tonistiigi/fsutil#167 gets merged.
Since I'd need to store NamedContext
in the dispatch state somewhere, I'd need to add two fields instead of one and it would be non-obvious that the purpose of includePaths
was only for feeding into NamedContext
and has no other purpose.
frontend/dockerui/namedcontext.go
Outdated
// read from the anonymous function passed to async. In order to ensure we're | ||
// referencing the same slice, we allocate a pointer to a slice rather than just | ||
// the slice which would be passed by value. | ||
customOpts := new([]llb.LocalOption) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be a pointer. You can just do:
var nc NamedContext
st = llb.Async(func() { if nc.localOpts {} })
nc.State = st
return &nc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally this was going to be a slice that was modified. I think that doesn't really make sense anymore so I'll switch this up. It may be good to make it a bit more explicit though that the pointer is the one being captured. I think go should figure it out automatically, but just in case it would probably be better to do nc := &NamedContext{}
.
Need to do some testing but responded to the existing comments. |
Did some testing and I think this is likely good to go. There's on case that I'm concerned about doing incorrectly, which is when the named context is the subject of Before merging this, it might be beneficial to write an integration test for this that ensures the data transfer is the same between the build context and a named context. I think #4165 could be beneficial for this since it will be easier to find the number of bytes transferred. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a Dockerfile like
from scratch
copy foo .
copy --from=base bar .
buildctl build --frontend dockerfile.v0 --local dockerfile=. --local context=. --local base=. --opt context:base=local:base
this currently does not work.
/tmp/a # ls -lh
total 102M
-rw-r--r-- 1 root root 47 Aug 25 05:22 Dockerfile
-rw-r--r-- 1 root root 4 Aug 25 05:21 bar
-rw-r--r-- 1 root root 3 Aug 25 05:21 foo
-rw-r--r-- 1 root root 102.0M Aug 25 05:21 rand
=> [context base] load from client
=> => transferring base: 106.95MB
If I add FROM alpine AS base
as first line that seems to correct it.
Make sure to take this out of draft as well if it is ready.
For the integration test I think tonistiigi/fsutil#167 would be better than trying to read the progress. That would allow defining context as virtual fs where the test can then see what exact files were read from it. Even if we read progress then in the test we can just access the client
progress types directly and don't need to convert to json and back.
continue | ||
} | ||
|
||
if includePatterns := normalizeContextPaths(d.usedPaths); includePatterns != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not directly for this PR but looks like the includePatterns != nil
check in 571 might not be completely correct and should be len(includePatterns) != 0
. In here it doesn't matter because there is a separate len
check but it could be avoided if we just do the same len
check in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that. I'll go through and fix that. The way that Go handles empty slice and nil slices as different things isn't something I'm the biggest fan of so I usually default to using len()
as the way of checking slices in general. I wanted to copy the existing code though for this situation. I'll change both to use length checking.
@@ -217,6 +213,10 @@ func toDispatchState(ctx context.Context, dt []byte, opt ConvertOpt) (*dispatchS | |||
|
|||
allDispatchStates := newDispatchStates() | |||
|
|||
// Store the mapping between a dispatchState and named contexts so we can set the local | |||
// options later. | |||
namedContexts := make(map[*dispatchState]*dockerui.NamedContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could just define namedContext *dockerui.NamedContext
in *dispatchState
and avoid this extra map.
d.platform = platform | ||
// Do not store the named context when it is being used as a base image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if d.base
is a namedContext
already then the filter should also be disabled for it.
One way to do it if you also remove the namedContexts
map would be to initialize usedPaths
map on setting dispatchState.namedContext
and then in here if you detect that it is used as base image then set usedPaths = nil
what would mean that filtering is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely certain if this comment is even correct or relevant. I need to check on that today. When I was attempting to test it, I ran into a panic in the scenario that I was concerned about.
Another way of doing this might be to add "/"
as a used path when I detect it's being used as a base image. Setting it to nil works, but I feel like it's too much state to keep in your head and too many null pointers that could result in a panic. If we add "/"
as a used path, then normalizeContextPaths
will return nil and the local option won't be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thing I found to work is disabling the context filtering when the associated dispatch state has commands associated with it. I figure this works well because even if an external stage is using only path /a.txt
, the stage itself is using the entire base image so I just have commands disabling the filter for named context in general.
When COPY --from
is used, the implicit stages that are created don't have any commands in them so it's safe to filter them.
b9c1126
to
c51b76e
Compare
ec3a120
to
84e1591
Compare
PTAL CI errors |
// Propagate the set of paths from this dispatchState to the base | ||
// dispatchState to enable filtering local contexts based on the set | ||
// of used paths. | ||
ds.paths = mergePaths(ds.paths, d.paths) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it a problem that this runs before dispatchRun
.
I'm not sure how a path added in the dispatchRun of a child stage would affect the dispatchState of the parent that has namedContext
set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should probably change the name of this. I didn't think of a better name yesterday.
The intention of this method is to ensure that ds.paths
and d.paths
refer to the same location in memory. At this point, ds.paths
is probably empty but it's just making sure that everything is present in case it wasn't. This happens before dispatchRun
because it ensures that all of the maps relate to the same map.
I think unifyPaths
is probably a better name now that I had a night to sleep on it.
78d6031
to
286f5b9
Compare
6b08069
to
250f926
Compare
Added a test. I think this is good for review now. I would agree that the test for this would be easier if one of two things was true.
I saw @jedevc had a PR to get that wired up so I can return to this once that's merged. I might also try to experiment with testing underlying LLB structures. |
250f926
to
4b07f9b
Compare
@@ -809,12 +841,24 @@ func (dss *dispatchStates) addState(ds *dispatchState) { | |||
if d, ok := dss.statesByName[ds.stage.BaseName]; ok { | |||
ds.base = d | |||
ds.outline = d.outline.clone() | |||
// Unify the set of paths so both this state and the parent state | |||
// refer to the same map when recording path usage. | |||
unifyPaths(&ds.paths, d.paths) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we discussed that just set a variable in here if the extra hacks are not really needed.
@@ -876,6 +920,9 @@ func dispatchRun(d *dispatchState, c *instructions.RunCommand, proxy *llb.ProxyE | |||
|
|||
customname := c.String() | |||
|
|||
// Run command can potentially access any file. Mark the full filesystem as used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any command that changes LLB should need this, not just RUN
. Also, if the stage is part of the target then it needs all the files.
FROM foo AS image_source | ||
COPY --from=alpine / / | ||
RUN cat /foo > /bar | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a case that also checks the alias case so that the unifyPaths
gets called.
4b07f9b
to
74063a1
Compare
This changes the llb generation in `dockerfile2llb` to add the `llb.FollowPaths` option when named contexts are used in the same way that happens with the default context. This means that dockerfiles that look like this: ``` FROM scratch COPY --from=mynamedctx ./a.txt /a.txt ``` Will only load the file `a.txt`. This behavior is consistent with the default context. This also removes the unused []llb.LocalOption from ContextOpt and replaces it with an async counterpart. This is needed because we initialize the named context before we know which paths are needed for the filter. Signed-off-by: Jonathan A. Sternberg <[email protected]>
Updated this PR and it should now pass the tests. Some changes from the original version:
|
74063a1
to
bee54f0
Compare
This changes the llb generation in
dockerfile2llb
to add thellb.FollowPaths
option when named contexts are used in the same waythat happens with the default context.
This means that dockerfiles that look like this:
Will only load the file
a.txt
. This behavior is consistent with thedefault context.
This also removes the unused []llb.LocalOption from ContextOpt and
replaces it with an async counterpart. This is needed because we
initialize the named context before we know which paths are needed for
the filter.
Fixes docker/buildx#1765.