
resolver: Isolate auth token cache per session #3592

Merged
merged 2 commits into from
Feb 9, 2024

Conversation

marxarelli
Contributor

Prior to this change, entries in the resolver's auth token cache were keyed only by remote name and action (push/pull). This resulted in authenticated remote sessions leaking between distinct client sessions, with the potential for one client's token to be used to authenticate another client's registry access.

While this may not have had a substantial impact for pull requests as the solver vertices are shared for all clients as is the local cache namespace—i.e. one client can always receive a cache entry that was the result of another client's cached pull request—the shared auth cache also allowed one client to push to a remote registry for a ref it may not have otherwise had rights to. This behavior is particularly problematic in shared CI environments where auth token scopes are used to limit push access by registry namespace.

Key each entry in the auth token cache by <session id>:<ref>:<action> to avoid cross use between distinct client sessions.

Signed-off-by: Dan Duvall [email protected]
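For illustration, the keying scheme described above can be sketched as follows. This is a minimal standalone sketch, not the merged BuildKit code; the session IDs and repository name are made-up values, and the `::` separator matches the format shown later in the review.

```go
package main

import (
	"fmt"
	"strings"
)

// authCacheKey sketches the proposed cache key: all session IDs joined by
// ":", then the ref name, then the action scope, separated by "::".
// The values passed in below are illustrative, not real identifiers.
func authCacheKey(sessionIDs []string, name, scope string) string {
	return fmt.Sprintf("%s::%s::%s", strings.Join(sessionIDs, ":"), name, scope)
}

func main() {
	// Two distinct client sessions pushing to the same repository now
	// produce distinct cache keys, so tokens are no longer shared.
	fmt.Println(authCacheKey([]string{"session-a"}, "repository:foo/bar", "pull,push"))
	fmt.Println(authCacheKey([]string{"session-b"}, "repository:foo/bar", "pull,push"))
}
```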

Member

@tonistiigi tonistiigi left a comment

I don't think that this makes much sense, as the LLB vertexes from different requests are merged anyway, and even if they were not, all requests have access to the build cache without any session auth check. So this fixes a small window (possibly by making things slower) while in many other orderings of requests the access to the resources would remain as it was before.

If you want to serve multiple untrusted clients, then we would need a "namespace" construct for the whole builder so all aspects of the build are isolated. Even sharing content-addressable blobs is actually tricky: although they can be shared safely, how would you prove that the second client actually had access to the full data (not just the digest) without pulling it a second time?

@marxarelli
Contributor Author

marxarelli commented Feb 8, 2023

I see three distinct behaviors in the way the auth token cache is implemented now, given a two-client scenario: one that is not really of concern and two that are. Of the two concerning behaviors, one is a functional concern (a non-deterministic race condition) and the other is a security concern.

Scenario: Client A and Client B request a solve that share a vertex (same cache key) for a remote resource that requires remote auth and push the output to a remote registry that requires auth. Client A has valid credentials to pull the remote resource of the shared vertex and to push to the remote registry. Client B has no credentials.

Behaviors:

  1. Given Client A completes its solve first, Client B has access to the cached vertex, either by direct retrieval via the content client or indirectly by integrating the vertex into its result and exporting the result. For the reasons you describe, this is not of concern: all clients share the same namespace and cache space.
  2. Given Client B requests its solve first, the request for the remote resource of the shared vertex fails, because Client B has no credentials with which to retrieve it. The failed auth result is cached. Client A then requests its solve and the solve fails because Client B's failed auth request is used with Client A. This is a functional concern, a race condition and non-deterministic behavior depending on timing of two unrelated clients.
  3. Given Client A completes its solve and exports its result first, Client B can piggyback on Client A's cached auth to write its result to a remote registry for which it has no credentials. This is a security exploit, one affecting not the clients or buildkitd itself but the remote registry, quite likely a third-party system.
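The third behavior can be sketched with a toy shared cache. This is an illustration of the pre-fix keying only; the registry name and token values are made up, and the real resolver cache is more involved than a plain map.

```go
package main

import "fmt"

// sharedKey reproduces the pre-fix keying: remote name and action only,
// with no session component, so all clients compute the same key.
func sharedKey(name, action string) string { return name + "::" + action }

func main() {
	// A toy stand-in for the shared auth token cache.
	tokenCache := map[string]string{}

	// Client A authenticates for push; its token lands in the shared cache.
	tokenCache[sharedKey("registry.example/foo", "push")] = "client-a-token"

	// Client B, holding no credentials at all, computes the same key and
	// so receives Client A's push token for its own export.
	if tok, ok := tokenCache[sharedKey("registry.example/foo", "push")]; ok {
		fmt.Println("Client B pushes with:", tok)
	}
}
```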

IMHO, it is bad practice in general to ever cache auth credentials or results across sessions of unrelated clients, because even if the clients are considered trusted in the primary domain, there may be operations with remote systems for which these same clients would not be considered trusted.

@tonistiigi
Member

Given Client B requests its solve first, the request for the remote resource of the shared vertex fails, because Client B has no credentials with which to retrieve it. The failed auth result is cached. Client A then requests its solve and the solve fails because Client B's failed auth request is used with Client A. This is a functional concern, a race condition and non-deterministic behavior depending on timing of two unrelated clients.

I'm not sure this is what actually happens. The behavior should be to try all the sessions associated with the pull and look for one that works. But maybe the error codes have some effect.

Given Client A completes its solve and exports its result first, Client B can piggyback on Client A's cached auth to write its result to a remote registry for which it has no credentials. This is a security exploit, one affecting not the clients or buildkitd itself but the remote registry, quite likely a third-party system.

Ok, but in that case you only need it if auth is asked for the push scope?

@marxarelli
Contributor Author

I'm not sure this is what actually happens. The behavior should be to try all the sessions associated with the pull and look for one that works. But maybe the error codes have some effect.

You're right. I thought I had observed this behavior a while back, but I'm unable to reproduce it now. Perhaps our registry was returning something unexpected due to our JWT auth setup. In any case, this isn't a problem.

Given Client A completes its solve and exports its result first, Client B can piggyback on Client A's cached auth to write its result to a remote registry for which it has no credentials. This is a security exploit, one affecting not the clients or buildkitd itself but the remote registry, quite likely a third-party system.

Ok, but in that case you only need it if auth is asked for the push scope?

That's true. I'm happy to amend the change and limit this to push scopes if you're open to that.
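The amendment discussed here — isolating the key per session only when the scope grants push — might look something like the following sketch. The function name, identifiers, and exact condition are assumptions for illustration; the merged code may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// scopedAuthCacheKey sketches limiting session isolation to push scopes:
// pull-only tokens stay shared (harmless given the shared solve cache),
// while any scope containing "push" is keyed per session.
// All identifiers here are illustrative.
func scopedAuthCacheKey(sessionIDs []string, name, scope string) string {
	key := fmt.Sprintf("%s::%s", name, scope)
	if strings.Contains(scope, "push") {
		// Prefix with the session IDs only when the scope grants push.
		key = fmt.Sprintf("%s::%s", strings.Join(sessionIDs, ":"), key)
	}
	return key
}

func main() {
	// Pull-only scope: shared key, no session component.
	fmt.Println(scopedAuthCacheKey([]string{"sess-1"}, "repository:foo/bar", "pull"))
	// Scope including push: key is isolated to the session.
	fmt.Println(scopedAuthCacheKey([]string{"sess-1"}, "repository:foo/bar", "pull,push"))
}
```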

@tonistiigi
Member

That's true. I'm happy to amend the change and limit this to push scopes if you're open to that.

SGTM

Member

@tonistiigi tonistiigi left a comment

^

@marxarelli
Contributor Author

My sincere apologies for disappearing. I will refactor this today.

@marxarelli
Contributor Author

@tonistiigi not sure if you saw my request for re-review. I've (finally) made the changes you requested.

@marxarelli
Contributor Author

@tonistiigi is this still an acceptable change?

Member

@tonistiigi tonistiigi left a comment

Sorry, I've missed notifications about this. Yes, nothing against the concept.

// party registries from leaking between client sessions. The key will end
// up looking something like:
// 'wujskoey891qc5cv1edd3yj3p::repository:foo/bar::pull,push'
key = fmt.Sprintf("%s::%s::%s", strings.Join(session.AllSessionIDs(g), ":"), name, scope)
Member

What if this is just random, or something unique that is passed in from each call path that does push? session.AllSessionIDs(g) looks weird but I guess it has the correct behavior.

Contributor Author

I'm not sure I know enough about the codebase to say whether a random value would have the same exact effect. I would think use of the session ID would at least benefit sessions where multiple pushes are made for the same repositories. (e.g. cache exports?)

Member

@tonistiigi tonistiigi left a comment

Needs rebase as well

@marxarelli
Contributor Author

Rebased, @tonistiigi

@marxarelli
Contributor Author

❤️ Thanks, @tonistiigi

@tonistiigi tonistiigi merged commit 092bec8 into moby:master Feb 9, 2024
63 checks passed
@marxarelli marxarelli deleted the review/isolate-token-cache branch February 21, 2024 18:15