Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need better authorization for OxQL queries #5298

Open
bnaecker opened this issue Mar 21, 2024 · 3 comments
Open

Need better authorization for OxQL queries #5298

bnaecker opened this issue Mar 21, 2024 · 3 comments
Assignees
Labels
api Related to the API. Metrics

Comments

@bnaecker
Copy link
Collaborator

A prototype query language called OxQL is implemented in #5273. That PR implements the most rudimentary of authorization checks, requiring read permissions on the fleet to call its new endpoints. This issue tracks adding more robust and sensible authorization checks instead. There are a lot of pieces to this, and here a few notes.

First, Nexus may want to authorize access to a timeseries as a whole -- for example, reading physical temperature sensors may require elevated permissions. Nexus has access to the timeseries available, but would need additional metadata about the required level of permissions for each of them. That could be in static data, or part of a layer of indirection between the public timeseries and those stored in ClickHouse itself. There are other reasons to want such a layer, such as enhancing stability, and also legitimate arguments for punting (urgency around diagnosing customer issues).

Another piece of this is restricting the data within a timeseries that's visible to a customer. For example, users should not be able to see vCPU usage data for instances that they cannot otherwise access. Nexus could inspect the filters supplied in the queries, such as on things like project_id, and ensure the caller can read them. It may also want to inject its own filters into the query, to ensure that we never even read data out of ClickHouse that the user isn't authorized for.

These are just a few ideas, and there are likely many more. We'll want to flesh this issue out, and / or start an RFD, as we delve into it.

@bnaecker bnaecker added api Related to the API. Metrics labels Mar 21, 2024
@bnaecker bnaecker changed the title Need a way to authorize OxQL queries Need better authorization for OxQL queries Mar 21, 2024
@david-crespo
Copy link
Contributor

david-crespo commented May 13, 2024

We talked about this a bit more in chat. We went over the query inspection and filter injection ideas, and @bnaecker suggested a middle ground between fully dynamic queries and on-off endpoints that seems quite practical: a silo-scoped endpoint that provides a predefined set of queries that developer users are allowed to make, and which implements custom authorization logic for each. For example, the request body type could be an enum (union) of types that look like this.

{ 
  "metric_name": "instance_cpu",
  "params": { 
    "project": "my-proj",
    "instance": "my-inst"
  } 
}

When a request comes in, we know it is a valid metric because it parses as a request body at all, and then would know for that metric that we need to authorize access to the instance by a) checking that the instance is in that project, and b) authorizing the user's access to that project.

It would use OxQL under the hood and therefore be easy to implement, and it would be easy to add new queries without cluttering up the API with a ton of endpoints, but it would save us from having to figure out how to solve authz in a general way, and it would let us learn a lot about the problem before we try to do that.

@bnaecker
Copy link
Collaborator Author

I talked last week with @davepacheco about the general strategy for timeseries authz. I wanted to capture some notes here for the record, though eventually this should make its way to an RFD. We both agree that it's hard, at least to provide a view of the timeseries data that's completely consistent with the other API objects. Specifically, the goal allow a user to read timeseries data that pertains to objects that they can also read via the API is challenging, because we don't currently maintain a list of all the objects a user can read (nor is it clear we could). Instead, we maintain the roles for each user, from which we derive the actions a user can take on any given resource, assuming we have the resource by ID already.

There's also the interesting question of historical data here. Timeseries data are very much going to stick around, even after the object from which it derived is deleted. For example, instance vCPU data is still accessible even after the instance is gone. Dave thought this might be a bug in the API -- we could instead show the instance, including the fact that it's been deleted, whereas today I think we return a 404 for it. But in any case, we'd like to make these two views consistent in this dimension as well: whatever objects you can learn about through a timeseries query, it would also be nice to be able to learn about those through the API.

This is a hard problem, one which we both would prefer to punt on. In the interim, there is a potentially-useful API change that could take us pretty far. Right now, there's a concept of the "authorization scope" in a timeseries definition. E.g., "fleet" scope refers to hardware telemetry that should be accessible to an operator; "project" scope refers to things that any project reader should be able to see.

We could reflect this scope in the API, by providing a separate timeseries query endpoint for each of those different scopes. So, if you wanted to query the sled_data_link:bytes_sent timeseries, which has fleet scope, then one would hit the /v1/timeseries/query/fleet endpoint with the contained query. For the project-scoped data, one might do /v1/timeseries/query/project/{project_id} (or the ID might be in the request body or the query itself). In that case, we have a single project ID in the API request, so we can do authz as usual. If a user wants to do this for many project, then that would require many queries.

This would let us avoid the problem of analyzing or modifying the query itself to solve the entirely general problem, but still let us expose the querying endpoints in a useful way. We could gather more data on the queries that we actually see, and use that to further direct efforts around finer-grained authz.

@david-crespo
Copy link
Contributor

Great idea, I'm all for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the API. Metrics
Projects
None yet
Development

No branches or pull requests

2 participants