[Discuss] Upgrading to 7.10 might break Alerts created in 7.7 - 7.9 #70851

Closed
gmmorris opened this issue Jul 6, 2020 · 9 comments
Labels
Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@gmmorris
Contributor

gmmorris commented Jul 6, 2020

Discussing the new Feature Controls & RBAC (#43994) for Alerting with @legrego we have realised that we're likely to break a bunch of Alerts that already exist for our users when they upgrade to 7.10.

Below I summarise what scenarios are going to break and I'd like us to discuss how we want to proceed with this work given this behavior.

Context

Since version 7.7 we've provided the ability (if somewhat limited in 7.7, though still possible) to create Alerts either through Alerts Management or through solutions that are building on Alerting.

We did so in a permissive manner that did not take into account any form of Feature Controls or Role Based Access Control (RBAC), but did take into account Spaces and Security. In other words, we ensure that an alert created by a user can't do anything that user wouldn't already be privileged to do themselves, but we didn't provide any way for administrators to limit which users can create alerts in the first place.

That said, we only provided access to Alerts Management to users who also had access to one of the following solutions: APM, Uptime, Metrics, Logs and/or SIEM.

The work introduced in version 7.10

As part of the work to bring Alerting out of Beta and into General Availability, we've prioritised the work to introduce robust Feature Controls and RBAC, which would provide the following:

  1. Ensure users can only create, modify and consume Alerts that are owned by solutions that they have access to (for example, only a user with SIEM access should be able to create a SIEM alert).
  2. Ensure Read and All privileges are respected, so that read-only users, for example, can read an alert but can't change it.
  3. Introduce a Built-In Alerts feature which will allow administrators to decide which users can create, modify and read any Alert Type we provide out-of-the-box, which at the time of writing this is only the Index Threshold alert (other Alert Types are owned by specific solutions).
  4. Introduce an Actions feature which will allow administrators to decide which users can create and modify Connectors and which users can only use these Connectors to execute Actions when their Alerts are activated.
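
The first two points can be sketched as a small authorization check. This is a simplified model rather than Kibana's implementation: the type names, the `canPerform` helper and the feature ids (`siem`, `builtInAlerts`) are assumptions made for illustration.

```typescript
// Hypothetical model of the RBAC rules listed above; not Kibana's actual API.
type Privilege = "none" | "read" | "all";
type Operation = "get" | "create" | "update" | "delete";

// Privileges a user holds, keyed by the feature that owns an alert type.
// The feature ids used with this type are illustrative.
type FeaturePrivileges = { [featureId: string]: Privilege };

function canPerform(
  privileges: FeaturePrivileges,
  owningFeature: string,
  operation: Operation
): boolean {
  const granted = privileges[owningFeature];
  // "all" allows every operation on alerts owned by the feature.
  if (granted === "all") return true;
  // "read" allows reading an alert, but not changing it.
  if (granted === "read") return operation === "get";
  // No privilege on the owning feature: no access at all.
  return false;
}
```

Under this model, a user with `{ siem: "all", builtInAlerts: "read" }` could create a SIEM alert and read an Index Threshold alert, but could not update the Index Threshold alert.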

(There are a bunch of other nuanced aspects to this, but that gives you the broad strokes of why we've introduced this work).

The PR introducing the Alert RBAC & Built-In Alerts feature is here: #67157
The PR introducing the Actions RBAC & feature is here: #70304
There's a draft PR that includes the entirety of the RBAC work (Alerting and Actions side by side) here: #70734

Upgrading from 7.7 / 7.8 / 7.9 to 7.10

As a consequence of this new RBAC work we are essentially introducing a less permissive permissions model than we had beforehand, which means that we will be breaking certain alerts that might already exist in customer deployments.
I'd like to run through the specific scenarios and explain why they will break and explain how users will be able to correct the breakage. It's important to note that in all these cases the Alerts will break, but will not be lost and will be completely recoverable through user interaction.

First, the broad strokes as I understand them (@legrego will correct me if I got something wrong):

  1. By default, when a new feature is introduced, roles with custom privileges (which we believe are quite common in large deployments, but sadly there is no telemetry to confirm this) are not granted access to this new feature. This means that, by default, these users will no longer have privileges to the Built-In Alerts and Actions.
  2. Roles with Base All or Base Read privileges inherit these abilities when new features are introduced, so these users would in theory retain the same access they had before the upgrade. However, as we defaulted to All across all of the solutions that provided Alerts in 7.7 through to 7.9, in practice this means that any user with Base Read would now be granted Read where beforehand they had All. This will not break their Alerts, but they will no longer be able to modify them.
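
A minimal sketch of the two rules above, assuming a simplified role model. The `Role` type, the `privilegeFor` helper and the feature ids are hypothetical and only mirror the behaviour described in this issue.

```typescript
// Hypothetical model of how a role's privileges extend to a feature that
// did not exist when the role was defined. Names are illustrative only.
type Privilege = "none" | "read" | "all";

type Role =
  | { kind: "base"; privilege: "read" | "all" }
  | { kind: "custom"; features: { [featureId: string]: Privilege } };

function privilegeFor(role: Role, featureId: string): Privilege {
  if (role.kind === "base") {
    // Base privileges cover every feature, including newly introduced ones.
    return role.privilege;
  }
  // Custom roles only hold what an administrator explicitly granted.
  const granted = role.features[featureId];
  return granted === undefined ? "none" : granted;
}

// A custom role defined before 7.10 that only knows about Metrics.
const metricsOnly: Role = { kind: "custom", features: { metrics: "all" } };
const actionsPrivilege = privilegeFor(metricsOnly, "actions");
```

Here `actionsPrivilege` is `"none"` because the custom role never listed the new feature, while a Base Read role would resolve to `"read"` for the same feature.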

What this means is that the following will happen when a deployment is upgraded to 7.10:

| Scenario | What Will Happen? | Recovering the Alert |
| --- | --- | --- |
| A user with Custom All access to the Metrics solution created an Index Threshold alert through Alerts Management in 7.7/8 | As they are not granted any privileges to the Actions & Built-In Alerts features, their Api Key will not have the required privileges and the Alert will break when it tries to run | The user will need to ask their administrator for access to these features and then they'll need to open the Alert and save it (no need to modify anything in the alert). This will generate a new Api Key with the required privileges and the Alert will work. |
| A user with Custom All access to the Metrics solution created a Metrics Inventory alert through the Metrics solution in 7.7/8/9 with an Email action on it | The user's privileges to Metrics apply to the Metrics alerts, so they can run the Alert, but as they are not granted any privileges to the Actions feature, their Api Key will not have the required privileges to execute any actions that have been configured for the alert. The Alert itself will run, as it relies on their privileges to Metrics, but when the actions are scheduled, they will fail to execute. | The user will need to ask their administrator for access to the new Actions feature. Once this feature is granted, they'll need to open the Alert and save it (no need to modify anything in the alert). This will generate a new Api Key with the required privileges and the Alert will work. |
| A user with Base Read permissions created an Alert in 7.7/8/9 | The user will inherit Read privileges to the Actions & Built-In Alerts features, which means that their existing alerts will continue to work and their actions will continue to execute, but they will no longer be able to modify them | There's nothing to fix here, as this is by design, but there might be unhappy users out there as they no longer have the privileges needed to do what they could do before |

I think that table now covers the different situations which are going to break, but there might be more... I'll have to investigate further before I can say for sure.

There are scenarios where nothing will break, of course, such as:

  1. A user with Custom All access to the Metrics solution created a Metrics Inventory alert through the Metrics solution in 7.7/8 without any actions
  2. A user with Base All access created any alert in 7.7/8/9 with or without actions
  3. A user with Base Read access created any alert in 7.7/8/9 without actions
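
The breaking and non-breaking scenarios above can be condensed into one decision function. This is a sketch that encodes this issue's analysis, not Kibana code: the type names are hypothetical, and it assumes a custom role that has access to the owning solution but has not been granted the new Actions and Built-In Alerts features.

```typescript
// Hypothetical classifier for the upgrade scenarios described above.
type RoleKind = "baseAll" | "baseRead" | "custom";
type Breakage = "none" | "alertFailsToRun" | "actionsFailToExecute";

interface PreUpgradeAlert {
  creatorRole: RoleKind;
  // true for Index Threshold (owned by the new Built-In Alerts feature),
  // false for alert types owned by a solution the creator can access.
  builtInType: boolean;
  hasActions: boolean;
}

function breakageAfterUpgrade(alert: PreUpgradeAlert): Breakage {
  // Base All and Base Read inherit privileges to the new features, so the
  // alert keeps running (Base Read creators lose the ability to modify it).
  if (alert.creatorRole !== "custom") return "none";
  // Custom role: the Api Key lacks privileges to Built-In Alerts.
  if (alert.builtInType) return "alertFailsToRun";
  // Custom role, solution-owned alert: only configured actions are affected.
  return alert.hasActions ? "actionsFailToExecute" : "none";
}
```

For example, a Metrics Inventory alert with an Email action created under a custom role maps to `"actionsFailToExecute"`, and the same alert without actions maps to `"none"`.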

What do we need to do next?

We need to decide whether we can live with the 7.10 release breaking these alerts.
Considering we're officially in Beta, this could be considered reasonable, but we'll definitely have some unhappy users out there, so we'd need to find the best way to communicate this change to them before they upgrade.

If we decide we can't live with this, we're going to have to work with the Security team to find some kind of workaround.
If we could somehow provide roles with Custom privileges some kind of special casing, such as automatically granting All or Read to the Actions & Built-In Alerts features when they have privileges to one of the solutions that provide alerts, then we could sidestep most of the issue, but I don't know if that's possible for 7.10.

Let me know what you think.
For now I am marking the Feature Controls issue as Blocked as I don't think we can merge it until this question is addressed.

@gmmorris gmmorris added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.9.0 labels Jul 6, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@legrego
Member

legrego commented Jul 6, 2020

> First, the broad strokes as I understand them (@legrego will correct me if I got something wrong):

Your analysis looks correct to me here. Nice job laying this out, it was easy to follow and understand 🏅

> If we could somehow provide roles with Custom privileges some kind of special casing, such as automatically granting All or Read to the Actions & Built-In Alerts features when they have privileges to one of the solutions that provide alerts, then we could sidestep most of the issue, but I don't know if that's possible for 7.9.

@gmmorris and I already discussed this briefly, but for posterity:

The only mechanism we have today is for these solutions' features to declare the necessary alerting privileges themselves to interact with the built-in alert types/actions. This is problematic, as it will exacerbate the problem by persisting this behavior for new roles, which is training our administrators the "wrong way" to do things. It also doesn't give administrators the ability to opt-out of this behavior: they wouldn't be able to create a role with access to SIEM, but without access to the built-in types.

We don't have a way to special-case existing roles without also impacting future roles. We'd have to treat them the same way. This is because Kibana doesn't have the necessary privileges (by design) to manipulate user roles, so we can't go and patch existing roles to suit our needs, and we don't have a way to detect when a role was created/updated from within Kibana.

@mikecote
Contributor

mikecote commented Jul 6, 2020

I was talking with @kobelb and I think it's ok to accept this as a breaking change because we are still Beta. This seems like a change we have to do to get RBAC work done for GA. We can make sure it's properly documented with workarounds in the release notes and communicated to support.

I also want to make sure @peterschretlen @bmcconaghy and @arisonl are ok with this impact for users.

> The user will need to ask their administrator for access to these features and then they'll need to open the Alert and save it (no need to modify anything in the alert). This will generate a new Api Key with the required privileges and the Alert will work.

Do they need a new API key, or will just updating the role fix the alert?

@legrego
Member

legrego commented Jul 6, 2020

> Do they need a new API key, or will just updating the role fix the alert?

They will need to have their role updated first, and then a new API key created for the alert
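
The ordering matters because, as described in this thread, the alert runs with an Api Key generated when the alert was saved. A minimal sketch of that behaviour, assuming the key captures the creator's privileges at the moment it is generated; every name below is hypothetical.

```typescript
// Hypothetical model: an alert runs with an Api Key that captured the
// creator's privileges when the key was generated.
type Privilege = "none" | "read" | "all";
interface Privileges { [featureId: string]: Privilege }
interface User { privileges: Privileges }
interface Alert { apiKeyPrivileges: Privileges }

function snapshot(source: Privileges): Privileges {
  const copy: Privileges = {};
  for (const featureId in source) {
    if (Object.prototype.hasOwnProperty.call(source, featureId)) {
      copy[featureId] = source[featureId];
    }
  }
  return copy;
}

// Saving an alert generates a new key from the user's current privileges.
function saveAlert(user: User): Alert {
  return { apiKeyPrivileges: snapshot(user.privileges) };
}

function canExecuteActions(alert: Alert): boolean {
  const granted = alert.apiKeyPrivileges["actions"];
  return granted === "read" || granted === "all";
}

const user: User = { privileges: { metrics: "all" } };
const alertBeforeRoleUpdate = saveAlert(user);

// Step 1: the administrator grants access to the new Actions feature.
user.privileges["actions"] = "all";
const stillBroken = !canExecuteActions(alertBeforeRoleUpdate);

// Step 2: the user opens and saves the alert, which generates a new key.
const alertAfterResave = saveAlert(user);
const fixed = canExecuteActions(alertAfterResave);
```

In this model `stillBroken` and `fixed` are both `true`: updating the role alone leaves the old key unchanged, and only saving the alert again picks up the new privileges.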

@gmmorris
Contributor Author

gmmorris commented Jul 7, 2020

I just realised I might have got one scenario wrong:

| Scenario | What Will Happen? | Recovering the Alert |
| --- | --- | --- |
| A user with Custom All access to the Metrics solution created a Metrics Inventory alert through the Metrics solution in 7.7/8 with an Email action on it | The user's privileges to Metrics apply to the Metrics alerts, so they can run the Alert, but as they are not granted any privileges to the Actions feature, their Api Key will not have the required privileges to execute any actions that have been configured for the alert. The Alert itself will run, as it relies on their privileges to Metrics, but when the actions are scheduled, they will fail to execute. | The user will need to ask their administrator for access to the new Actions feature. Once this feature is granted, the Alerts and its Actions will go back to working order. |

I think the user will have to generate a new Api Key for this one as well, right?
@legrego

@legrego
Member

legrego commented Jul 7, 2020

> I think the user will have to generate a new Api Key for this one as well, right?

Yes that's correct

@gmmorris
Contributor Author

gmmorris commented Jul 7, 2020

Thanks @legrego, updated the table.

@gmmorris gmmorris changed the title [Discuss] Upgrading to 7.9 might break Alerts created in 7.7 & 7.8 [Discuss] Upgrading to 7.10 might break Alerts created in 7.7 - 7.9 Aug 11, 2020
@gmmorris
Contributor Author

We have just merged an enhancement designed to mitigate some of the issues raised by this issue.

The Enhancement

The approach we chose was to migrate all alerts created prior to 7.10 into a legacy mode where RBAC is "dialed down" so that they can continue to run and execute actions after the upgrade without requiring any human intervention.
That said, there are a few things to note about this approach:

  1. The exemption doesn't switch RBAC off entirely, but rather authorizes the user to perform some operations without the required privileges. As a result, editing these Alerts will still require privileges to the Actions Feature, meaning we'll still need to clearly communicate the change to users who'll likely need to address this on their end.
  2. This only applies to users who should have access to the Alert, meaning, an Alert created by a user who doesn't have privileges to the solution will stop working after the upgrade (for example, an Observability Alert created by a user who doesn't have privileges in Observability).
  3. As we're migrating all the alerts, we might see a performance impact in the upgrade itself when deployments have a large number of Alerts, but we found that migrating the entire SIEM suite (200+ alerts) took only a few more seconds. We chose this path to make this exemption more secure, but we can reassess if this turns out to be a problem.

This exemption has introduced some tech debt that we'll likely be paying off until 8.0.0, but my personal feeling is that removing this exemption at a later date will be a relatively straightforward change, so I think we've found a reasonable balance. 🤔

Impact in terms of the scenarios mentioned above

| Scenario | What Will Happen? | Recovering the Alert |
| --- | --- | --- |
| A user with Custom All access to the Metrics solution created an Index Threshold alert through Alerts Management in 7.7/8 | As they are not granted any privileges to the Actions & Built-In Alerts features, their Api Key will not have the required privileges and the Alert will break when it tries to run | The user will need to ask their administrator for access to these features and then they'll need to open the Alert and save it (no need to modify anything in the alert). This will generate a new Api Key with the required privileges and the Alert will work. |
| A user with Custom All access to the Metrics solution created a Metrics Inventory alert through the Metrics solution in 7.7/8/9 with an Email action on it | The user's privileges to Metrics apply to the Metrics alerts, so they can run the Alert. During the upgrade the Alert is marked as Legacy, which means that despite not having any privileges to the Actions feature, the Alert itself will not only run but it will also continue to schedule actions which will execute as usual. The user will not be able to edit the alert though, as privileges to Actions are still required in order to do that. | The user will need to ask their administrator for access to the new Actions feature. Once this feature is granted, they'll be able to modify the Alert which will remove the Legacy marker, but it will also generate a new Api Key with the required privileges which means it'll continue to work as expected. |
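
The legacy exemption described in this comment can be sketched as follows. This is a simplified model under stated assumptions: the field and function names are hypothetical, and the exact privilege level needed to edit is reduced to a boolean because this thread does not specify it.

```typescript
// Hypothetical model of the "legacy" exemption; names are illustrative.
interface Alert {
  legacy: boolean;
  // Whether the alert's Api Key carries privileges to the Actions feature.
  apiKeyHasActionsPrivilege: boolean;
}

// During the upgrade, every alert created before 7.10 is marked as legacy.
function migrate(alert: Alert): Alert {
  return { legacy: true, apiKeyHasActionsPrivilege: alert.apiKeyHasActionsPrivilege };
}

// Legacy alerts may keep executing actions without the Actions privilege.
function canExecuteActions(alert: Alert): boolean {
  return alert.legacy || alert.apiKeyHasActionsPrivilege;
}

// Editing is not exempt: the user still needs privileges to Actions.
// A successful edit removes the legacy marker and generates a new Api Key.
function editAlert(alert: Alert, userHasActionsPrivilege: boolean): Alert {
  if (!userHasActionsPrivilege) {
    throw new Error("Unauthorized: privileges to the Actions feature are required");
  }
  return { legacy: false, apiKeyHasActionsPrivilege: true };
}

const migrated = migrate({ legacy: false, apiKeyHasActionsPrivilege: false });
const runsAfterUpgrade = canExecuteActions(migrated);
```

In this model `runsAfterUpgrade` is `true` even though the key has no Actions privilege, while `editAlert(migrated, false)` throws until an administrator grants the feature.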

@gmmorris
Contributor Author

gmmorris commented Sep 16, 2020

Following the enhancement we've merged into Main (which will make it into 7.10), I am now closing this issue.
As far as the Alerting team is concerned, the scenarios where old Alerts might stop working as part of the upgrade to 7.10 are expected and reasonable in the context of Alerting still being in Beta.

Please reopen this issue if there are any additional concerns. 🙏

@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022