Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Detection Engine health API #125642

Open
4 of 11 tasks
banderror opened this issue Feb 15, 2022 · 5 comments
Open
4 of 11 tasks

[Security Solution] Detection Engine health API #125642

banderror opened this issue Feb 15, 2022 · 5 comments
Assignees
Labels
epic Feature:Rule Monitoring Security Solution Detection Rule Monitoring Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.8.2 v8.9.0 v8.11.0

Comments

@banderror
Copy link
Contributor

banderror commented Feb 15, 2022

Summary

Kibana Task Manager provides an api/task_manager/_health endpoint (doc 1, doc 2) which is very useful for troubleshooting performance and scaling issues with Security rules.

However, we could provide much more observability into the specifics of the Detection Engine and Security rule execution, which would help us troubleshoot issues with rule execution, cluster scaling, etc. The idea is to implement a Security-specific Detection Engine health API.

In the future, this API might become helpful for building more Rule Monitoring UIs giving our users more clarity and transparency about the work of the Detection Engine.

API requirements/ideas

It would be great to have an API that could provide a way to see how different "slices" or "scopes" of rules perform, for example:

  • Health overview of the whole cluster
    • Scope: all detection rules in all Kibana spaces, i.e. the whole cluster
  • Health overview of a space
    • Scope: all detection rules in a given Kibana space
  • Health overview of a rule type
    • Scope: detection rules of a given type in a given Kibana space
  • Health overview of a rule
    • Scope: a given rule in a given Kibana space

For each scope, we could calculate and return a lot of info representing the current ("now") health of detection rules. In addition to that, we could specify some time-based parameters to calculate how health was changing over time:

  • Date range: last hour, day, week, month, year
  • Granularity: minute, hour, day, week, month

Some ideas for what we could return from the API (each idea can apply to multiple scopes above):

  • current health stats at the moment of the API call:
    • number of Kibana instances
    • number of Kibana spaces
    • number of all rules
    • number of enabled and disabled rules
    • number of prebuilt and custom rules + how many of them are enabled and disabled
    • number of rules of each type + how many of them are enabled and disabled
    • number of rules with exceptions
    • number of rules with notification actions
    • number of rules with legacy notification actions
    • number of rules with response actions
    • number of rules by last execution status (succeeded, partial failure, failed, no status)
    • top X last failed statuses (messages) + rule ids for each status
    • top X last partial failure statuses (messages) + rule ids for each status
    • top X slowest rules by a few metrics (last total execution time, last search time, last indexing time)
    • top X rules with the largest scheduling delay (drift)
    • all rules that are querying indices with future timestamps + the actual index names with future timestamps in them (the API would need to check all rule's index patterns and data views)
    • top X rules by number of shards queried/shards queried in a particular data tier
  • health stats over a specified period of time:
    • aggregated rule execution metrics, e.g. number of executions, total execution time, query time, indexing time, scheduling delay (drift), detected gaps, etc
      • for most metrics: a set of percentiles that would be helpful in most cases, e.g. p01, p05, p25, p50, p75, p95, p99
    • change of rule execution statuses over time: a histogram for each status
    • change of rule execution metrics over time: a histogram for each percentile of each metric
    • top X rules by each execution metric
    • top X rules that are consuming the most total execution time - summing execution time over the executions for that rule, so it accounts for rules that are running more often
    • top X rules by final execution status (rules failing most often or ending up with a partial failure)
    • top X rules by logged messages of different log levels (errors and warnings are most interesting)
    • top X errors, top X warnings
    • top X error codes (we don't have error codes in our logs yet)

It feels like this API should be composed of multiple endpoints.

To do

  • Work on a PoC and implement an internal Detection Engine health API. (PR)
  • Write developer documentation for how to use it. (PR, PR)
  • Add support for it to the support-diagnostics tool so its output could be available in the diagnostic dumps. (PR, PR)
  • Implement the cluster health endpoint. (PR)
  • Implement an API that checks all rule's index patterns and returns the rules that are querying indices with future timestamps, in addition to the actual index names with future timestamps in them
  • Implement an API that identifies the rules that are consuming the most total execution time - summing execution time over the executions for that rule, so it accounts for rules that are running more often
  • Calculate more metrics for the rule health endpoint.
  • Calculate more metrics for the space health endpoint.
  • Address // TODO: https:/elastic/kibana/issues/125642 comments.
  • Write a test plan and cover the API with integration and unit tests.
  • At some point, consider making the API public and writing user-facing documentation for how to use it.

Some ideas worth discussing and planning, maybe as separate epics:

  • Telemetry: sending aggregated health metrics (e.g. calculated by the cluster health endpoint) to telemetry clusters. That's how we could measure production rule health and performance across user clusters. Devon Kerr: "Theoretically, it could also give you implicit insights— if we could cross-reference by cluster configuration, we could also notify users who under-resourced their clusters based on rule performance".
  • Adding support to the support-diagnostics tool for executing aggregation queries directly against Elasticsearch, not via the Detection Engine health API, for stack versions lower than X where the health API endpoints were added.
  • Writing dev docs on Elasticsearch queries useful for troubleshooting the health and performance of detection rules.
@banderror banderror added Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Feature:Rule Monitoring Security Solution Detection Rule Monitoring Team:Detection Rule Management Security Detection Rule Management Team labels Feb 15, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror banderror changed the title [Security Solution][Detections] Implement Detection Engine health endpoint [Security Solution] Implement Detection Engine health endpoint Nov 24, 2022
@banderror banderror changed the title [Security Solution] Implement Detection Engine health endpoint [Security Solution] Detection Engine health API Apr 25, 2023
@banderror banderror added the epic label Apr 25, 2023
@banderror banderror self-assigned this Apr 30, 2023
banderror added a commit that referenced this issue Jun 14, 2023
**Partially addresses:** https:/elastic/kibana/issues/125642

## Summary

This PR introduces a PoC health API that allows users to get health
overview of the Detection Engine across the whole cluster, or within a
given Kibana space, or for a given rule. It can be useful for
troubleshooting issues with cluster provisioning/scaling, issues with
certain rules failing or generating too much load on the cluster,
identifying common rule execution errors, etc.

In the future, this API might become helpful for building more Rule
Monitoring UIs giving our users more clarity and transparency about the
work of the Detection Engine.

## Rule health endpoint

🚧 NOTE: this endpoint is **partially implemented**. 🚧

```txt
POST /internal/detection_engine/health/_rule
```

Get health overview of a rule. Scope: a given detection rule in the
current Kibana space.
Returns:

- health stats at the moment of the API call (rule and its execution
summary)
- health stats over a specified period of time ("health interval")
- health stats history within the same interval in the form of a
histogram
(the same stats are calculated over each of the discreet sub-intervals
of the whole interval)

Minimal required parameters:

```json
{
  "rule_id": "d4beff10-f045-11ed-89d8-3b6931af10bc"
}
```


<details><summary>Response:</summary>
<p>

```json
{
  "timings": {
    "requested_at": "2023-05-26T16:09:54.128Z",
    "processed_at": "2023-05-26T16:09:54.778Z",
    "processing_time_ms": 650
  },
  "parameters": {
    "interval": {
      "type": "last_day",
      "granularity": "hour",
      "from": "2023-05-25T16:09:54.128Z",
      "to": "2023-05-26T16:09:54.128Z",
      "duration": "PT24H"
    },
    "rule_id": "d4beff10-f045-11ed-89d8-3b6931af10bc"
  },
  "health": {
    "stats_at_the_moment": {
      "rule": {
        "id": "d4beff10-f045-11ed-89d8-3b6931af10bc",
        "updated_at": "2023-05-26T15:44:21.689Z",
        "updated_by": "elastic",
        "created_at": "2023-05-11T21:50:23.830Z",
        "created_by": "elastic",
        "name": "Test rule",
        "tags": ["foo"],
        "interval": "1m",
        "enabled": true,
        "revision": 2,
        "description": "-",
        "risk_score": 21,
        "severity": "low",
        "license": "",
        "output_index": "",
        "meta": {
          "from": "6h",
          "kibana_siem_app_url": "http://localhost:5601/kbn/app/security"
        },
        "author": [],
        "false_positives": [],
        "from": "now-21660s",
        "rule_id": "e46eaaf3-6d81-4cdb-8cbb-b2201a11358b",
        "max_signals": 100,
        "risk_score_mapping": [],
        "severity_mapping": [],
        "threat": [],
        "to": "now",
        "references": [],
        "version": 3,
        "exceptions_list": [],
        "immutable": false,
        "related_integrations": [],
        "required_fields": [],
        "setup": "",
        "type": "query",
        "language": "kuery",
        "index": [
          "apm-*-transaction*",
          "auditbeat-*",
          "endgame-*",
          "filebeat-*",
          "logs-*",
          "packetbeat-*",
          "traces-apm*",
          "winlogbeat-*",
          "-*elastic-cloud-logs-*",
          "foo-*"
        ],
        "query": "*",
        "filters": [],
        "actions": [
          {
            "group": "default",
            "id": "bd59c4e0-f045-11ed-89d8-3b6931af10bc",
            "params": {
              "body": "Hello world"
            },
            "action_type_id": ".webhook",
            "uuid": "f8b87eb0-58bb-4d4b-a584-084d44ab847e",
            "frequency": {
              "summary": true,
              "throttle": null,
              "notifyWhen": "onActiveAlert"
            }
          }
        ],
        "execution_summary": {
          "last_execution": {
            "date": "2023-05-26T16:09:36.848Z",
            "status": "succeeded",
            "status_order": 0,
            "message": "Rule execution completed successfully",
            "metrics": {
              "total_search_duration_ms": 2,
              "execution_gap_duration_s": 80395
            }
          }
        }
      }
    },
    "stats_over_interval": {
      "number_of_executions": {
        "total": 21,
        "by_outcome": {
          "succeeded": 20,
          "warning": 0,
          "failed": 1
        }
      },
      "number_of_logged_messages": {
        "total": 42,
        "by_level": {
          "error": 1,
          "warn": 0,
          "info": 41,
          "debug": 0,
          "trace": 0
        }
      },
      "number_of_detected_gaps": {
        "total": 1,
        "total_duration_s": 80395
      },
      "schedule_delay_ms": {
        "percentiles": {
          "1.0": 3061,
          "5.0": 3083,
          "25.0": 3112,
          "50.0": 6049,
          "75.0": 6069.5,
          "95.0": 100093.79999999986,
          "99.0": 207687
        }
      },
      "execution_duration_ms": {
        "percentiles": {
          "1.0": 226,
          "5.0": 228.2,
          "25.0": 355.5,
          "50.0": 422,
          "75.0": 447,
          "95.0": 677.75,
          "99.0": 719
        }
      },
      "search_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 1.1,
          "25.0": 2.75,
          "50.0": 7,
          "75.0": 13.5,
          "95.0": 29.59999999999998,
          "99.0": 45
        }
      },
      "indexing_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 0,
          "95.0": 0,
          "99.0": 0
        }
      },
      "top_errors": [
        {
          "count": 1,
          "message": "day were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances"
        }
      ],
      "top_warnings": []
    },
    "history_over_interval": {
      "buckets": [
        {
          "timestamp": "2023-05-26T15:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 12,
              "by_outcome": {
                "succeeded": 11,
                "warning": 0,
                "failed": 1
              }
            },
            "number_of_logged_messages": {
              "total": 24,
              "by_level": {
                "error": 1,
                "warn": 0,
                "info": 23,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 1,
              "total_duration_s": 80395
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 3106,
                "5.0": 3106.8,
                "25.0": 3124.5,
                "50.0": 6067.5,
                "75.0": 9060.5,
                "95.0": 188124.59999999971,
                "99.0": 207687
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 230,
                "5.0": 236.2,
                "25.0": 354,
                "50.0": 405,
                "75.0": 447.5,
                "95.0": 563.3999999999999,
                "99.0": 576
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0.20000000000000018,
                "25.0": 2.5,
                "50.0": 5,
                "75.0": 14,
                "95.0": 42.19999999999996,
                "99.0": 45
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        },
        {
          "timestamp": "2023-05-26T16:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 9,
              "by_outcome": {
                "succeeded": 9,
                "warning": 0,
                "failed": 0
              }
            },
            "number_of_logged_messages": {
              "total": 18,
              "by_level": {
                "error": 0,
                "warn": 0,
                "info": 18,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 0,
              "total_duration_s": 0
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 3061,
                "5.0": 3061,
                "25.0": 3104.75,
                "50.0": 3115,
                "75.0": 6053,
                "95.0": 6068,
                "99.0": 6068
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 226.00000000000003,
                "5.0": 226,
                "25.0": 356,
                "50.0": 436,
                "75.0": 495.5,
                "95.0": 719,
                "99.0": 719
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 2,
                "5.0": 2,
                "25.0": 5.75,
                "50.0": 8,
                "75.0": 13.75,
                "95.0": 17,
                "99.0": 17
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        }
      ]
    }
  }
}
```
</p>
</details> 

## Space health endpoint

🚧 NOTE: this endpoint is **partially implemented**. 🚧

```txt
POST /internal/detection_engine/health/_space
```

Get health overview of the current Kibana space. Scope: all detection
rules in the space.
Returns:

- health stats at the moment of the API call
- health stats over a specified period of time ("health interval")
- health stats history within the same interval in the form of a
histogram
(the same stats are calculated over each of the discreet sub-intervals
of the whole interval)

Minimal required parameters: empty object.

```json
{}
```



<details><summary>Response:</summary>
<p>

```json
{
  "timings": {
    "requested_at": "2023-05-26T16:24:21.628Z",
    "processed_at": "2023-05-26T16:24:22.880Z",
    "processing_time_ms": 1252
  },
  "parameters": {
    "interval": {
      "type": "last_day",
      "granularity": "hour",
      "from": "2023-05-25T16:24:21.628Z",
      "to": "2023-05-26T16:24:21.628Z",
      "duration": "PT24H"
    }
  },
  "health": {
    "stats_at_the_moment": {
      "number_of_rules": {
        "all": {
          "total": 777,
          "enabled": 777,
          "disabled": 0
        },
        "by_origin": {
          "prebuilt": {
            "total": 776,
            "enabled": 776,
            "disabled": 0
          },
          "custom": {
            "total": 1,
            "enabled": 1,
            "disabled": 0
          }
        },
        "by_type": {
          "siem.eqlRule": {
            "total": 381,
            "enabled": 381,
            "disabled": 0
          },
          "siem.queryRule": {
            "total": 325,
            "enabled": 325,
            "disabled": 0
          },
          "siem.mlRule": {
            "total": 47,
            "enabled": 47,
            "disabled": 0
          },
          "siem.thresholdRule": {
            "total": 18,
            "enabled": 18,
            "disabled": 0
          },
          "siem.newTermsRule": {
            "total": 4,
            "enabled": 4,
            "disabled": 0
          },
          "siem.indicatorRule": {
            "total": 2,
            "enabled": 2,
            "disabled": 0
          }
        },
        "by_outcome": {
          "warning": {
            "total": 307,
            "enabled": 307,
            "disabled": 0
          },
          "succeeded": {
            "total": 266,
            "enabled": 266,
            "disabled": 0
          },
          "failed": {
            "total": 204,
            "enabled": 204,
            "disabled": 0
          }
        }
      }
    },
    "stats_over_interval": {
      "number_of_executions": {
        "total": 5622,
        "by_outcome": {
          "succeeded": 1882,
          "warning": 2129,
          "failed": 2120
        }
      },
      "number_of_logged_messages": {
        "total": 11756,
        "by_level": {
          "error": 2120,
          "warn": 2129,
          "info": 7507,
          "debug": 0,
          "trace": 0
        }
      },
      "number_of_detected_gaps": {
        "total": 777,
        "total_duration_s": 514415894
      },
      "schedule_delay_ms": {
        "percentiles": {
          "1.0": 216,
          "5.0": 3048.5,
          "25.0": 3105,
          "50.0": 3129,
          "75.0": 6112.355119825708,
          "95.0": 134006,
          "99.0": 195578
        }
      },
      "execution_duration_ms": {
        "percentiles": {
          "1.0": 275,
          "5.0": 323.375,
          "25.0": 370.80555555555554,
          "50.0": 413.1122337092731,
          "75.0": 502.25233127864715,
          "95.0": 685.8055555555555,
          "99.0": 1194.75
        }
      },
      "search_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 15,
          "95.0": 30,
          "99.0": 99.44000000000005
        }
      },
      "indexing_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 0,
          "95.0": 0,
          "99.0": 0
        }
      },
      "top_errors": [
        {
          "count": 1202,
          "message": "An error occurred during rule execution message verification_exception"
        },
        {
          "count": 777,
          "message": "were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message rare_error_code missing"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message v3_windows_anomalous_path_activity missing"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message v3_windows_rare_user_type10_remote_login missing"
        }
      ],
      "top_warnings": [
        {
          "count": 2129,
          "message": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching was found This warning will continue to appear until matching index is created or this rule is disabled"
        }
      ]
    },
    "history_over_interval": {
      "buckets": [
        {
          "timestamp": "2023-05-26T15:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 2245,
              "by_outcome": {
                "succeeded": 566,
                "warning": 849,
                "failed": 1336
              }
            },
            "number_of_logged_messages": {
              "total": 4996,
              "by_level": {
                "error": 1336,
                "warn": 849,
                "info": 2811,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 777,
              "total_duration_s": 514415894
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 256,
                "5.0": 3086.9722222222217,
                "25.0": 3133,
                "50.0": 6126,
                "75.0": 59484.25,
                "95.0": 179817.25,
                "99.0": 202613
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 280.6,
                "5.0": 327.7,
                "25.0": 371.5208333333333,
                "50.0": 415.6190476190476,
                "75.0": 505.7642857142857,
                "95.0": 740.4375,
                "99.0": 1446.1500000000005
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 8,
                "95.0": 25,
                "99.0": 46
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        },
        {
          "timestamp": "2023-05-26T16:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 3363,
              "by_outcome": {
                "succeeded": 1316,
                "warning": 1280,
                "failed": 784
              }
            },
            "number_of_logged_messages": {
              "total": 6760,
              "by_level": {
                "error": 784,
                "warn": 1280,
                "info": 4696,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 0,
              "total_duration_s": 0
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 207,
                "5.0": 3042,
                "25.0": 3098.46511627907,
                "50.0": 3112,
                "75.0": 3145.2820512820517,
                "95.0": 6100.571428571428,
                "99.0": 6123
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 275,
                "5.0": 319.85714285714283,
                "25.0": 370.0357142857143,
                "50.0": 410.79999229108853,
                "75.0": 500.7692307692308,
                "95.0": 675,
                "99.0": 781.3999999999996
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 9,
                "75.0": 17.555555555555557,
                "95.0": 34,
                "99.0": 110.5
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        }
      ]
    }
  }
}
```
</p>
</details> 

## Cluster health endpoint

🚧 NOTE: this endpoint is **not implemented**. 🚧

```txt
POST /internal/detection_engine/health/_cluster
```

Minimal required parameters: empty object.

```json
{}
```

<details><summary>Response:</summary>
<p>

```json
{
  "message": "Not implemented",
  "timings": {
    "requested_at": "2023-05-26T16:32:01.878Z",
    "processed_at": "2023-05-26T16:32:01.881Z",
    "processing_time_ms": 3
  },
  "parameters": {
    "interval": {
      "type": "last_week",
      "granularity": "hour",
      "from": "2023-05-19T16:32:01.878Z",
      "to": "2023-05-26T16:32:01.878Z",
      "duration": "PT168H"
    }
  },
  "health": {
    "stats_at_the_moment": {
      "number_of_rules": {
        "all": {
          "total": 0,
          "enabled": 0,
          "disabled": 0
        },
        "by_origin": {
          "prebuilt": {
            "total": 0,
            "enabled": 0,
            "disabled": 0
          },
          "custom": {
            "total": 0,
            "enabled": 0,
            "disabled": 0
          }
        },
        "by_type": {},
        "by_outcome": {}
      }
    },
    "stats_over_interval": {
      "message": "Not implemented"
    },
    "history_over_interval": {
      "buckets": []
    }
  }
}
```
</p>
</details> 



## Optional parameters

All the three endpoints accept optional `interval` and `debug` request
parameters.

### Health interval

You can change the interval over which the health stats will be
calculated. If you don't specify it, by default health stats will be
calculated over the last day with the granularity of 1 hour.

```json
{
  "interval": {
    "type": "last_week",
    "granularity": "day"
  }
}
```

You can also specify a custom date range with exact interval bounds.

```json
{
  "interval": {
    "type": "custom_range",
    "granularity": "minute",
    "from": "2023-05-20T16:24:21.628Z",
    "to": "2023-05-26T16:24:21.628Z"
  }
}
```

Please keep in mind that requesting large intervals with small
granularity can generate substantial load on the system and enormous API
responses.

### Debug mode

You can also include various debug information in the response, such as
queries and aggregations sent to Elasticsearch and response received
from it.

```json
{
  "debug": true
}
```

In the response you will find something like that:

<details><summary>Response:</summary>
<p>

```json
{
  "health": {
    "debug": {
      "rulesClient": {
        "request": {
          "aggs": {
            "rulesByEnabled": {
              "terms": {
                "field": "alert.attributes.enabled"
              }
            },
            "rulesByOrigin": {
              "terms": {
                "field": "alert.attributes.params.immutable"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            },
            "rulesByType": {
              "terms": {
                "field": "alert.attributes.alertTypeId"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            },
            "rulesByOutcome": {
              "terms": {
                "field": "alert.attributes.lastRun.outcome"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            }
          }
        },
        "response": {
          "aggregations": {
            "rulesByOutcome": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "warning",
                  "doc_count": 307,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 307
                      }
                    ]
                  }
                },
                {
                  "key": "succeeded",
                  "doc_count": 266,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 266
                      }
                    ]
                  }
                },
                {
                  "key": "failed",
                  "doc_count": 204,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 204
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByType": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "siem.eqlRule",
                  "doc_count": 381,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 381
                      }
                    ]
                  }
                },
                {
                  "key": "siem.queryRule",
                  "doc_count": 325,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 325
                      }
                    ]
                  }
                },
                {
                  "key": "siem.mlRule",
                  "doc_count": 47,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 47
                      }
                    ]
                  }
                },
                {
                  "key": "siem.thresholdRule",
                  "doc_count": 18,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 18
                      }
                    ]
                  }
                },
                {
                  "key": "siem.newTermsRule",
                  "doc_count": 4,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 4
                      }
                    ]
                  }
                },
                {
                  "key": "siem.indicatorRule",
                  "doc_count": 2,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 2
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByOrigin": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "true",
                  "doc_count": 776,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 776
                      }
                    ]
                  }
                },
                {
                  "key": "false",
                  "doc_count": 1,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 1
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByEnabled": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": 1,
                  "key_as_string": "true",
                  "doc_count": 777
                }
              ]
            }
          }
        }
      },
      "eventLog": {
        "request": {
          "aggs": {
            "totalExecutions": {
              "cardinality": {
                "field": "kibana.alert.rule.execution.uuid"
              }
            },
            "executeEvents": {
              "filter": {
                "term": {
                  "event.action": "execute"
                }
              },
              "aggs": {
                "executionDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_run_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                },
                "scheduleDelayNs": {
                  "percentiles": {
                    "field": "kibana.task.schedule_delay",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                }
              }
            },
            "statusChangeEvents": {
              "filter": {
                "bool": {
                  "filter": [
                    {
                      "term": {
                        "event.action": "status-change"
                      }
                    }
                  ],
                  "must_not": [
                    {
                      "terms": {
                        "kibana.alert.rule.execution.status": ["running", "going to run"]
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "executionsByStatus": {
                  "terms": {
                    "field": "kibana.alert.rule.execution.status"
                  }
                }
              }
            },
            "executionMetricsEvents": {
              "filter": {
                "term": {
                  "event.action": "execution-metrics"
                }
              },
              "aggs": {
                "gaps": {
                  "filter": {
                    "exists": {
                      "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                    }
                  },
                  "aggs": {
                    "totalGapDurationS": {
                      "sum": {
                        "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                      }
                    }
                  }
                },
                "searchDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_search_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                },
                "indexingDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_indexing_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                }
              }
            },
            "messageContainingEvents": {
              "filter": {
                "terms": {
                  "event.action": ["status-change", "message"]
                }
              },
              "aggs": {
                "messagesByLogLevel": {
                  "terms": {
                    "field": "log.level"
                  }
                },
                "errors": {
                  "filter": {
                    "term": {
                      "log.level": "error"
                    }
                  },
                  "aggs": {
                    "topErrors": {
                      "categorize_text": {
                        "field": "message",
                        "size": 5,
                        "similarity_threshold": 99
                      }
                    }
                  }
                },
                "warnings": {
                  "filter": {
                    "term": {
                      "log.level": "warn"
                    }
                  },
                  "aggs": {
                    "topWarnings": {
                      "categorize_text": {
                        "field": "message",
                        "size": 5,
                        "similarity_threshold": 99
                      }
                    }
                  }
                }
              }
            },
            "statsHistory": {
              "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "hour"
              },
              "aggs": {
                "totalExecutions": {
                  "cardinality": {
                    "field": "kibana.alert.rule.execution.uuid"
                  }
                },
                "executeEvents": {
                  "filter": {
                    "term": {
                      "event.action": "execute"
                    }
                  },
                  "aggs": {
                    "executionDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_run_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    },
                    "scheduleDelayNs": {
                      "percentiles": {
                        "field": "kibana.task.schedule_delay",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    }
                  }
                },
                "statusChangeEvents": {
                  "filter": {
                    "bool": {
                      "filter": [
                        {
                          "term": {
                            "event.action": "status-change"
                          }
                        }
                      ],
                      "must_not": [
                        {
                          "terms": {
                            "kibana.alert.rule.execution.status": ["running", "going to run"]
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "executionsByStatus": {
                      "terms": {
                        "field": "kibana.alert.rule.execution.status"
                      }
                    }
                  }
                },
                "executionMetricsEvents": {
                  "filter": {
                    "term": {
                      "event.action": "execution-metrics"
                    }
                  },
                  "aggs": {
                    "gaps": {
                      "filter": {
                        "exists": {
                          "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                        }
                      },
                      "aggs": {
                        "totalGapDurationS": {
                          "sum": {
                            "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                          }
                        }
                      }
                    },
                    "searchDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_search_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    },
                    "indexingDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_indexing_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    }
                  }
                },
                "messageContainingEvents": {
                  "filter": {
                    "terms": {
                      "event.action": ["status-change", "message"]
                    }
                  },
                  "aggs": {
                    "messagesByLogLevel": {
                      "terms": {
                        "field": "log.level"
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "response": {
          "aggregations": {
            "statsHistory": {
              "buckets": [
                {
                  "key_as_string": "2023-05-26T15:00:00.000Z",
                  "key": 1685113200000,
                  "doc_count": 11388,
                  "statusChangeEvents": {
                    "doc_count": 2751,
                    "executionsByStatus": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "failed",
                          "doc_count": 1336
                        },
                        {
                          "key": "partial failure",
                          "doc_count": 849
                        },
                        {
                          "key": "succeeded",
                          "doc_count": 566
                        }
                      ]
                    }
                  },
                  "totalExecutions": {
                    "value": 2245
                  },
                  "messageContainingEvents": {
                    "doc_count": 4996,
                    "messagesByLogLevel": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "info",
                          "doc_count": 2811
                        },
                        {
                          "key": "error",
                          "doc_count": 1336
                        },
                        {
                          "key": "warn",
                          "doc_count": 849
                        }
                      ]
                    }
                  },
                  "executeEvents": {
                    "doc_count": 2245,
                    "scheduleDelayNs": {
                      "values": {
                        "1.0": 256000000,
                        "5.0": 3086972222.222222,
                        "25.0": 3133000000,
                        "50.0": 6126000000,
                        "75.0": 59484250000,
                        "95.0": 179817250000,
                        "99.0": 202613000000
                      }
                    },
                    "executionDurationMs": {
                      "values": {
                        "1.0": 280.6,
                        "5.0": 327.7,
                        "25.0": 371.5208333333333,
                        "50.0": 415.6190476190476,
                        "75.0": 505.575,
                        "95.0": 740.4375,
                        "99.0": 1446.1500000000005
                      }
                    }
                  },
                  "executionMetricsEvents": {
                    "doc_count": 1902,
                    "searchDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 8,
                        "95.0": 25,
                        "99.0": 46
                      }
                    },
                    "gaps": {
                      "doc_count": 777,
                      "totalGapDurationS": {
                        "value": 514415894
                      }
                    },
                    "indexingDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 0,
                        "95.0": 0,
                        "99.0": 0
                      }
                    }
                  }
                },
                {
                  "key_as_string": "2023-05-26T16:00:00.000Z",
                  "key": 1685116800000,
                  "doc_count": 28325,
                  "statusChangeEvents": {
                    "doc_count": 6126,
                    "executionsByStatus": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "succeeded",
                          "doc_count": 2390
                        },
                        {
                          "key": "partial failure",
                          "doc_count": 2305
                        },
                        {
                          "key": "failed",
                          "doc_count": 1431
                        }
                      ]
                    }
                  },
                  "totalExecutions": {
                    "value": 6170
                  },
                  "messageContainingEvents": {
                    "doc_count": 12252,
                    "messagesByLogLevel": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "info",
                          "doc_count": 8516
                        },
                        {
                          "key": "warn",
                          "doc_count": 2305
                        },
                        {
                          "key": "error",
                          "doc_count": 1431
                        }
                      ]
                    }
                  },
                  "executeEvents": {
                    "doc_count": 6126,
                    "scheduleDelayNs": {
                      "values": {
                        "1.0": 193000000,
                        "5.0": 3017785185.1851854,
                        "25.0": 3086000000,
                        "50.0": 3105877192.982456,
                        "75.0": 3134645161.290323,
                        "95.0": 6081772222.222222,
                        "99.0": 6122000000
                      }
                    },
                    "executionDurationMs": {
                      "values": {
                        "1.0": 275.17333333333335,
                        "5.0": 324.8014285714285,
                        "25.0": 377.0752688172043,
                        "50.0": 431,
                        "75.0": 532.3870967741935,
                        "95.0": 720.6761904761904,
                        "99.0": 922.6799999999985
                      }
                    }
                  },
                  "executionMetricsEvents": {
                    "doc_count": 3821,
                    "searchDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 9.8,
                        "75.0": 18,
                        "95.0": 40.17499999999999,
                        "99.0": 124
                      }
                    },
                    "gaps": {
                      "doc_count": 0,
                      "totalGapDurationS": {
                        "value": 0
                      }
                    },
                    "indexingDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 0,
                        "95.0": 0,
                        "99.0": 0
                      }
                    }
                  }
                }
              ]
            },
            "statusChangeEvents": {
              "doc_count": 8877,
              "executionsByStatus": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "partial failure",
                    "doc_count": 3154
                  },
                  {
                    "key": "succeeded",
                    "doc_count": 2956
                  },
                  {
                    "key": "failed",
                    "doc_count": 2767
                  }
                ]
              }
            },
            "totalExecutions": {
              "value": 8455
            },
            "messageContainingEvents": {
              "doc_count": 17248,
              "messagesByLogLevel": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "info",
                    "doc_count": 11327
                  },
                  {
                    "key": "warn",
                    "doc_count": 3154
                  },
                  {
                    "key": "error",
                    "doc_count": 2767
                  }
                ]
              },
              "warnings": {
                "doc_count": 3154,
                "topWarnings": {
                  "buckets": [
                    {
                      "doc_count": 3154,
                      "key": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching was found This warning will continue to appear until matching index is created or this rule is disabled",
                      "regex": ".*?This.+?rule.+?is.+?attempting.+?to.+?query.+?data.+?from.+?Elasticsearch.+?indices.+?listed.+?in.+?the.+?Index.+?pattern.+?section.+?of.+?the.+?rule.+?definition.+?however.+?no.+?index.+?matching.+?was.+?found.+?This.+?warning.+?will.+?continue.+?to.+?appear.+?until.+?matching.+?index.+?is.+?created.+?or.+?this.+?rule.+?is.+?disabled.*?",
                      "max_matching_length": 342
                    }
                  ]
                }
              },
              "errors": {
                "doc_count": 2767,
                "topErrors": {
                  "buckets": [
                    {
                      "doc_count": 1802,
                      "key": "An error occurred during rule execution message verification_exception",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?verification_exception.*?",
                      "max_matching_length": 2064
                    },
                    {
                      "doc_count": 777,
                      "key": "were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances",
                      "regex": ".*?were.+?not.+?queried.+?between.+?this.+?rule.+?execution.+?and.+?the.+?last.+?execution.+?so.+?signals.+?may.+?have.+?been.+?missed.+?Consider.+?increasing.+?your.+?look.+?behind.+?time.+?or.+?adding.+?more.+?Kibana.+?instances.*?",
                      "max_matching_length": 216
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message rare_error_code missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?rare_error_code.+?missing.*?",
                      "max_matching_length": 82
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message v3_windows_anomalous_path_activity missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?v3_windows_anomalous_path_activity.+?missing.*?",
                      "max_matching_length": 103
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message v3_windows_rare_user_type10_remote_login missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?v3_windows_rare_user_type10_remote_login.+?missing.*?",
                      "max_matching_length": 110
                    }
                  ]
                }
              }
            },
            "executeEvents": {
              "doc_count": 8371,
              "scheduleDelayNs": {
                "values": {
                  "1.0": 206000000,
                  "5.0": 3027000000,
                  "25.0": 3092000000,
                  "50.0": 3116000000,
                  "75.0": 3278666666.6666665,
                  "95.0": 99656950000,
                  "99.0": 186632790000
                }
              },
              "executionDurationMs": {
                "values": {
                  "1.0": 275.5325,
                  "5.0": 326.07857142857137,
                  "25.0": 375.68969144460027,
                  "50.0": 427,
                  "75.0": 526.2948717948718,
                  "95.0": 727.2480952380952,
                  "99.0": 1009.5299999999934
                }
              }
            },
            "executionMetricsEvents": {
              "doc_count": 5723,
              "searchDurationMs": {
                "values": {
                  "1.0": 0,
                  "5.0": 0,
                  "25.0": 0,
                  "50.0": 4,
                  "75.0": 16,
                  "95.0": 34.43846153846145,
                  "99.0": 116.51333333333302
                }
              },
              "gaps": {
                "doc_count": 777,
                "totalGapDurationS": {
                  "value": 514415894
                }
              },
              "indexingDurationMs": {
                "values": {
                  "1.0": 0,
                  "5.0": 0,
                  "25.0": 0,
                  "50.0": 0,
                  "75.0": 0,
                  "95.0": 0,
                  "99.0": 0
                }
              }
            }
          }
        }
      }
    }
  }
}
```
</p>
</details> 



## Other notes

I'm thinking about backporting it to `8.8` so it could become available
in `8.8.1+` clusters.

In the next episodes:

- Add support for it to the
[support-diagnostics](https:/elastic/support-diagnostics)
tool so its output could be available in the diagnostic dumps.
- Implement the cluster health endpoint.
- Calculate more metrics for the rule health endpoint.
- Calculate more metrics for the space health endpoint.
- Etc - see the to-do list in the epic.


### Checklist

Delete any items that are not applicable to this PR.

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- See dev docs added in
`x-pack/plugins/security_solution/common/detection_engine/rule_monitoring/api/detection_engine_health/README.md`
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
banderror added a commit to banderror/kibana that referenced this issue Jun 14, 2023
…57155)

**Partially addresses:** https:/elastic/kibana/issues/125642

## Summary

This PR introduces a PoC health API that allows users to get health
overview of the Detection Engine across the whole cluster, or within a
given Kibana space, or for a given rule. It can be useful for
troubleshooting issues with cluster provisioning/scaling, issues with
certain rules failing or generating too much load on the cluster,
identifying common rule execution errors, etc.

In the future, this API might become helpful for building more Rule
Monitoring UIs giving our users more clarity and transparency about the
work of the Detection Engine.

## Rule health endpoint

🚧 NOTE: this endpoint is **partially implemented**. 🚧

```txt
POST /internal/detection_engine/health/_rule
```

Get health overview of a rule. Scope: a given detection rule in the
current Kibana space.
Returns:

- health stats at the moment of the API call (rule and its execution
summary)
- health stats over a specified period of time ("health interval")
- health stats history within the same interval in the form of a
histogram
(the same stats are calculated over each of the discreet sub-intervals
of the whole interval)

Minimal required parameters:

```json
{
  "rule_id": "d4beff10-f045-11ed-89d8-3b6931af10bc"
}
```

<details><summary>Response:</summary>
<p>

```json
{
  "timings": {
    "requested_at": "2023-05-26T16:09:54.128Z",
    "processed_at": "2023-05-26T16:09:54.778Z",
    "processing_time_ms": 650
  },
  "parameters": {
    "interval": {
      "type": "last_day",
      "granularity": "hour",
      "from": "2023-05-25T16:09:54.128Z",
      "to": "2023-05-26T16:09:54.128Z",
      "duration": "PT24H"
    },
    "rule_id": "d4beff10-f045-11ed-89d8-3b6931af10bc"
  },
  "health": {
    "stats_at_the_moment": {
      "rule": {
        "id": "d4beff10-f045-11ed-89d8-3b6931af10bc",
        "updated_at": "2023-05-26T15:44:21.689Z",
        "updated_by": "elastic",
        "created_at": "2023-05-11T21:50:23.830Z",
        "created_by": "elastic",
        "name": "Test rule",
        "tags": ["foo"],
        "interval": "1m",
        "enabled": true,
        "revision": 2,
        "description": "-",
        "risk_score": 21,
        "severity": "low",
        "license": "",
        "output_index": "",
        "meta": {
          "from": "6h",
          "kibana_siem_app_url": "http://localhost:5601/kbn/app/security"
        },
        "author": [],
        "false_positives": [],
        "from": "now-21660s",
        "rule_id": "e46eaaf3-6d81-4cdb-8cbb-b2201a11358b",
        "max_signals": 100,
        "risk_score_mapping": [],
        "severity_mapping": [],
        "threat": [],
        "to": "now",
        "references": [],
        "version": 3,
        "exceptions_list": [],
        "immutable": false,
        "related_integrations": [],
        "required_fields": [],
        "setup": "",
        "type": "query",
        "language": "kuery",
        "index": [
          "apm-*-transaction*",
          "auditbeat-*",
          "endgame-*",
          "filebeat-*",
          "logs-*",
          "packetbeat-*",
          "traces-apm*",
          "winlogbeat-*",
          "-*elastic-cloud-logs-*",
          "foo-*"
        ],
        "query": "*",
        "filters": [],
        "actions": [
          {
            "group": "default",
            "id": "bd59c4e0-f045-11ed-89d8-3b6931af10bc",
            "params": {
              "body": "Hello world"
            },
            "action_type_id": ".webhook",
            "uuid": "f8b87eb0-58bb-4d4b-a584-084d44ab847e",
            "frequency": {
              "summary": true,
              "throttle": null,
              "notifyWhen": "onActiveAlert"
            }
          }
        ],
        "execution_summary": {
          "last_execution": {
            "date": "2023-05-26T16:09:36.848Z",
            "status": "succeeded",
            "status_order": 0,
            "message": "Rule execution completed successfully",
            "metrics": {
              "total_search_duration_ms": 2,
              "execution_gap_duration_s": 80395
            }
          }
        }
      }
    },
    "stats_over_interval": {
      "number_of_executions": {
        "total": 21,
        "by_outcome": {
          "succeeded": 20,
          "warning": 0,
          "failed": 1
        }
      },
      "number_of_logged_messages": {
        "total": 42,
        "by_level": {
          "error": 1,
          "warn": 0,
          "info": 41,
          "debug": 0,
          "trace": 0
        }
      },
      "number_of_detected_gaps": {
        "total": 1,
        "total_duration_s": 80395
      },
      "schedule_delay_ms": {
        "percentiles": {
          "1.0": 3061,
          "5.0": 3083,
          "25.0": 3112,
          "50.0": 6049,
          "75.0": 6069.5,
          "95.0": 100093.79999999986,
          "99.0": 207687
        }
      },
      "execution_duration_ms": {
        "percentiles": {
          "1.0": 226,
          "5.0": 228.2,
          "25.0": 355.5,
          "50.0": 422,
          "75.0": 447,
          "95.0": 677.75,
          "99.0": 719
        }
      },
      "search_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 1.1,
          "25.0": 2.75,
          "50.0": 7,
          "75.0": 13.5,
          "95.0": 29.59999999999998,
          "99.0": 45
        }
      },
      "indexing_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 0,
          "95.0": 0,
          "99.0": 0
        }
      },
      "top_errors": [
        {
          "count": 1,
          "message": "day were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances"
        }
      ],
      "top_warnings": []
    },
    "history_over_interval": {
      "buckets": [
        {
          "timestamp": "2023-05-26T15:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 12,
              "by_outcome": {
                "succeeded": 11,
                "warning": 0,
                "failed": 1
              }
            },
            "number_of_logged_messages": {
              "total": 24,
              "by_level": {
                "error": 1,
                "warn": 0,
                "info": 23,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 1,
              "total_duration_s": 80395
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 3106,
                "5.0": 3106.8,
                "25.0": 3124.5,
                "50.0": 6067.5,
                "75.0": 9060.5,
                "95.0": 188124.59999999971,
                "99.0": 207687
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 230,
                "5.0": 236.2,
                "25.0": 354,
                "50.0": 405,
                "75.0": 447.5,
                "95.0": 563.3999999999999,
                "99.0": 576
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0.20000000000000018,
                "25.0": 2.5,
                "50.0": 5,
                "75.0": 14,
                "95.0": 42.19999999999996,
                "99.0": 45
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        },
        {
          "timestamp": "2023-05-26T16:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 9,
              "by_outcome": {
                "succeeded": 9,
                "warning": 0,
                "failed": 0
              }
            },
            "number_of_logged_messages": {
              "total": 18,
              "by_level": {
                "error": 0,
                "warn": 0,
                "info": 18,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 0,
              "total_duration_s": 0
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 3061,
                "5.0": 3061,
                "25.0": 3104.75,
                "50.0": 3115,
                "75.0": 6053,
                "95.0": 6068,
                "99.0": 6068
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 226.00000000000003,
                "5.0": 226,
                "25.0": 356,
                "50.0": 436,
                "75.0": 495.5,
                "95.0": 719,
                "99.0": 719
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 2,
                "5.0": 2,
                "25.0": 5.75,
                "50.0": 8,
                "75.0": 13.75,
                "95.0": 17,
                "99.0": 17
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        }
      ]
    }
  }
}
```
</p>
</details>

## Space health endpoint

🚧 NOTE: this endpoint is **partially implemented**. 🚧

```txt
POST /internal/detection_engine/health/_space
```

Get health overview of the current Kibana space. Scope: all detection
rules in the space.
Returns:

- health stats at the moment of the API call
- health stats over a specified period of time ("health interval")
- health stats history within the same interval in the form of a
histogram
(the same stats are calculated over each of the discreet sub-intervals
of the whole interval)

Minimal required parameters: empty object.

```json
{}
```

<details><summary>Response:</summary>
<p>

```json
{
  "timings": {
    "requested_at": "2023-05-26T16:24:21.628Z",
    "processed_at": "2023-05-26T16:24:22.880Z",
    "processing_time_ms": 1252
  },
  "parameters": {
    "interval": {
      "type": "last_day",
      "granularity": "hour",
      "from": "2023-05-25T16:24:21.628Z",
      "to": "2023-05-26T16:24:21.628Z",
      "duration": "PT24H"
    }
  },
  "health": {
    "stats_at_the_moment": {
      "number_of_rules": {
        "all": {
          "total": 777,
          "enabled": 777,
          "disabled": 0
        },
        "by_origin": {
          "prebuilt": {
            "total": 776,
            "enabled": 776,
            "disabled": 0
          },
          "custom": {
            "total": 1,
            "enabled": 1,
            "disabled": 0
          }
        },
        "by_type": {
          "siem.eqlRule": {
            "total": 381,
            "enabled": 381,
            "disabled": 0
          },
          "siem.queryRule": {
            "total": 325,
            "enabled": 325,
            "disabled": 0
          },
          "siem.mlRule": {
            "total": 47,
            "enabled": 47,
            "disabled": 0
          },
          "siem.thresholdRule": {
            "total": 18,
            "enabled": 18,
            "disabled": 0
          },
          "siem.newTermsRule": {
            "total": 4,
            "enabled": 4,
            "disabled": 0
          },
          "siem.indicatorRule": {
            "total": 2,
            "enabled": 2,
            "disabled": 0
          }
        },
        "by_outcome": {
          "warning": {
            "total": 307,
            "enabled": 307,
            "disabled": 0
          },
          "succeeded": {
            "total": 266,
            "enabled": 266,
            "disabled": 0
          },
          "failed": {
            "total": 204,
            "enabled": 204,
            "disabled": 0
          }
        }
      }
    },
    "stats_over_interval": {
      "number_of_executions": {
        "total": 5622,
        "by_outcome": {
          "succeeded": 1882,
          "warning": 2129,
          "failed": 2120
        }
      },
      "number_of_logged_messages": {
        "total": 11756,
        "by_level": {
          "error": 2120,
          "warn": 2129,
          "info": 7507,
          "debug": 0,
          "trace": 0
        }
      },
      "number_of_detected_gaps": {
        "total": 777,
        "total_duration_s": 514415894
      },
      "schedule_delay_ms": {
        "percentiles": {
          "1.0": 216,
          "5.0": 3048.5,
          "25.0": 3105,
          "50.0": 3129,
          "75.0": 6112.355119825708,
          "95.0": 134006,
          "99.0": 195578
        }
      },
      "execution_duration_ms": {
        "percentiles": {
          "1.0": 275,
          "5.0": 323.375,
          "25.0": 370.80555555555554,
          "50.0": 413.1122337092731,
          "75.0": 502.25233127864715,
          "95.0": 685.8055555555555,
          "99.0": 1194.75
        }
      },
      "search_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 15,
          "95.0": 30,
          "99.0": 99.44000000000005
        }
      },
      "indexing_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 0,
          "95.0": 0,
          "99.0": 0
        }
      },
      "top_errors": [
        {
          "count": 1202,
          "message": "An error occurred during rule execution message verification_exception"
        },
        {
          "count": 777,
          "message": "were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message rare_error_code missing"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message v3_windows_anomalous_path_activity missing"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message v3_windows_rare_user_type10_remote_login missing"
        }
      ],
      "top_warnings": [
        {
          "count": 2129,
          "message": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching was found This warning will continue to appear until matching index is created or this rule is disabled"
        }
      ]
    },
    "history_over_interval": {
      "buckets": [
        {
          "timestamp": "2023-05-26T15:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 2245,
              "by_outcome": {
                "succeeded": 566,
                "warning": 849,
                "failed": 1336
              }
            },
            "number_of_logged_messages": {
              "total": 4996,
              "by_level": {
                "error": 1336,
                "warn": 849,
                "info": 2811,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 777,
              "total_duration_s": 514415894
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 256,
                "5.0": 3086.9722222222217,
                "25.0": 3133,
                "50.0": 6126,
                "75.0": 59484.25,
                "95.0": 179817.25,
                "99.0": 202613
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 280.6,
                "5.0": 327.7,
                "25.0": 371.5208333333333,
                "50.0": 415.6190476190476,
                "75.0": 505.7642857142857,
                "95.0": 740.4375,
                "99.0": 1446.1500000000005
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 8,
                "95.0": 25,
                "99.0": 46
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        },
        {
          "timestamp": "2023-05-26T16:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 3363,
              "by_outcome": {
                "succeeded": 1316,
                "warning": 1280,
                "failed": 784
              }
            },
            "number_of_logged_messages": {
              "total": 6760,
              "by_level": {
                "error": 784,
                "warn": 1280,
                "info": 4696,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 0,
              "total_duration_s": 0
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 207,
                "5.0": 3042,
                "25.0": 3098.46511627907,
                "50.0": 3112,
                "75.0": 3145.2820512820517,
                "95.0": 6100.571428571428,
                "99.0": 6123
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 275,
                "5.0": 319.85714285714283,
                "25.0": 370.0357142857143,
                "50.0": 410.79999229108853,
                "75.0": 500.7692307692308,
                "95.0": 675,
                "99.0": 781.3999999999996
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 9,
                "75.0": 17.555555555555557,
                "95.0": 34,
                "99.0": 110.5
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        }
      ]
    }
  }
}
```
</p>
</details>

## Cluster health endpoint

🚧 NOTE: this endpoint is **not implemented**. 🚧

```txt
POST /internal/detection_engine/health/_cluster
```

Minimal required parameters: empty object.

```json
{}
```

<details><summary>Response:</summary>
<p>

```json
{
  "message": "Not implemented",
  "timings": {
    "requested_at": "2023-05-26T16:32:01.878Z",
    "processed_at": "2023-05-26T16:32:01.881Z",
    "processing_time_ms": 3
  },
  "parameters": {
    "interval": {
      "type": "last_week",
      "granularity": "hour",
      "from": "2023-05-19T16:32:01.878Z",
      "to": "2023-05-26T16:32:01.878Z",
      "duration": "PT168H"
    }
  },
  "health": {
    "stats_at_the_moment": {
      "number_of_rules": {
        "all": {
          "total": 0,
          "enabled": 0,
          "disabled": 0
        },
        "by_origin": {
          "prebuilt": {
            "total": 0,
            "enabled": 0,
            "disabled": 0
          },
          "custom": {
            "total": 0,
            "enabled": 0,
            "disabled": 0
          }
        },
        "by_type": {},
        "by_outcome": {}
      }
    },
    "stats_over_interval": {
      "message": "Not implemented"
    },
    "history_over_interval": {
      "buckets": []
    }
  }
}
```
</p>
</details>

## Optional parameters

All the three endpoints accept optional `interval` and `debug` request
parameters.

### Health interval

You can change the interval over which the health stats will be
calculated. If you don't specify it, by default health stats will be
calculated over the last day with the granularity of 1 hour.

```json
{
  "interval": {
    "type": "last_week",
    "granularity": "day"
  }
}
```

You can also specify a custom date range with exact interval bounds.

```json
{
  "interval": {
    "type": "custom_range",
    "granularity": "minute",
    "from": "2023-05-20T16:24:21.628Z",
    "to": "2023-05-26T16:24:21.628Z"
  }
}
```

Please keep in mind that requesting large intervals with small
granularity can generate substantial load on the system and enormous API
responses.

### Debug mode

You can also include various debug information in the response, such as
queries and aggregations sent to Elasticsearch and response received
from it.

```json
{
  "debug": true
}
```

In the response you will find something like that:

<details><summary>Response:</summary>
<p>

```json
{
  "health": {
    "debug": {
      "rulesClient": {
        "request": {
          "aggs": {
            "rulesByEnabled": {
              "terms": {
                "field": "alert.attributes.enabled"
              }
            },
            "rulesByOrigin": {
              "terms": {
                "field": "alert.attributes.params.immutable"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            },
            "rulesByType": {
              "terms": {
                "field": "alert.attributes.alertTypeId"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            },
            "rulesByOutcome": {
              "terms": {
                "field": "alert.attributes.lastRun.outcome"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            }
          }
        },
        "response": {
          "aggregations": {
            "rulesByOutcome": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "warning",
                  "doc_count": 307,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 307
                      }
                    ]
                  }
                },
                {
                  "key": "succeeded",
                  "doc_count": 266,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 266
                      }
                    ]
                  }
                },
                {
                  "key": "failed",
                  "doc_count": 204,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 204
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByType": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "siem.eqlRule",
                  "doc_count": 381,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 381
                      }
                    ]
                  }
                },
                {
                  "key": "siem.queryRule",
                  "doc_count": 325,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 325
                      }
                    ]
                  }
                },
                {
                  "key": "siem.mlRule",
                  "doc_count": 47,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 47
                      }
                    ]
                  }
                },
                {
                  "key": "siem.thresholdRule",
                  "doc_count": 18,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 18
                      }
                    ]
                  }
                },
                {
                  "key": "siem.newTermsRule",
                  "doc_count": 4,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 4
                      }
                    ]
                  }
                },
                {
                  "key": "siem.indicatorRule",
                  "doc_count": 2,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 2
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByOrigin": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "true",
                  "doc_count": 776,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 776
                      }
                    ]
                  }
                },
                {
                  "key": "false",
                  "doc_count": 1,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 1
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByEnabled": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": 1,
                  "key_as_string": "true",
                  "doc_count": 777
                }
              ]
            }
          }
        }
      },
      "eventLog": {
        "request": {
          "aggs": {
            "totalExecutions": {
              "cardinality": {
                "field": "kibana.alert.rule.execution.uuid"
              }
            },
            "executeEvents": {
              "filter": {
                "term": {
                  "event.action": "execute"
                }
              },
              "aggs": {
                "executionDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_run_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                },
                "scheduleDelayNs": {
                  "percentiles": {
                    "field": "kibana.task.schedule_delay",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                }
              }
            },
            "statusChangeEvents": {
              "filter": {
                "bool": {
                  "filter": [
                    {
                      "term": {
                        "event.action": "status-change"
                      }
                    }
                  ],
                  "must_not": [
                    {
                      "terms": {
                        "kibana.alert.rule.execution.status": ["running", "going to run"]
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "executionsByStatus": {
                  "terms": {
                    "field": "kibana.alert.rule.execution.status"
                  }
                }
              }
            },
            "executionMetricsEvents": {
              "filter": {
                "term": {
                  "event.action": "execution-metrics"
                }
              },
              "aggs": {
                "gaps": {
                  "filter": {
                    "exists": {
                      "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                    }
                  },
                  "aggs": {
                    "totalGapDurationS": {
                      "sum": {
                        "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                      }
                    }
                  }
                },
                "searchDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_search_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                },
                "indexingDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_indexing_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                }
              }
            },
            "messageContainingEvents": {
              "filter": {
                "terms": {
                  "event.action": ["status-change", "message"]
                }
              },
              "aggs": {
                "messagesByLogLevel": {
                  "terms": {
                    "field": "log.level"
                  }
                },
                "errors": {
                  "filter": {
                    "term": {
                      "log.level": "error"
                    }
                  },
                  "aggs": {
                    "topErrors": {
                      "categorize_text": {
                        "field": "message",
                        "size": 5,
                        "similarity_threshold": 99
                      }
                    }
                  }
                },
                "warnings": {
                  "filter": {
                    "term": {
                      "log.level": "warn"
                    }
                  },
                  "aggs": {
                    "topWarnings": {
                      "categorize_text": {
                        "field": "message",
                        "size": 5,
                        "similarity_threshold": 99
                      }
                    }
                  }
                }
              }
            },
            "statsHistory": {
              "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "hour"
              },
              "aggs": {
                "totalExecutions": {
                  "cardinality": {
                    "field": "kibana.alert.rule.execution.uuid"
                  }
                },
                "executeEvents": {
                  "filter": {
                    "term": {
                      "event.action": "execute"
                    }
                  },
                  "aggs": {
                    "executionDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_run_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    },
                    "scheduleDelayNs": {
                      "percentiles": {
                        "field": "kibana.task.schedule_delay",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    }
                  }
                },
                "statusChangeEvents": {
                  "filter": {
                    "bool": {
                      "filter": [
                        {
                          "term": {
                            "event.action": "status-change"
                          }
                        }
                      ],
                      "must_not": [
                        {
                          "terms": {
                            "kibana.alert.rule.execution.status": ["running", "going to run"]
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "executionsByStatus": {
                      "terms": {
                        "field": "kibana.alert.rule.execution.status"
                      }
                    }
                  }
                },
                "executionMetricsEvents": {
                  "filter": {
                    "term": {
                      "event.action": "execution-metrics"
                    }
                  },
                  "aggs": {
                    "gaps": {
                      "filter": {
                        "exists": {
                          "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                        }
                      },
                      "aggs": {
                        "totalGapDurationS": {
                          "sum": {
                            "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                          }
                        }
                      }
                    },
                    "searchDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_search_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    },
                    "indexingDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_indexing_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    }
                  }
                },
                "messageContainingEvents": {
                  "filter": {
                    "terms": {
                      "event.action": ["status-change", "message"]
                    }
                  },
                  "aggs": {
                    "messagesByLogLevel": {
                      "terms": {
                        "field": "log.level"
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "response": {
          "aggregations": {
            "statsHistory": {
              "buckets": [
                {
                  "key_as_string": "2023-05-26T15:00:00.000Z",
                  "key": 1685113200000,
                  "doc_count": 11388,
                  "statusChangeEvents": {
                    "doc_count": 2751,
                    "executionsByStatus": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "failed",
                          "doc_count": 1336
                        },
                        {
                          "key": "partial failure",
                          "doc_count": 849
                        },
                        {
                          "key": "succeeded",
                          "doc_count": 566
                        }
                      ]
                    }
                  },
                  "totalExecutions": {
                    "value": 2245
                  },
                  "messageContainingEvents": {
                    "doc_count": 4996,
                    "messagesByLogLevel": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "info",
                          "doc_count": 2811
                        },
                        {
                          "key": "error",
                          "doc_count": 1336
                        },
                        {
                          "key": "warn",
                          "doc_count": 849
                        }
                      ]
                    }
                  },
                  "executeEvents": {
                    "doc_count": 2245,
                    "scheduleDelayNs": {
                      "values": {
                        "1.0": 256000000,
                        "5.0": 3086972222.222222,
                        "25.0": 3133000000,
                        "50.0": 6126000000,
                        "75.0": 59484250000,
                        "95.0": 179817250000,
                        "99.0": 202613000000
                      }
                    },
                    "executionDurationMs": {
                      "values": {
                        "1.0": 280.6,
                        "5.0": 327.7,
                        "25.0": 371.5208333333333,
                        "50.0": 415.6190476190476,
                        "75.0": 505.575,
                        "95.0": 740.4375,
                        "99.0": 1446.1500000000005
                      }
                    }
                  },
                  "executionMetricsEvents": {
                    "doc_count": 1902,
                    "searchDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 8,
                        "95.0": 25,
                        "99.0": 46
                      }
                    },
                    "gaps": {
                      "doc_count": 777,
                      "totalGapDurationS": {
                        "value": 514415894
                      }
                    },
                    "indexingDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 0,
                        "95.0": 0,
                        "99.0": 0
                      }
                    }
                  }
                },
                {
                  "key_as_string": "2023-05-26T16:00:00.000Z",
                  "key": 1685116800000,
                  "doc_count": 28325,
                  "statusChangeEvents": {
                    "doc_count": 6126,
                    "executionsByStatus": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "succeeded",
                          "doc_count": 2390
                        },
                        {
                          "key": "partial failure",
                          "doc_count": 2305
                        },
                        {
                          "key": "failed",
                          "doc_count": 1431
                        }
                      ]
                    }
                  },
                  "totalExecutions": {
                    "value": 6170
                  },
                  "messageContainingEvents": {
                    "doc_count": 12252,
                    "messagesByLogLevel": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "info",
                          "doc_count": 8516
                        },
                        {
                          "key": "warn",
                          "doc_count": 2305
                        },
                        {
                          "key": "error",
                          "doc_count": 1431
                        }
                      ]
                    }
                  },
                  "executeEvents": {
                    "doc_count": 6126,
                    "scheduleDelayNs": {
                      "values": {
                        "1.0": 193000000,
                        "5.0": 3017785185.1851854,
                        "25.0": 3086000000,
                        "50.0": 3105877192.982456,
                        "75.0": 3134645161.290323,
                        "95.0": 6081772222.222222,
                        "99.0": 6122000000
                      }
                    },
                    "executionDurationMs": {
                      "values": {
                        "1.0": 275.17333333333335,
                        "5.0": 324.8014285714285,
                        "25.0": 377.0752688172043,
                        "50.0": 431,
                        "75.0": 532.3870967741935,
                        "95.0": 720.6761904761904,
                        "99.0": 922.6799999999985
                      }
                    }
                  },
                  "executionMetricsEvents": {
                    "doc_count": 3821,
                    "searchDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 9.8,
                        "75.0": 18,
                        "95.0": 40.17499999999999,
                        "99.0": 124
                      }
                    },
                    "gaps": {
                      "doc_count": 0,
                      "totalGapDurationS": {
                        "value": 0
                      }
                    },
                    "indexingDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 0,
                        "95.0": 0,
                        "99.0": 0
                      }
                    }
                  }
                }
              ]
            },
            "statusChangeEvents": {
              "doc_count": 8877,
              "executionsByStatus": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "partial failure",
                    "doc_count": 3154
                  },
                  {
                    "key": "succeeded",
                    "doc_count": 2956
                  },
                  {
                    "key": "failed",
                    "doc_count": 2767
                  }
                ]
              }
            },
            "totalExecutions": {
              "value": 8455
            },
            "messageContainingEvents": {
              "doc_count": 17248,
              "messagesByLogLevel": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "info",
                    "doc_count": 11327
                  },
                  {
                    "key": "warn",
                    "doc_count": 3154
                  },
                  {
                    "key": "error",
                    "doc_count": 2767
                  }
                ]
              },
              "warnings": {
                "doc_count": 3154,
                "topWarnings": {
                  "buckets": [
                    {
                      "doc_count": 3154,
                      "key": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching was found This warning will continue to appear until matching index is created or this rule is disabled",
                      "regex": ".*?This.+?rule.+?is.+?attempting.+?to.+?query.+?data.+?from.+?Elasticsearch.+?indices.+?listed.+?in.+?the.+?Index.+?pattern.+?section.+?of.+?the.+?rule.+?definition.+?however.+?no.+?index.+?matching.+?was.+?found.+?This.+?warning.+?will.+?continue.+?to.+?appear.+?until.+?matching.+?index.+?is.+?created.+?or.+?this.+?rule.+?is.+?disabled.*?",
                      "max_matching_length": 342
                    }
                  ]
                }
              },
              "errors": {
                "doc_count": 2767,
                "topErrors": {
                  "buckets": [
                    {
                      "doc_count": 1802,
                      "key": "An error occurred during rule execution message verification_exception",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?verification_exception.*?",
                      "max_matching_length": 2064
                    },
                    {
                      "doc_count": 777,
                      "key": "were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances",
                      "regex": ".*?were.+?not.+?queried.+?between.+?this.+?rule.+?execution.+?and.+?the.+?last.+?execution.+?so.+?signals.+?may.+?have.+?been.+?missed.+?Consider.+?increasing.+?your.+?look.+?behind.+?time.+?or.+?adding.+?more.+?Kibana.+?instances.*?",
                      "max_matching_length": 216
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message rare_error_code missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?rare_error_code.+?missing.*?",
                      "max_matching_length": 82
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message v3_windows_anomalous_path_activity missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?v3_windows_anomalous_path_activity.+?missing.*?",
                      "max_matching_length": 103
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message v3_windows_rare_user_type10_remote_login missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?v3_windows_rare_user_type10_remote_login.+?missing.*?",
                      "max_matching_length": 110
                    }
                  ]
                }
              }
            },
            "executeEvents": {
              "doc_count": 8371,
              "scheduleDelayNs": {
                "values": {
                  "1.0": 206000000,
                  "5.0": 3027000000,
                  "25.0": 3092000000,
                  "50.0": 3116000000,
                  "75.0": 3278666666.6666665,
                  "95.0": 99656950000,
                  "99.0": 186632790000
                }
              },
              "executionDurationMs": {
                "values": {
                  "1.0": 275.5325,
                  "5.0": 326.07857142857137,
                  "25.0": 375.68969144460027,
                  "50.0": 427,
                  "75.0": 526.2948717948718,
                  "95.0": 727.2480952380952,
                  "99.0": 1009.5299999999934
                }
              }
            },
            "executionMetricsEvents": {
              "doc_count": 5723,
              "searchDurationMs": {
                "values": {
                  "1.0": 0,
                  "5.0": 0,
                  "25.0": 0,
                  "50.0": 4,
                  "75.0": 16,
                  "95.0": 34.43846153846145,
                  "99.0": 116.51333333333302
                }
              },
              "gaps": {
                "doc_count": 777,
                "totalGapDurationS": {
                  "value": 514415894
                }
              },
              "indexingDurationMs": {
                "values": {
                  "1.0": 0,
                  "5.0": 0,
                  "25.0": 0,
                  "50.0": 0,
                  "75.0": 0,
                  "95.0": 0,
                  "99.0": 0
                }
              }
            }
          }
        }
      }
    }
  }
}
```
</p>
</details>

## Other notes

I'm thinking about backporting it to `8.8` so it could become available
in `8.8.1+` clusters.

In the next episodes:

- Add support for it to the
[support-diagnostics](https:/elastic/support-diagnostics)
tool so its output could be available in the diagnostic dumps.
- Implement the cluster health endpoint.
- Calculate more metrics for the rule health endpoint.
- Calculate more metrics for the space health endpoint.
- Etc - see the to-do list in the epic.

### Checklist

Delete any items that are not applicable to this PR.

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- See dev docs added in
`x-pack/plugins/security_solution/common/detection_engine/rule_monitoring/api/detection_engine_health/README.md`
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

(cherry picked from commit 8ac83df6202759d38b04a781d8739c0ce3e0986e)

# Conflicts:
#	x-pack/plugins/security_solution/server/plugin.ts
banderror added a commit to banderror/kibana that referenced this issue Jun 14, 2023
…57155)

**Partially addresses:** https:/elastic/kibana/issues/125642

## Summary

This PR introduces a PoC health API that allows users to get health
overview of the Detection Engine across the whole cluster, or within a
given Kibana space, or for a given rule. It can be useful for
troubleshooting issues with cluster provisioning/scaling, issues with
certain rules failing or generating too much load on the cluster,
identifying common rule execution errors, etc.

In the future, this API might become helpful for building more Rule
Monitoring UIs giving our users more clarity and transparency about the
work of the Detection Engine.

## Rule health endpoint

🚧 NOTE: this endpoint is **partially implemented**. 🚧

```txt
POST /internal/detection_engine/health/_rule
```

Get health overview of a rule. Scope: a given detection rule in the
current Kibana space.
Returns:

- health stats at the moment of the API call (rule and its execution
summary)
- health stats over a specified period of time ("health interval")
- health stats history within the same interval in the form of a
histogram
(the same stats are calculated over each of the discreet sub-intervals
of the whole interval)

Minimal required parameters:

```json
{
  "rule_id": "d4beff10-f045-11ed-89d8-3b6931af10bc"
}
```

<details><summary>Response:</summary>
<p>

```json
{
  "timings": {
    "requested_at": "2023-05-26T16:09:54.128Z",
    "processed_at": "2023-05-26T16:09:54.778Z",
    "processing_time_ms": 650
  },
  "parameters": {
    "interval": {
      "type": "last_day",
      "granularity": "hour",
      "from": "2023-05-25T16:09:54.128Z",
      "to": "2023-05-26T16:09:54.128Z",
      "duration": "PT24H"
    },
    "rule_id": "d4beff10-f045-11ed-89d8-3b6931af10bc"
  },
  "health": {
    "stats_at_the_moment": {
      "rule": {
        "id": "d4beff10-f045-11ed-89d8-3b6931af10bc",
        "updated_at": "2023-05-26T15:44:21.689Z",
        "updated_by": "elastic",
        "created_at": "2023-05-11T21:50:23.830Z",
        "created_by": "elastic",
        "name": "Test rule",
        "tags": ["foo"],
        "interval": "1m",
        "enabled": true,
        "revision": 2,
        "description": "-",
        "risk_score": 21,
        "severity": "low",
        "license": "",
        "output_index": "",
        "meta": {
          "from": "6h",
          "kibana_siem_app_url": "http://localhost:5601/kbn/app/security"
        },
        "author": [],
        "false_positives": [],
        "from": "now-21660s",
        "rule_id": "e46eaaf3-6d81-4cdb-8cbb-b2201a11358b",
        "max_signals": 100,
        "risk_score_mapping": [],
        "severity_mapping": [],
        "threat": [],
        "to": "now",
        "references": [],
        "version": 3,
        "exceptions_list": [],
        "immutable": false,
        "related_integrations": [],
        "required_fields": [],
        "setup": "",
        "type": "query",
        "language": "kuery",
        "index": [
          "apm-*-transaction*",
          "auditbeat-*",
          "endgame-*",
          "filebeat-*",
          "logs-*",
          "packetbeat-*",
          "traces-apm*",
          "winlogbeat-*",
          "-*elastic-cloud-logs-*",
          "foo-*"
        ],
        "query": "*",
        "filters": [],
        "actions": [
          {
            "group": "default",
            "id": "bd59c4e0-f045-11ed-89d8-3b6931af10bc",
            "params": {
              "body": "Hello world"
            },
            "action_type_id": ".webhook",
            "uuid": "f8b87eb0-58bb-4d4b-a584-084d44ab847e",
            "frequency": {
              "summary": true,
              "throttle": null,
              "notifyWhen": "onActiveAlert"
            }
          }
        ],
        "execution_summary": {
          "last_execution": {
            "date": "2023-05-26T16:09:36.848Z",
            "status": "succeeded",
            "status_order": 0,
            "message": "Rule execution completed successfully",
            "metrics": {
              "total_search_duration_ms": 2,
              "execution_gap_duration_s": 80395
            }
          }
        }
      }
    },
    "stats_over_interval": {
      "number_of_executions": {
        "total": 21,
        "by_outcome": {
          "succeeded": 20,
          "warning": 0,
          "failed": 1
        }
      },
      "number_of_logged_messages": {
        "total": 42,
        "by_level": {
          "error": 1,
          "warn": 0,
          "info": 41,
          "debug": 0,
          "trace": 0
        }
      },
      "number_of_detected_gaps": {
        "total": 1,
        "total_duration_s": 80395
      },
      "schedule_delay_ms": {
        "percentiles": {
          "1.0": 3061,
          "5.0": 3083,
          "25.0": 3112,
          "50.0": 6049,
          "75.0": 6069.5,
          "95.0": 100093.79999999986,
          "99.0": 207687
        }
      },
      "execution_duration_ms": {
        "percentiles": {
          "1.0": 226,
          "5.0": 228.2,
          "25.0": 355.5,
          "50.0": 422,
          "75.0": 447,
          "95.0": 677.75,
          "99.0": 719
        }
      },
      "search_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 1.1,
          "25.0": 2.75,
          "50.0": 7,
          "75.0": 13.5,
          "95.0": 29.59999999999998,
          "99.0": 45
        }
      },
      "indexing_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 0,
          "95.0": 0,
          "99.0": 0
        }
      },
      "top_errors": [
        {
          "count": 1,
          "message": "day were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances"
        }
      ],
      "top_warnings": []
    },
    "history_over_interval": {
      "buckets": [
        {
          "timestamp": "2023-05-26T15:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 12,
              "by_outcome": {
                "succeeded": 11,
                "warning": 0,
                "failed": 1
              }
            },
            "number_of_logged_messages": {
              "total": 24,
              "by_level": {
                "error": 1,
                "warn": 0,
                "info": 23,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 1,
              "total_duration_s": 80395
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 3106,
                "5.0": 3106.8,
                "25.0": 3124.5,
                "50.0": 6067.5,
                "75.0": 9060.5,
                "95.0": 188124.59999999971,
                "99.0": 207687
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 230,
                "5.0": 236.2,
                "25.0": 354,
                "50.0": 405,
                "75.0": 447.5,
                "95.0": 563.3999999999999,
                "99.0": 576
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0.20000000000000018,
                "25.0": 2.5,
                "50.0": 5,
                "75.0": 14,
                "95.0": 42.19999999999996,
                "99.0": 45
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        },
        {
          "timestamp": "2023-05-26T16:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 9,
              "by_outcome": {
                "succeeded": 9,
                "warning": 0,
                "failed": 0
              }
            },
            "number_of_logged_messages": {
              "total": 18,
              "by_level": {
                "error": 0,
                "warn": 0,
                "info": 18,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 0,
              "total_duration_s": 0
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 3061,
                "5.0": 3061,
                "25.0": 3104.75,
                "50.0": 3115,
                "75.0": 6053,
                "95.0": 6068,
                "99.0": 6068
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 226.00000000000003,
                "5.0": 226,
                "25.0": 356,
                "50.0": 436,
                "75.0": 495.5,
                "95.0": 719,
                "99.0": 719
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 2,
                "5.0": 2,
                "25.0": 5.75,
                "50.0": 8,
                "75.0": 13.75,
                "95.0": 17,
                "99.0": 17
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        }
      ]
    }
  }
}
```
</p>
</details>

## Space health endpoint

🚧 NOTE: this endpoint is **partially implemented**. 🚧

```txt
POST /internal/detection_engine/health/_space
```

Get health overview of the current Kibana space. Scope: all detection
rules in the space.
Returns:

- health stats at the moment of the API call
- health stats over a specified period of time ("health interval")
- health stats history within the same interval in the form of a
histogram
(the same stats are calculated over each of the discreet sub-intervals
of the whole interval)

Minimal required parameters: empty object.

```json
{}
```

<details><summary>Response:</summary>
<p>

```json
{
  "timings": {
    "requested_at": "2023-05-26T16:24:21.628Z",
    "processed_at": "2023-05-26T16:24:22.880Z",
    "processing_time_ms": 1252
  },
  "parameters": {
    "interval": {
      "type": "last_day",
      "granularity": "hour",
      "from": "2023-05-25T16:24:21.628Z",
      "to": "2023-05-26T16:24:21.628Z",
      "duration": "PT24H"
    }
  },
  "health": {
    "stats_at_the_moment": {
      "number_of_rules": {
        "all": {
          "total": 777,
          "enabled": 777,
          "disabled": 0
        },
        "by_origin": {
          "prebuilt": {
            "total": 776,
            "enabled": 776,
            "disabled": 0
          },
          "custom": {
            "total": 1,
            "enabled": 1,
            "disabled": 0
          }
        },
        "by_type": {
          "siem.eqlRule": {
            "total": 381,
            "enabled": 381,
            "disabled": 0
          },
          "siem.queryRule": {
            "total": 325,
            "enabled": 325,
            "disabled": 0
          },
          "siem.mlRule": {
            "total": 47,
            "enabled": 47,
            "disabled": 0
          },
          "siem.thresholdRule": {
            "total": 18,
            "enabled": 18,
            "disabled": 0
          },
          "siem.newTermsRule": {
            "total": 4,
            "enabled": 4,
            "disabled": 0
          },
          "siem.indicatorRule": {
            "total": 2,
            "enabled": 2,
            "disabled": 0
          }
        },
        "by_outcome": {
          "warning": {
            "total": 307,
            "enabled": 307,
            "disabled": 0
          },
          "succeeded": {
            "total": 266,
            "enabled": 266,
            "disabled": 0
          },
          "failed": {
            "total": 204,
            "enabled": 204,
            "disabled": 0
          }
        }
      }
    },
    "stats_over_interval": {
      "number_of_executions": {
        "total": 5622,
        "by_outcome": {
          "succeeded": 1882,
          "warning": 2129,
          "failed": 2120
        }
      },
      "number_of_logged_messages": {
        "total": 11756,
        "by_level": {
          "error": 2120,
          "warn": 2129,
          "info": 7507,
          "debug": 0,
          "trace": 0
        }
      },
      "number_of_detected_gaps": {
        "total": 777,
        "total_duration_s": 514415894
      },
      "schedule_delay_ms": {
        "percentiles": {
          "1.0": 216,
          "5.0": 3048.5,
          "25.0": 3105,
          "50.0": 3129,
          "75.0": 6112.355119825708,
          "95.0": 134006,
          "99.0": 195578
        }
      },
      "execution_duration_ms": {
        "percentiles": {
          "1.0": 275,
          "5.0": 323.375,
          "25.0": 370.80555555555554,
          "50.0": 413.1122337092731,
          "75.0": 502.25233127864715,
          "95.0": 685.8055555555555,
          "99.0": 1194.75
        }
      },
      "search_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 15,
          "95.0": 30,
          "99.0": 99.44000000000005
        }
      },
      "indexing_duration_ms": {
        "percentiles": {
          "1.0": 0,
          "5.0": 0,
          "25.0": 0,
          "50.0": 0,
          "75.0": 0,
          "95.0": 0,
          "99.0": 0
        }
      },
      "top_errors": [
        {
          "count": 1202,
          "message": "An error occurred during rule execution message verification_exception"
        },
        {
          "count": 777,
          "message": "were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message rare_error_code missing"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message v3_windows_anomalous_path_activity missing"
        },
        {
          "count": 3,
          "message": "An error occurred during rule execution message v3_windows_rare_user_type10_remote_login missing"
        }
      ],
      "top_warnings": [
        {
          "count": 2129,
          "message": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching was found This warning will continue to appear until matching index is created or this rule is disabled"
        }
      ]
    },
    "history_over_interval": {
      "buckets": [
        {
          "timestamp": "2023-05-26T15:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 2245,
              "by_outcome": {
                "succeeded": 566,
                "warning": 849,
                "failed": 1336
              }
            },
            "number_of_logged_messages": {
              "total": 4996,
              "by_level": {
                "error": 1336,
                "warn": 849,
                "info": 2811,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 777,
              "total_duration_s": 514415894
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 256,
                "5.0": 3086.9722222222217,
                "25.0": 3133,
                "50.0": 6126,
                "75.0": 59484.25,
                "95.0": 179817.25,
                "99.0": 202613
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 280.6,
                "5.0": 327.7,
                "25.0": 371.5208333333333,
                "50.0": 415.6190476190476,
                "75.0": 505.7642857142857,
                "95.0": 740.4375,
                "99.0": 1446.1500000000005
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 8,
                "95.0": 25,
                "99.0": 46
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        },
        {
          "timestamp": "2023-05-26T16:00:00.000Z",
          "stats": {
            "number_of_executions": {
              "total": 3363,
              "by_outcome": {
                "succeeded": 1316,
                "warning": 1280,
                "failed": 784
              }
            },
            "number_of_logged_messages": {
              "total": 6760,
              "by_level": {
                "error": 784,
                "warn": 1280,
                "info": 4696,
                "debug": 0,
                "trace": 0
              }
            },
            "number_of_detected_gaps": {
              "total": 0,
              "total_duration_s": 0
            },
            "schedule_delay_ms": {
              "percentiles": {
                "1.0": 207,
                "5.0": 3042,
                "25.0": 3098.46511627907,
                "50.0": 3112,
                "75.0": 3145.2820512820517,
                "95.0": 6100.571428571428,
                "99.0": 6123
              }
            },
            "execution_duration_ms": {
              "percentiles": {
                "1.0": 275,
                "5.0": 319.85714285714283,
                "25.0": 370.0357142857143,
                "50.0": 410.79999229108853,
                "75.0": 500.7692307692308,
                "95.0": 675,
                "99.0": 781.3999999999996
              }
            },
            "search_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 9,
                "75.0": 17.555555555555557,
                "95.0": 34,
                "99.0": 110.5
              }
            },
            "indexing_duration_ms": {
              "percentiles": {
                "1.0": 0,
                "5.0": 0,
                "25.0": 0,
                "50.0": 0,
                "75.0": 0,
                "95.0": 0,
                "99.0": 0
              }
            }
          }
        }
      ]
    }
  }
}
```
</p>
</details>

## Cluster health endpoint

🚧 NOTE: this endpoint is **not implemented**. 🚧

```txt
POST /internal/detection_engine/health/_cluster
```

Minimal required parameters: empty object.

```json
{}
```

<details><summary>Response:</summary>
<p>

```json
{
  "message": "Not implemented",
  "timings": {
    "requested_at": "2023-05-26T16:32:01.878Z",
    "processed_at": "2023-05-26T16:32:01.881Z",
    "processing_time_ms": 3
  },
  "parameters": {
    "interval": {
      "type": "last_week",
      "granularity": "hour",
      "from": "2023-05-19T16:32:01.878Z",
      "to": "2023-05-26T16:32:01.878Z",
      "duration": "PT168H"
    }
  },
  "health": {
    "stats_at_the_moment": {
      "number_of_rules": {
        "all": {
          "total": 0,
          "enabled": 0,
          "disabled": 0
        },
        "by_origin": {
          "prebuilt": {
            "total": 0,
            "enabled": 0,
            "disabled": 0
          },
          "custom": {
            "total": 0,
            "enabled": 0,
            "disabled": 0
          }
        },
        "by_type": {},
        "by_outcome": {}
      }
    },
    "stats_over_interval": {
      "message": "Not implemented"
    },
    "history_over_interval": {
      "buckets": []
    }
  }
}
```
</p>
</details>

## Optional parameters

All the three endpoints accept optional `interval` and `debug` request
parameters.

### Health interval

You can change the interval over which the health stats will be
calculated. If you don't specify it, by default health stats will be
calculated over the last day with the granularity of 1 hour.

```json
{
  "interval": {
    "type": "last_week",
    "granularity": "day"
  }
}
```

You can also specify a custom date range with exact interval bounds.

```json
{
  "interval": {
    "type": "custom_range",
    "granularity": "minute",
    "from": "2023-05-20T16:24:21.628Z",
    "to": "2023-05-26T16:24:21.628Z"
  }
}
```

Please keep in mind that requesting large intervals with small
granularity can generate substantial load on the system and enormous API
responses.

### Debug mode

You can also include various debug information in the response, such as
queries and aggregations sent to Elasticsearch and response received
from it.

```json
{
  "debug": true
}
```

In the response you will find something like that:

<details><summary>Response:</summary>
<p>

```json
{
  "health": {
    "debug": {
      "rulesClient": {
        "request": {
          "aggs": {
            "rulesByEnabled": {
              "terms": {
                "field": "alert.attributes.enabled"
              }
            },
            "rulesByOrigin": {
              "terms": {
                "field": "alert.attributes.params.immutable"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            },
            "rulesByType": {
              "terms": {
                "field": "alert.attributes.alertTypeId"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            },
            "rulesByOutcome": {
              "terms": {
                "field": "alert.attributes.lastRun.outcome"
              },
              "aggs": {
                "rulesByEnabled": {
                  "terms": {
                    "field": "alert.attributes.enabled"
                  }
                }
              }
            }
          }
        },
        "response": {
          "aggregations": {
            "rulesByOutcome": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "warning",
                  "doc_count": 307,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 307
                      }
                    ]
                  }
                },
                {
                  "key": "succeeded",
                  "doc_count": 266,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 266
                      }
                    ]
                  }
                },
                {
                  "key": "failed",
                  "doc_count": 204,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 204
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByType": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "siem.eqlRule",
                  "doc_count": 381,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 381
                      }
                    ]
                  }
                },
                {
                  "key": "siem.queryRule",
                  "doc_count": 325,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 325
                      }
                    ]
                  }
                },
                {
                  "key": "siem.mlRule",
                  "doc_count": 47,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 47
                      }
                    ]
                  }
                },
                {
                  "key": "siem.thresholdRule",
                  "doc_count": 18,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 18
                      }
                    ]
                  }
                },
                {
                  "key": "siem.newTermsRule",
                  "doc_count": 4,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 4
                      }
                    ]
                  }
                },
                {
                  "key": "siem.indicatorRule",
                  "doc_count": 2,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 2
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByOrigin": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "true",
                  "doc_count": 776,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 776
                      }
                    ]
                  }
                },
                {
                  "key": "false",
                  "doc_count": 1,
                  "rulesByEnabled": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": 1,
                        "key_as_string": "true",
                        "doc_count": 1
                      }
                    ]
                  }
                }
              ]
            },
            "rulesByEnabled": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": 1,
                  "key_as_string": "true",
                  "doc_count": 777
                }
              ]
            }
          }
        }
      },
      "eventLog": {
        "request": {
          "aggs": {
            "totalExecutions": {
              "cardinality": {
                "field": "kibana.alert.rule.execution.uuid"
              }
            },
            "executeEvents": {
              "filter": {
                "term": {
                  "event.action": "execute"
                }
              },
              "aggs": {
                "executionDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_run_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                },
                "scheduleDelayNs": {
                  "percentiles": {
                    "field": "kibana.task.schedule_delay",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                }
              }
            },
            "statusChangeEvents": {
              "filter": {
                "bool": {
                  "filter": [
                    {
                      "term": {
                        "event.action": "status-change"
                      }
                    }
                  ],
                  "must_not": [
                    {
                      "terms": {
                        "kibana.alert.rule.execution.status": ["running", "going to run"]
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "executionsByStatus": {
                  "terms": {
                    "field": "kibana.alert.rule.execution.status"
                  }
                }
              }
            },
            "executionMetricsEvents": {
              "filter": {
                "term": {
                  "event.action": "execution-metrics"
                }
              },
              "aggs": {
                "gaps": {
                  "filter": {
                    "exists": {
                      "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                    }
                  },
                  "aggs": {
                    "totalGapDurationS": {
                      "sum": {
                        "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                      }
                    }
                  }
                },
                "searchDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_search_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                },
                "indexingDurationMs": {
                  "percentiles": {
                    "field": "kibana.alert.rule.execution.metrics.total_indexing_duration_ms",
                    "missing": 0,
                    "percents": [1, 5, 25, 50, 75, 95, 99]
                  }
                }
              }
            },
            "messageContainingEvents": {
              "filter": {
                "terms": {
                  "event.action": ["status-change", "message"]
                }
              },
              "aggs": {
                "messagesByLogLevel": {
                  "terms": {
                    "field": "log.level"
                  }
                },
                "errors": {
                  "filter": {
                    "term": {
                      "log.level": "error"
                    }
                  },
                  "aggs": {
                    "topErrors": {
                      "categorize_text": {
                        "field": "message",
                        "size": 5,
                        "similarity_threshold": 99
                      }
                    }
                  }
                },
                "warnings": {
                  "filter": {
                    "term": {
                      "log.level": "warn"
                    }
                  },
                  "aggs": {
                    "topWarnings": {
                      "categorize_text": {
                        "field": "message",
                        "size": 5,
                        "similarity_threshold": 99
                      }
                    }
                  }
                }
              }
            },
            "statsHistory": {
              "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "hour"
              },
              "aggs": {
                "totalExecutions": {
                  "cardinality": {
                    "field": "kibana.alert.rule.execution.uuid"
                  }
                },
                "executeEvents": {
                  "filter": {
                    "term": {
                      "event.action": "execute"
                    }
                  },
                  "aggs": {
                    "executionDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_run_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    },
                    "scheduleDelayNs": {
                      "percentiles": {
                        "field": "kibana.task.schedule_delay",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    }
                  }
                },
                "statusChangeEvents": {
                  "filter": {
                    "bool": {
                      "filter": [
                        {
                          "term": {
                            "event.action": "status-change"
                          }
                        }
                      ],
                      "must_not": [
                        {
                          "terms": {
                            "kibana.alert.rule.execution.status": ["running", "going to run"]
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "executionsByStatus": {
                      "terms": {
                        "field": "kibana.alert.rule.execution.status"
                      }
                    }
                  }
                },
                "executionMetricsEvents": {
                  "filter": {
                    "term": {
                      "event.action": "execution-metrics"
                    }
                  },
                  "aggs": {
                    "gaps": {
                      "filter": {
                        "exists": {
                          "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                        }
                      },
                      "aggs": {
                        "totalGapDurationS": {
                          "sum": {
                            "field": "kibana.alert.rule.execution.metrics.execution_gap_duration_s"
                          }
                        }
                      }
                    },
                    "searchDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_search_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    },
                    "indexingDurationMs": {
                      "percentiles": {
                        "field": "kibana.alert.rule.execution.metrics.total_indexing_duration_ms",
                        "missing": 0,
                        "percents": [1, 5, 25, 50, 75, 95, 99]
                      }
                    }
                  }
                },
                "messageContainingEvents": {
                  "filter": {
                    "terms": {
                      "event.action": ["status-change", "message"]
                    }
                  },
                  "aggs": {
                    "messagesByLogLevel": {
                      "terms": {
                        "field": "log.level"
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "response": {
          "aggregations": {
            "statsHistory": {
              "buckets": [
                {
                  "key_as_string": "2023-05-26T15:00:00.000Z",
                  "key": 1685113200000,
                  "doc_count": 11388,
                  "statusChangeEvents": {
                    "doc_count": 2751,
                    "executionsByStatus": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "failed",
                          "doc_count": 1336
                        },
                        {
                          "key": "partial failure",
                          "doc_count": 849
                        },
                        {
                          "key": "succeeded",
                          "doc_count": 566
                        }
                      ]
                    }
                  },
                  "totalExecutions": {
                    "value": 2245
                  },
                  "messageContainingEvents": {
                    "doc_count": 4996,
                    "messagesByLogLevel": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "info",
                          "doc_count": 2811
                        },
                        {
                          "key": "error",
                          "doc_count": 1336
                        },
                        {
                          "key": "warn",
                          "doc_count": 849
                        }
                      ]
                    }
                  },
                  "executeEvents": {
                    "doc_count": 2245,
                    "scheduleDelayNs": {
                      "values": {
                        "1.0": 256000000,
                        "5.0": 3086972222.222222,
                        "25.0": 3133000000,
                        "50.0": 6126000000,
                        "75.0": 59484250000,
                        "95.0": 179817250000,
                        "99.0": 202613000000
                      }
                    },
                    "executionDurationMs": {
                      "values": {
                        "1.0": 280.6,
                        "5.0": 327.7,
                        "25.0": 371.5208333333333,
                        "50.0": 415.6190476190476,
                        "75.0": 505.575,
                        "95.0": 740.4375,
                        "99.0": 1446.1500000000005
                      }
                    }
                  },
                  "executionMetricsEvents": {
                    "doc_count": 1902,
                    "searchDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 8,
                        "95.0": 25,
                        "99.0": 46
                      }
                    },
                    "gaps": {
                      "doc_count": 777,
                      "totalGapDurationS": {
                        "value": 514415894
                      }
                    },
                    "indexingDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 0,
                        "95.0": 0,
                        "99.0": 0
                      }
                    }
                  }
                },
                {
                  "key_as_string": "2023-05-26T16:00:00.000Z",
                  "key": 1685116800000,
                  "doc_count": 28325,
                  "statusChangeEvents": {
                    "doc_count": 6126,
                    "executionsByStatus": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "succeeded",
                          "doc_count": 2390
                        },
                        {
                          "key": "partial failure",
                          "doc_count": 2305
                        },
                        {
                          "key": "failed",
                          "doc_count": 1431
                        }
                      ]
                    }
                  },
                  "totalExecutions": {
                    "value": 6170
                  },
                  "messageContainingEvents": {
                    "doc_count": 12252,
                    "messagesByLogLevel": {
                      "doc_count_error_upper_bound": 0,
                      "sum_other_doc_count": 0,
                      "buckets": [
                        {
                          "key": "info",
                          "doc_count": 8516
                        },
                        {
                          "key": "warn",
                          "doc_count": 2305
                        },
                        {
                          "key": "error",
                          "doc_count": 1431
                        }
                      ]
                    }
                  },
                  "executeEvents": {
                    "doc_count": 6126,
                    "scheduleDelayNs": {
                      "values": {
                        "1.0": 193000000,
                        "5.0": 3017785185.1851854,
                        "25.0": 3086000000,
                        "50.0": 3105877192.982456,
                        "75.0": 3134645161.290323,
                        "95.0": 6081772222.222222,
                        "99.0": 6122000000
                      }
                    },
                    "executionDurationMs": {
                      "values": {
                        "1.0": 275.17333333333335,
                        "5.0": 324.8014285714285,
                        "25.0": 377.0752688172043,
                        "50.0": 431,
                        "75.0": 532.3870967741935,
                        "95.0": 720.6761904761904,
                        "99.0": 922.6799999999985
                      }
                    }
                  },
                  "executionMetricsEvents": {
                    "doc_count": 3821,
                    "searchDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 9.8,
                        "75.0": 18,
                        "95.0": 40.17499999999999,
                        "99.0": 124
                      }
                    },
                    "gaps": {
                      "doc_count": 0,
                      "totalGapDurationS": {
                        "value": 0
                      }
                    },
                    "indexingDurationMs": {
                      "values": {
                        "1.0": 0,
                        "5.0": 0,
                        "25.0": 0,
                        "50.0": 0,
                        "75.0": 0,
                        "95.0": 0,
                        "99.0": 0
                      }
                    }
                  }
                }
              ]
            },
            "statusChangeEvents": {
              "doc_count": 8877,
              "executionsByStatus": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "partial failure",
                    "doc_count": 3154
                  },
                  {
                    "key": "succeeded",
                    "doc_count": 2956
                  },
                  {
                    "key": "failed",
                    "doc_count": 2767
                  }
                ]
              }
            },
            "totalExecutions": {
              "value": 8455
            },
            "messageContainingEvents": {
              "doc_count": 17248,
              "messagesByLogLevel": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "info",
                    "doc_count": 11327
                  },
                  {
                    "key": "warn",
                    "doc_count": 3154
                  },
                  {
                    "key": "error",
                    "doc_count": 2767
                  }
                ]
              },
              "warnings": {
                "doc_count": 3154,
                "topWarnings": {
                  "buckets": [
                    {
                      "doc_count": 3154,
                      "key": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching was found This warning will continue to appear until matching index is created or this rule is disabled",
                      "regex": ".*?This.+?rule.+?is.+?attempting.+?to.+?query.+?data.+?from.+?Elasticsearch.+?indices.+?listed.+?in.+?the.+?Index.+?pattern.+?section.+?of.+?the.+?rule.+?definition.+?however.+?no.+?index.+?matching.+?was.+?found.+?This.+?warning.+?will.+?continue.+?to.+?appear.+?until.+?matching.+?index.+?is.+?created.+?or.+?this.+?rule.+?is.+?disabled.*?",
                      "max_matching_length": 342
                    }
                  ]
                }
              },
              "errors": {
                "doc_count": 2767,
                "topErrors": {
                  "buckets": [
                    {
                      "doc_count": 1802,
                      "key": "An error occurred during rule execution message verification_exception",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?verification_exception.*?",
                      "max_matching_length": 2064
                    },
                    {
                      "doc_count": 777,
                      "key": "were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances",
                      "regex": ".*?were.+?not.+?queried.+?between.+?this.+?rule.+?execution.+?and.+?the.+?last.+?execution.+?so.+?signals.+?may.+?have.+?been.+?missed.+?Consider.+?increasing.+?your.+?look.+?behind.+?time.+?or.+?adding.+?more.+?Kibana.+?instances.*?",
                      "max_matching_length": 216
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message rare_error_code missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?rare_error_code.+?missing.*?",
                      "max_matching_length": 82
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message v3_windows_anomalous_path_activity missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?v3_windows_anomalous_path_activity.+?missing.*?",
                      "max_matching_length": 103
                    },
                    {
                      "doc_count": 4,
                      "key": "An error occurred during rule execution message v3_windows_rare_user_type10_remote_login missing",
                      "regex": ".*?An.+?error.+?occurred.+?during.+?rule.+?execution.+?message.+?v3_windows_rare_user_type10_remote_login.+?missing.*?",
                      "max_matching_length": 110
                    }
                  ]
                }
              }
            },
            "executeEvents": {
              "doc_count": 8371,
              "scheduleDelayNs": {
                "values": {
                  "1.0": 206000000,
                  "5.0": 3027000000,
                  "25.0": 3092000000,
                  "50.0": 3116000000,
                  "75.0": 3278666666.6666665,
                  "95.0": 99656950000,
                  "99.0": 186632790000
                }
              },
              "executionDurationMs": {
                "values": {
                  "1.0": 275.5325,
                  "5.0": 326.07857142857137,
                  "25.0": 375.68969144460027,
                  "50.0": 427,
                  "75.0": 526.2948717948718,
                  "95.0": 727.2480952380952,
                  "99.0": 1009.5299999999934
                }
              }
            },
            "executionMetricsEvents": {
              "doc_count": 5723,
              "searchDurationMs": {
                "values": {
                  "1.0": 0,
                  "5.0": 0,
                  "25.0": 0,
                  "50.0": 4,
                  "75.0": 16,
                  "95.0": 34.43846153846145,
                  "99.0": 116.51333333333302
                }
              },
              "gaps": {
                "doc_count": 777,
                "totalGapDurationS": {
                  "value": 514415894
                }
              },
              "indexingDurationMs": {
                "values": {
                  "1.0": 0,
                  "5.0": 0,
                  "25.0": 0,
                  "50.0": 0,
                  "75.0": 0,
                  "95.0": 0,
                  "99.0": 0
                }
              }
            }
          }
        }
      }
    }
  }
}
```
</p>
</details>

## Other notes

I'm thinking about backporting it to `8.8` so it could become available
in `8.8.1+` clusters.

In the next episodes:

- Add support for it to the
[support-diagnostics](https:/elastic/support-diagnostics)
tool so its output could be available in the diagnostic dumps.
- Implement the cluster health endpoint.
- Calculate more metrics for the rule health endpoint.
- Calculate more metrics for the space health endpoint.
- Etc - see the to-do list in the epic.

### Checklist

Delete any items that are not applicable to this PR.

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- See dev docs added in
`x-pack/plugins/security_solution/common/detection_engine/rule_monitoring/api/detection_engine_health/README.md`
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

(cherry picked from commit 8ac83df6202759d38b04a781d8739c0ce3e0986e)

# Conflicts:
#	x-pack/plugins/security_solution/server/plugin.ts
banderror added a commit that referenced this issue Jun 20, 2023
…ine health API (#159970)

**Partially addresses:** #125642

## Summary

The PoC of the Detection Engine health API has been implemented in
#157155. Now, we need to integrate
it into the
[support-diagnostics](https:/elastic/support-diagnostics)
tool. It looks like the tool requires the APIs it calls to be callable
with the `GET` verb.

This PR makes it possible to call 2 out of 3 health endpoints with
`GET`:

```txt
GET /internal/detection_engine/health/_cluster
```

```txt
GET /internal/detection_engine/health/_space
```

The `GET` routes don't accept any parameters and use the default
parameters instead:

- interval: `last_day`
- granularity: `hour`
- debug: `false`


### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Jun 20, 2023
…ine health API (elastic#159970)

**Partially addresses:** elastic#125642

## Summary

The PoC of the Detection Engine health API has been implemented in
elastic#157155. Now, we need to integrate
it into the
[support-diagnostics](https:/elastic/support-diagnostics)
tool. It looks like the tool requires the APIs it calls to be callable
with the `GET` verb.

This PR makes it possible to call 2 out of 3 health endpoints with
`GET`:

```txt
GET /internal/detection_engine/health/_cluster
```

```txt
GET /internal/detection_engine/health/_space
```

The `GET` routes don't accept any parameters and use the default
parameters instead:

- interval: `last_day`
- granularity: `hour`
- debug: `false`

### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

(cherry picked from commit 7047f24)
kibanamachine referenced this issue Jun 20, 2023
…on Engine health API (#159970) (#160023)

# Backport

This will backport the following commits from `main` to `8.8`:
- [[Security Solution] Add support for GET requests to the Detection
Engine health API
(#159970)](#159970)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https:/sqren/backport)

<!--BACKPORT [{"author":{"name":"Georgii
Gorbachev","email":"[email protected]"},"sourceCommit":{"committedDate":"2023-06-20T14:03:27Z","message":"[Security
Solution] Add support for GET requests to the Detection Engine health
API (#159970)\n\n**Partially addresses:**
https:/elastic/kibana/issues/125642\r\n\r\n##
Summary\r\n\r\nThe PoC of the Detection Engine health API has been
implemented in\r\nhttps://pull/157155. Now, we
need to integrate\r\nit into
the\r\n[support-diagnostics](https:/elastic/support-diagnostics)\r\ntool.
It looks like the tool requires the APIs it calls to be callable\r\nwith
the `GET` verb.\r\n\r\nThis PR makes it possible to call 2 out of 3
health endpoints with\r\n`GET`:\r\n\r\n```txt\r\nGET
/internal/detection_engine/health/_cluster\r\n```\r\n\r\n```txt\r\nGET
/internal/detection_engine/health/_space\r\n```\r\n\r\nThe `GET` routes
don't accept any parameters and use the default\r\nparameters
instead:\r\n\r\n- interval: `last_day`\r\n- granularity: `hour`\r\n-
debug: `false`\r\n\r\n\r\n### Checklist\r\n\r\n-
[x]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [ ] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n\r\n### For
maintainers\r\n\r\n- [x] This was checked for breaking API changes and
was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"7047f24c1743a2a98e22e332403c5260d6062374","branchLabelMapping":{"^v8.9.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:Detections
and Resp","Team: SecuritySolution","Feature:Rule
Monitoring","Team:Detection Rule
Management","v8.9.0","v8.8.2"],"number":159970,"url":"https:/elastic/kibana/pull/159970","mergeCommit":{"message":"[Security
Solution] Add support for GET requests to the Detection Engine health
API (#159970)\n\n**Partially addresses:**
https:/elastic/kibana/issues/125642\r\n\r\n##
Summary\r\n\r\nThe PoC of the Detection Engine health API has been
implemented in\r\nhttps://pull/157155. Now, we
need to integrate\r\nit into
the\r\n[support-diagnostics](https:/elastic/support-diagnostics)\r\ntool.
It looks like the tool requires the APIs it calls to be callable\r\nwith
the `GET` verb.\r\n\r\nThis PR makes it possible to call 2 out of 3
health endpoints with\r\n`GET`:\r\n\r\n```txt\r\nGET
/internal/detection_engine/health/_cluster\r\n```\r\n\r\n```txt\r\nGET
/internal/detection_engine/health/_space\r\n```\r\n\r\nThe `GET` routes
don't accept any parameters and use the default\r\nparameters
instead:\r\n\r\n- interval: `last_day`\r\n- granularity: `hour`\r\n-
debug: `false`\r\n\r\n\r\n### Checklist\r\n\r\n-
[x]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [ ] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n\r\n### For
maintainers\r\n\r\n- [x] This was checked for breaking API changes and
was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"7047f24c1743a2a98e22e332403c5260d6062374"}},"sourceBranch":"main","suggestedTargetBranches":["8.8"],"targetPullRequestStates":[{"branch":"main","label":"v8.9.0","labelRegex":"^v8.9.0$","isSourceBranch":true,"state":"MERGED","url":"https:/elastic/kibana/pull/159970","number":159970,"mergeCommit":{"message":"[Security
Solution] Add support for GET requests to the Detection Engine health
API (#159970)\n\n**Partially addresses:**
https:/elastic/kibana/issues/125642\r\n\r\n##
Summary\r\n\r\nThe PoC of the Detection Engine health API has been
implemented in\r\nhttps://pull/157155. Now, we
need to integrate\r\nit into
the\r\n[support-diagnostics](https:/elastic/support-diagnostics)\r\ntool.
It looks like the tool requires the APIs it calls to be callable\r\nwith
the `GET` verb.\r\n\r\nThis PR makes it possible to call 2 out of 3
health endpoints with\r\n`GET`:\r\n\r\n```txt\r\nGET
/internal/detection_engine/health/_cluster\r\n```\r\n\r\n```txt\r\nGET
/internal/detection_engine/health/_space\r\n```\r\n\r\nThe `GET` routes
don't accept any parameters and use the default\r\nparameters
instead:\r\n\r\n- interval: `last_day`\r\n- granularity: `hour`\r\n-
debug: `false`\r\n\r\n\r\n### Checklist\r\n\r\n-
[x]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [ ] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n\r\n### For
maintainers\r\n\r\n- [x] This was checked for breaking API changes and
was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"7047f24c1743a2a98e22e332403c5260d6062374"}},{"branch":"8.8","label":"v8.8.2","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Georgii Gorbachev <[email protected]>
banderror added a commit that referenced this issue Jun 21, 2023
…PI (#160137)

**Partially addresses:** #125642

## Summary

- fixes typos noticed by @maximpn in
#159970 (comment)
- adds additional docs for #159875
@peluja1012
Copy link
Contributor

Hey @banderror, @marshallmain suggested prioritizing the following, in light of recent SDHs.

  1. An API that checks all rule's index patterns and returns the rules that are querying indices with future timestamps, in addition to the actual index names with future timestamps in them
  2. An API that identifies the rules that are consuming the most total execution time - summing execution time over the executions for that rule, so it accounts for rules that are running more often

@banderror
Copy link
Contributor Author

These are great suggestions, thanks @peluja1012 and @marshallmain! I added them to the description.

@marshallmain
Copy link
Contributor

Added one more idea:
top X rules by number of shards queried/shards queried in a particular data tier for identifying potential problem rules/indices

banderror added a commit that referenced this issue Oct 3, 2023
…int (#165602)

**Epic:** #125642

## Summary

This PR implements the `_cluster` health endpoint. It returns the same
metrics as the `_space` health endpoint. The difference is that:

- the `_cluster` health endpoint calculates its metrics on top of data
from all Kibana spaces
- the `_space` health endpoint calculates its metrics on top of data
from the current Kibana space (the one specified in the URL)

Additionally, it fixes a few bugs in the existing health endpoints
related to scoping. This PR ensures that we only aggregate _detection_
rules in the saved objects and the `.kibana-event-log-*` indices, and
not any types of rules.

## RBAC

The `_cluster` health endpoint can be called by any user with at least
Read privilege to Security Solution.

## Documentation

I also updated the health API's README and added a new document
describing what health data we return from what endpoints:

```
security_solution/common/api/detection_engine/rule_monitoring/detection_engine_health/health_data.md
```

### Checklist

Delete any items that are not applicable to this PR.

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
  - [x] Added more info to the dev docs
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic Feature:Rule Monitoring Security Solution Detection Rule Monitoring Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.8.2 v8.9.0 v8.11.0
Projects
None yet
Development

No branches or pull requests

6 participants