-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Uptime] Alerts - Monitor status alert - check monitor status by monitor.timespan #104541
[Uptime] Alerts - Monitor status alert - check monitor status by monitor.timespan #104541
Conversation
Pinging @elastic/uptime (Team:uptime) |
d7bb281
to
d813e2e
Compare
@dominiqueclarke just checking, this is the (draft) PR for #103428? |
Yep, getting ready to finalize draft. |
@elasticmachine merge upstream |
…us-by-monitor.timespan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this locally with a monitor, it seemed to function alright for me. The code looks good, too; I had one minor recommendation that you can take or leave.
I would like a little more time to test it some more before we merge.
.parse(timerangeLookback) | ||
?.subtract('24', 'hours') | ||
.valueOf(); | ||
const absoluteFrom = min([scheduleIntervalAbsoluteTime, defaultIntervalAbsoluteTime]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const absoluteFrom = min([scheduleIntervalAbsoluteTime, defaultIntervalAbsoluteTime]); | |
const from = min([scheduleIntervalAbsoluteTime, defaultIntervalAbsoluteTime]) ?? 'now-24h'; |
Just a suggestion, do you think this is cleaner? My thinking is it simplifies the return object so much that you don't really need to look at it.
….timespan' of https:/dominiqueclarke/kibana into fix/99660-uptime-alerts-check-monitor-status-by-monitor.timespan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done some additional checking locally and I think this will give us some improvement in the behavior of these alerts.
LGTM
@elasticmachine merge upstream |
…us-by-monitor.timespan
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good !!
Smoke tested and code review !!
…tor.timespan (elastic#104541) * check monitor status by monitor.timespan * Delete mappings.json * adjust logic Co-authored-by: Kibana Machine <[email protected]>
…tor.timespan (elastic#104541) * check monitor status by monitor.timespan * Delete mappings.json * adjust logic Co-authored-by: Kibana Machine <[email protected]>
…-png-pdf-report-type * 'master' of github.com:elastic/kibana: (292 commits) bring back KQL autocomplete in timeline + fix last updated (elastic#105380) [Maps] Change TOC pop-up wording to reflect filter change, not search bar change (elastic#105163) Updating urls to upstream elastic repo (elastic#105250) [Maps] Move new vector layer wizard card down (elastic#104797) Exclude registering the cases feature if not enabled (elastic#105292) [Uptime] Alerts - Monitor status alert - check monitor status by monitor.timespan (elastic#104541) updated UI copy (elastic#105184) Log a warning when documents of unknown types are detected during migration (elastic#105213) [Logs UI] Register log threshold rule as lifecycle rule (elastic#104341) [Ingest pipelines] add network direction processor (elastic#103436) [Console] Autocomplete definitions (manual backport) (elastic#105086) [Security Solution] User can make Exceptions for Memory protection alerts (elastic#102196) [Lens] Formula: add validation for multiple field/metrics (elastic#104092) Removing async from file upload and data visualizer plugins start lifecycle (elastic#105197) Fix error when validating the form with non blocking validations (elastic#103629) [ML] Fix "View by" swim lane with applied filter and sorting by score (elastic#105217) Update dependency @elastic/charts to v32 (elastic#104625) [CTI] shortens large numbers on Dashboard Link Panel (elastic#105269) [Security Solution][Endpoint][Host Isolation] Fixes bug to remove excess host metadata status toasts on non user initiated errors (elastic#105331) [Cases] Fix pushing alerts count on every push to external service (elastic#105030) ... # Conflicts: # x-pack/plugins/reporting/common/types.ts
…tor.timespan (#104541) (#105425) * check monitor status by monitor.timespan * Delete mappings.json * adjust logic Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Dominique Clarke <[email protected]>
…tor.timespan (#104541) (#105426) * check monitor status by monitor.timespan * Delete mappings.json * adjust logic Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Dominique Clarke <[email protected]>
Resolves #103428
This PR updates the Monitor Status rule type to check for down monitors by
monitor.timespan
, rather than@timestamp
.The
monitor.timespan
field is a date range field that represents the time at which the ping started all the way through the time of the next scheduled monitor. By using this field, we are essentially querying against the entire time the ping is "effective".To execute this query in a performant way, we must first filter the number of documents we are searching against, otherwise we'd search through the entire history of documents. To achieve this, we first filter documents by
@timestamp
for one of the following timeframes (whichever time frame is larger)Why timeframe minus 24 hours?
The timeframe represents the time in which the user desires to query for down monitors, but those documents may have been indexed farther back the timeframe and still have an effectively down monitor according to the
monitor.timespan
field. For this reason, we create a buffer of 24h behind the timeframe to check for documents that may contain a matching down document formonitor.timespan
. For pings with large scheduled intervals, we are at risk of missing down documents if we do not have a sufficiently large window we are looking within.Why use the larger of the two timeframes?
Edge cases exist for scenarios various rule configuration scenarios. These edge cases are only present for monitors with large schedule intervals for example:
When the rule schedule interval is greater than the timeframe-24hrs, there is the potential to miss documents indexed at the beginning of the rule schedule interval. In this case, we filter
@timestamp
bynow to now minus rule schedule interval
When the user configured timeframe is greater than the rule interval, there is the potential to miss documents indexed within the timeframe. In this case, we filter
@timestamp
bynow to now minus the timeframe minus 24hrs
Edge cases
Filtering first by
@timestamp
inherently creates a scenario in which documents could be missed. By creating look-back logic, we capture most of the potential documents which may be down, but are still at risk of missing documents with high scheduled intervals.Outstanding questions
monitor.timespan
Testing