[Monitoring] Thread pool rejections alert #79433

igoristic · 2020-10-05T10:59:43Z

Resolves #74822

The check evaluates each data node to make sure that the maximum number of thread pool rejections within the last 5 minutes is below the configured threshold.

This is part of the "Additional Alerting" effort for Stack Monitoring.
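A minimal sketch of the per-node check, assuming the node_stats samples have already been fetched and flattened into simple records. The `NodeStatsSample` shape, the `checkThreadPoolRejections` name, and treating "not below the threshold" as firing are illustrative assumptions, not the PR's actual code.

```typescript
// Hypothetical, flattened view of one node_stats sample (not the real document shape).
interface NodeStatsSample {
  nodeId: string;
  timestamp: number; // epoch milliseconds
  rejected: number; // e.g. the value of thread_pool.search.rejected
}

interface NodeRejectionResult {
  nodeId: string;
  maxRejected: number;
  firing: boolean;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// For every node, take the MAX rejection count seen inside [now - windowMs, now]
// and flag the node when that maximum is not below the threshold.
function checkThreadPoolRejections(
  samples: NodeStatsSample[],
  threshold: number,
  now: number,
  windowMs: number = FIVE_MINUTES_MS
): NodeRejectionResult[] {
  const maxByNode = new Map<string, number>();
  for (const sample of samples) {
    // Ignore samples outside the look-back window.
    if (sample.timestamp < now - windowMs || sample.timestamp > now) {
      continue;
    }
    const previous = maxByNode.get(sample.nodeId);
    if (previous === undefined || sample.rejected > previous) {
      maxByNode.set(sample.nodeId, sample.rejected);
    }
  }
  return Array.from(maxByNode.entries(), ([nodeId, maxRejected]) => ({
    nodeId,
    maxRejected,
    firing: maxRejected >= threshold,
  }));
}
```

Nodes with no samples in the window simply do not appear in the result; a real alert would also need to decide how to report those.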

Testing:

1. Create a Stack Monitoring environment.
2. Through the regular Setup Mode > Alert Edit flow/UX, set the search or write threshold to a value and make sure the alert is enabled.
3. Modify the thread_pool.search.rejected field in a node_stats document from the last 5 minutes in the .monitoring-es* indices, so that its value is at or above the threshold.

*Note: it might take a couple of minutes for the notification to show up in the UI.
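As a rough illustration of the testing step that modifies thread_pool.search.rejected, the helpers below check whether a document falls inside the 5-minute window and produce a copy with a new search rejection count. The `NodeStatsDoc` shape (including the `node_stats.thread_pool.*` nesting) is a simplified assumption about .monitoring-es* documents, and both function names are hypothetical.

```typescript
// Simplified, assumed shape of a node_stats document in .monitoring-es* indices.
interface NodeStatsDoc {
  type: 'node_stats';
  timestamp: string; // ISO-8601
  node_stats: {
    node_id: string;
    thread_pool: {
      search: { rejected: number };
      write: { rejected: number };
    };
  };
}

const CHECK_WINDOW_MS = 5 * 60 * 1000;

// True when the document is recent enough for the 5-minute check to see it.
function isInCheckWindow(doc: NodeStatsDoc, now: Date): boolean {
  const t = Date.parse(doc.timestamp);
  return !Number.isNaN(t) && t >= now.getTime() - CHECK_WINDOW_MS && t <= now.getTime();
}

// Returns a copy of the document with thread_pool.search.rejected replaced;
// the input document is not mutated.
function withSearchRejections(doc: NodeStatsDoc, rejected: number): NodeStatsDoc {
  return {
    ...doc,
    node_stats: {
      ...doc.node_stats,
      thread_pool: {
        ...doc.node_stats.thread_pool,
        search: { ...doc.node_stats.thread_pool.search, rejected },
      },
    },
  };
}
```

Editing a document older than 5 minutes will not trigger the alert, which is why the window check comes first.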

elasticmachine · 2020-10-05T10:59:45Z

Pinging @elastic/stack-monitoring (Team:Monitoring)

chrisronline

Awesome to see this ready already! Great work! I have a few comments so far, and I'm going to keep testing, but I wanted a chance to start talking through them.

x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert.ts

x-pack/plugins/monitoring/server/lib/alerts/fetch_thread_pool_rejections_stats.ts

x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert.ts

x-pack/plugins/monitoring/server/lib/alerts/fetch_thread_pool_rejections_stats.ts

x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert.ts

jakommo · 2020-10-05T15:54:53Z

Nice work!

In the second screenshot there is a link "Tune thread pools"; where does it link to? (Sorry, I can't find it in the code.)
We need to be cautious with this, as it might give the wrong impression that increasing the thread pool size will fix the issue. In almost all cases, the solution is to leave the thread pools at their defaults and fix the root cause of their exhaustion, e.g. writing to or querying too many shards, using very small bulk sizes, etc.

igoristic · 2020-10-05T16:07:30Z

@jakommo

In the second screenshot there is a link "Tune thread pools" where does this link to? ...

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html

I agree 💯 that we should guide users on how to treat the cause/problem and not the symptom. But I still wanted them to be aware of the thread pool API (in case it's applicable).


jakommo · 2020-10-07T08:50:19Z

@igoristic thanks for clarifying.
I'm OK with keeping it to inform the user about its existence, but I think we should move it to a less prominent spot (maybe last in the list?) and also rename the link. Maybe just call it "Thread pool settings" or similar? "Tune" makes it sound like increasing those values would do any good 😁
cc @inqueue for input, because we talked about this a few weeks ago.

chrisronline

Functionally, this is looking great!! Nice work separating these two out. I have a couple of suggestions and will take a deeper look at the code next.

x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert_base.ts
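One way the split into separate search and write alerts could share logic through a base class, sketched under assumptions: the class names, the `threadPoolType` property, and the `node_stats.thread_pool.<type>.rejected` field path are hypothetical and are not taken from thread_pool_rejections_alert_base.ts.

```typescript
type ThreadPoolType = 'search' | 'write';

// Hypothetical base class: shared check logic lives here, and each concrete alert
// only declares which thread pool it watches.
abstract class ThreadPoolRejectionsAlertBase {
  protected abstract readonly threadPoolType: ThreadPoolType;

  // Assumed field path of the rejection counter in node_stats documents.
  get rejectedField(): string {
    return `node_stats.thread_pool.${this.threadPoolType}.rejected`;
  }

  // Fires when the maximum rejection count is not below the threshold.
  isFiring(maxRejected: number, threshold: number): boolean {
    return maxRejected >= threshold;
  }
}

class SearchThreadPoolRejectionsAlert extends ThreadPoolRejectionsAlertBase {
  protected readonly threadPoolType: ThreadPoolType = 'search';
}

class WriteThreadPoolRejectionsAlert extends ThreadPoolRejectionsAlertBase {
  protected readonly threadPoolType: ThreadPoolType = 'write';
}
```

Keeping the pool type as the only difference means each alert can have its own threshold and enabled state while the query and comparison stay in one place.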

chrisronline

A few things I found in the code, but looking great!

Also, I wanted to bring up this code: https://github.com/elastic/kibana/blob/master/x-pack/plugins/monitoring/server/alerts/missing_monitoring_data_alert.ts#L109. I wonder if we need to do this for all alerts that support a customizable duration (which I think they all do). We hard-code the query that fetches clusters to look back 2m, but that wasn't sufficient for the missing monitoring data alert, and I'm wondering whether it's sufficient for the others.

WDYT?
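A sketch of the idea in this comment: widen the cluster lookup so it never looks back less than the alert's configured duration. The duration string format ("30s", "5m", ...) and the `parseDurationMs`/`clusterQueryRange` helpers are assumptions for illustration only.

```typescript
// The hard-coded 2m cluster lookup described in the review comment.
const DEFAULT_LOOKBACK_MS = 2 * 60 * 1000;

const UNIT_MS: Record<string, number> = { s: 1000, m: 60000, h: 3600000, d: 86400000 };

// Parses durations such as "30s", "5m", "1h" (format assumed for illustration).
function parseDurationMs(duration: string): number {
  const match = /^(\d+)([smhd])$/.exec(duration.trim());
  if (!match) {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

interface TimeRange {
  start: number;
  end: number;
}

// Look back at least as far as the alert's own duration, never less than the default.
function clusterQueryRange(now: number, alertDuration: string): TimeRange {
  const lookback = Math.max(DEFAULT_LOOKBACK_MS, parseDurationMs(alertDuration));
  return { start: now - lookback, end: now };
}
```

Taking the maximum keeps the current behavior for short durations and only widens the query when an alert is configured to look further back.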

x-pack/plugins/monitoring/server/lib/alerts/fetch_thread_pool_rejections_stats.ts

x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert_base.ts


spalger · 2020-10-26T22:24:34Z

packages/kbn-optimizer/limits.yml

@@ -54,7 +54,7 @@ pageLoadAssetSize:
 mapsLegacy: 116817
 mapsLegacyLicensing: 20214
 ml: 82187
- monitoring: 268612
+ monitoring: 288612


Our unofficial goal is to ultimately limit all page load asset sizes to 200kb within the next year or so, as teams continue working to reduce the size of their bundles. Is anyone from the monitoring team working on reducing the page load asset size of this bundle? A 20kb increase feels reasonable for now, but I'd like to ask that someone from the monitoring team run node scripts/build_kibana_platform_plugins --focus monitoring --profile and then load x-pack/plugins/monitoring/target/public/stats.json into one of the many webpack analyzers or visualizers.

When I run it, I see that server code is being bundled into the page load bundle. Removing it seems like the best way to fix the limit issue, rather than raising the limit.
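A rough heuristic for spotting this kind of problem: scan a public source file for import/export specifiers that reach into a server directory. This is a regex-based sketch with a hypothetical name, not a Kibana tool; it misses dynamic `import()` calls and side-effect-only imports, and a real check would use a parser or a lint rule.

```typescript
// Returns the module specifiers in `source` whose path contains a `server` segment,
// e.g. '../../server/alerts/foo'. Heuristic only: regex, not a real parser.
function findServerImports(source: string): string[] {
  // Matches `import ... from '<spec>'` and `export ... from '<spec>'`, including multi-line forms.
  const re = /(?:import|export)\s[^'"]*?from\s*['"]([^'"]+)['"]/g;
  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(source)) !== null) {
    const specifier = match[1];
    // Require `server` as a whole path segment so names like `server_utils` are ignored.
    if (/(^|\/)server(\/|$)/.test(specifier)) {
      hits.push(specifier);
    }
  }
  return hits;
}
```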

PS: I'm working on unifying the docs on the best way to diagnose and deal with this stuff.

This is awesome! Thank you @spalger 🙇

spalger · 2020-10-26T22:29:33Z

x-pack/plugins/monitoring/public/angular/providers/private.js

@@ -81,9 +81,9 @@
 *
 * @param {[type]} prov [description]
 */
-import _ from 'lodash';
+import { partial, uniqueId, isObject } from 'lodash';


FYI, this has no impact on the size of the bundles since we always load and share a single, complete, lodash instance.


igoristic · 2020-10-29T01:51:50Z

@chrisronline Re-requested your already-approved review, since I had to remove all the /server/ imports, which added a lot of weight to our bundle size. I'm also loading some modules asynchronously to improve the initial load.
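A minimal sketch of the async-loading idea, assuming a memoized loader wrapped around a dynamic `import()`. The `lazy` helper and the module path in the usage comment are hypothetical; the point is that the module is only requested on first use and is then cached instead of being part of the page load bundle.

```typescript
// Wraps an async loader so it runs at most once, on first use, and every caller
// shares the same promise. A failed load clears the cache so it can be retried.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader().catch((error: unknown) => {
        cached = undefined;
        throw error;
      });
    }
    return cached;
  };
}

// Usage (hypothetical module path):
// const loadAlertsUi = lazy(() => import('./hypothetical_alerts_ui'));
// loadAlertsUi().then((mod) => { /* render once the chunk has arrived */ });
```

With webpack, a dynamic `import()` is typically split into a separate async chunk, which matches the shift from page load size to async chunk size in the CI metrics.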


kibanamachine · 2020-10-29T16:47:57Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request
Commit: a8772dd

Metrics [docs]

@kbn/optimizer bundle module count

id	before	after	diff
`monitoring`	633	607	-26

async chunk count

id	before	after	diff
`monitoring`	1	7	+6

async chunks size

id	before	after	diff
`monitoring`	892.7KB	964.4KB	+71.7KB

distributable file count

id	before	after	diff
`default`	48115	48137	+22

page load bundle size

id	before	after	diff
`monitoring`	190.0KB	33.1KB	-157.0KB

History

💔 Build #84462 failed 066a5f4
💔 Build #83819 failed 69aae85
💔 Build #83794 failed 22c8a6d
💔 Build #81608 failed eeb6b35
💔 Build #81104 failed acb2162

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

spalger

Oh snap, love the bundle limit decrease! 💯

chrisronline

LGTM! Great work!

* master: (71 commits)
  [Chrome] Extension to append an element to the last breadcrumb (elastic#82015)
  [Monitoring] Thread pool rejections alert (elastic#79433)
  [Actions] Fix actionType type on registerType function (elastic#82125)
  [Security Solution] Modal for saving timeline (elastic#81802)
  add tests for index pattern switching (elastic#81987)
  TS project references for share plugin (elastic#82051)
  [Graph] Fix problem with duplicate ids (elastic#82109)
  skip 'returns a single bucket if array has 1'. related elastic#81460
  Add a link to documentation in the alerts and actions management UI (elastic#81909)
  [Fleet] fix duplicate ingest pipeline refs (elastic#82078)
  Context menu trigger for URL Drilldown (elastic#81158)
  SO management: fix legacy import index pattern selection being reset when switching page (elastic#81621)
  Fixed dead links (elastic#78696)
  [Search] Add "restore" to session service (elastic#81924)
  fix Lens heading structure (elastic#81752)
  [ML] Data Frame Analytics: Fix feature importance cell value and decision path chart (elastic#82011)
  Remove legacy app arch items from codeowners. (elastic#82084)
  [TSVB] Renamed 'positive rate' to 'counter rate' (elastic#80939)
  Expressions/migrations2 (elastic#81281)
  [Telemetry] [Schema] remove number type and support all es number types (elastic#81774)
  ...

* Thread pool rejections first draft
* Split search and write rejections to seperate alerts
* Code review feedback
* Optimized page loading and bundle size
* Increased monitoring bundle limit
* Removed server app import into the frontend
* Fixed tests and bundle size

Co-authored-by: Kibana Machine <[email protected]>

# Conflicts:
#	packages/kbn-optimizer/limits.yml

igoristic · 2020-10-30T18:02:22Z

Backport:
7.x: 0163b50

Thread pool rejections first draft

e976a69

igoristic added release_note:enhancement review Team:Monitoring Stack Monitoring team v8.0.0 Feature:Stack Monitoring v7.10.0 labels Oct 5, 2020

igoristic added this to the Logs UI 7.10 milestone Oct 5, 2020

igoristic requested a review from a team October 5, 2020 10:59

sgrodzicki modified the milestones: Logs UI 7.10, Stack Monitoring UI 7.10 Oct 5, 2020

chrisronline suggested changes Oct 5, 2020


igoristic added 6 commits October 6, 2020 05:40

Merge branch 'master' of https:/elastic/kibana into threa…

db3a723

…dpool_rejection_alert

Merge branch 'master' of https:/elastic/kibana into threa…

fea1ee8

…dpool_rejection_alert

Merge branch 'master' of https:/elastic/kibana into threa…

71aa34d

…dpool_rejection_alert

Merge branch 'master' of https:/elastic/kibana into threa…

1a0156d

…dpool_rejection_alert

Merge branch 'master' of https:/elastic/kibana into threa…

6908f61

…dpool_rejection_alert

Split search and write rejections to seperate alerts

a666da5

igoristic requested a review from chrisronline October 6, 2020 23:37

sgrodzicki added v7.11.0 and removed v7.10.0 labels Oct 7, 2020

sgrodzicki modified the milestones: Stack Monitoring UI 7.10, Stack Monitoring UI 7.11 Oct 7, 2020

chrisronline suggested changes Oct 7, 2020


x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert_base.ts

x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert_base.ts

chrisronline suggested changes Oct 7, 2020


igoristic added 4 commits October 26, 2020 12:52

Merge branch 'master' of https:/elastic/kibana into threa…

36f80f0

…dpool_rejection_alert

Merge branch 'master' of https:/elastic/kibana into threa…

ffd436d

…dpool_rejection_alert

Optimized page loading and bundle size

22c8a6d

Increased monitoring bundle limit

69aae85

igoristic requested a review from a team as a code owner October 26, 2020 22:12

spalger reviewed Oct 26, 2020


igoristic added 3 commits October 28, 2020 11:56

Merge branch 'master' of https:/elastic/kibana into threa…

ddcd9d2

…dpool_rejection_alert

Merge branch 'master' of https:/elastic/kibana into threa…

617cb32

…dpool_rejection_alert

Removed server app import into the frontend

066a5f4

igoristic requested a review from chrisronline October 29, 2020 01:46

igoristic added 2 commits October 29, 2020 11:07

Fixed tests and bundle size

7d2a33d

Merge branch 'master' of https:/elastic/kibana into threa…

a8772dd

…dpool_rejection_alert

igoristic requested a review from spalger October 29, 2020 17:26

spalger approved these changes Oct 29, 2020


chrisronline approved these changes Oct 30, 2020


igoristic merged commit c1294f0 into elastic:master Oct 30, 2020

igoristic deleted the threadpool_rejection_alert branch October 30, 2020 14:50

igoristic mentioned this pull request Oct 30, 2020

[7.x] [Monitoring] Thread pool rejections alert (#79433) #82157

Merged

igoristic added backported and removed review labels Oct 30, 2020

igoristic mentioned this pull request Nov 18, 2020

[Monitoring] Optimizing alerting code #83681

Merged

5 tasks

chrisronline mentioned this pull request Dec 14, 2020

[Stack Monitoring] [Test Scenario] Out of the box alerting #85841

Closed

23 tasks

chrisronline mentioned this pull request Mar 1, 2021

[Stack Monitoring] [Test Scenario] Out of the box alerting #93072

Closed

24 tasks

simianhacker mentioned this pull request Apr 29, 2021

[Stack Monitoring] [Test Scenario] Out of the box alerting #98765

Closed

24 tasks
