
Kibana performance - tools, benchmarking, CI, optimizations #86833

Closed
3 of 9 tasks
peterschretlen opened this issue Dec 22, 2020 · 2 comments
Labels
  • impact:needs-assessment (Product and/or Engineering needs to evaluate the impact of the change)
  • loe:small (Small Level of Effort)
  • Meta
  • performance
  • Team:Operations (Team label for Operations Team)

Comments


peterschretlen commented Dec 22, 2020

There have been a number of performance initiatives lately, and the topic of measuring and improving performance has come up frequently as a priority. The purpose of this issue is to capture existing and planned efforts in the context of an overall plan/objectives.

Kibana performance space

Kibana as a system has many dimensions that can vary substantially, creating a large performance space to cover:

  • Different use cases (Security, Observability, BI/analytics, Geo)
  • The browser / JS engine used
  • Kibana configuration and Elasticsearch configuration
  • Data being queried / index configuration
  • Stack environment (cloud, on-prem, ECK)
  • Ingestion load

Kibana also has different sources of load:

  1. User load: This is the load we typically consider - the number of concurrent users and the types of tasks they perform
  2. Load from other clients: Kibana increasingly serves non-browser clients. External tools that use Kibana APIs, or components like Fleet, could put enough load on the Node.js server to cause problems.
  3. Background load: We’re adding more services and tasks server-side. Alerting, reporting, telemetry, and background search are all examples. Some of these can be computationally expensive, which risks disrupting the Node.js event loop. Since this load is not tied to a request, it requires system introspection to understand (see the event-loop monitoring sketch after this list).
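For illustration, one way to get that introspection is Node’s built-in event-loop delay monitor. The sketch below samples event-loop delay with perf_hooks; the resolution, reporting interval, and logging are placeholders, not an actual Kibana implementation:

```ts
// Hedged sketch: sample event-loop delay using Node's built-in perf_hooks API.
// The resolution and reporting interval are illustrative placeholders.
import { monitorEventLoopDelay } from 'perf_hooks';

const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
  // Histogram values are in nanoseconds; convert to milliseconds for readability.
  const meanMs = histogram.mean / 1e6;
  const p99Ms = histogram.percentile(99) / 1e6;
  console.log(`event loop delay: mean=${meanMs.toFixed(1)}ms p99=${p99Ms.toFixed(1)}ms`);
  histogram.reset();
}, 10_000);
```

A sustained p99 in the tens of milliseconds would be a signal that some background task is blocking the event loop.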


Where we are today

Where are the weak spots?

  • Front end: We lack any benchmarking or performance metrics on the front end
  • Single CPU: More work is happening on the server, including CPU-intensive work that has the potential to disrupt the JS event loop and affect all of Kibana.
  • Metrics APIs: We have the /api/stats and /api/task_manager/_health APIs today, but comprehensive metrics would give us a solid base for building performance tooling, autoscaling, monitoring, and diagnostics (a small query sketch follows this list).
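For reference, pulling those two existing endpoints is straightforward. In the sketch below the base URL, credentials, and the extended=true flag are assumptions for a local dev setup:

```ts
// Hedged sketch: poll the existing Kibana metrics endpoints mentioned above.
// KIBANA_URL and the basic-auth credentials are placeholders for a local dev setup.
const KIBANA_URL = 'http://localhost:5601';
const auth = Buffer.from('elastic:changeme').toString('base64');

async function fetchStats(): Promise<void> {
  for (const path of ['/api/stats?extended=true', '/api/task_manager/_health']) {
    const res = await fetch(`${KIBANA_URL}${path}`, {
      headers: { Authorization: `Basic ${auth}` },
    });
    console.log(path, res.status, JSON.stringify(await res.json(), null, 2));
  }
}

fetchStats().catch(console.error);
```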

Objectives

  1. Establish a set of benchmarks, focused on server/API performance initially.

    • HTTP load scenarios (measuring response latency and error rates; a minimal load-scenario sketch appears after this list)
    • Background load scenarios (measuring throughput)
    • Combinations of the above that include representative datasets or ingestion
  2. Prevent performance degradation on benchmarks

    • Each version of our software works as well or better than the previous one.
  3. A good benchmark developer experience

    • Encourage proactive use and make troubleshooting performance easier
  4. Rich stats APIs

    • Provide insight into Kibana similar to what Elasticsearch provides through its cluster/node/index stats APIs. This not only helps with benchmarking, but also provides better monitoring, signals for autoscaling, and data for diagnostic tooling used in support and troubleshooting.
  5. Kibana goes beyond single CPU limitation
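To make objective 1 concrete, here is a minimal sketch of an HTTP load scenario that measures response latency and error rate. The target endpoint, concurrency, and duration are placeholders; a real benchmark would use representative scenarios and a proper load tool.

```ts
// Hedged sketch of an HTTP load scenario: run N concurrent request loops against a
// Kibana endpoint for a fixed duration, then report latency percentiles and error rate.
// TARGET, CONCURRENCY, and DURATION_MS are illustrative placeholders.
const TARGET = 'http://localhost:5601/api/status';
const CONCURRENCY = 20;
const DURATION_MS = 30_000;

async function worker(latencies: number[], errors: { count: number }): Promise<void> {
  const end = Date.now() + DURATION_MS;
  while (Date.now() < end) {
    const start = performance.now();
    try {
      const res = await fetch(TARGET);
      if (!res.ok) errors.count++;
      await res.arrayBuffer(); // drain the body so the connection can be reused
    } catch {
      errors.count++;
    }
    latencies.push(performance.now() - start);
  }
}

async function run(): Promise<void> {
  const latencies: number[] = [];
  const errors = { count: 0 };
  await Promise.all(Array.from({ length: CONCURRENCY }, () => worker(latencies, errors)));
  latencies.sort((a, b) => a - b);
  const pct = (q: number) => latencies[Math.floor(q * (latencies.length - 1))].toFixed(1);
  console.log(`requests=${latencies.length} errors=${errors.count}`);
  console.log(`latency: p50=${pct(0.5)}ms p95=${pct(0.95)}ms p99=${pct(0.99)}ms`);
}

run().catch(console.error);
```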

Phases

MVP

The MVP has two benchmarks running at least daily, with results sent to a stats collector so we can verify that performance does not degrade over time.

Phase 1

Extends the MVP by making it easy for any developer to run a benchmark from a specific commit or PR, and adds more benchmarks informed by APM.

  • Developers can run benchmark jobs on specific commits, or in a PR using a GitHub bot integration
  • Extend the metrics from our stats or health APIs, or add extension points so plugins can include their own metrics (a hypothetical sketch of such an extension point follows this list)
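To illustrate the extension-point idea from the second bullet, here is a purely hypothetical TypeScript sketch; none of these interfaces exist in Kibana today, and they only show the shape a plugin-contributed metric could take:

```ts
// Purely hypothetical sketch of a metrics extension point. These interfaces are not
// part of Kibana; they only illustrate how plugins might register their own metrics.
interface MetricsCollector {
  /** Key the metric is reported under in the stats API, e.g. "reporting.queue_depth". */
  id: string;
  /** Called whenever the stats API is queried. */
  collect(): Promise<Record<string, number>>;
}

interface MetricsSetup {
  registerCollector(collector: MetricsCollector): void;
}

// A plugin registering its own metric during setup (hypothetical plugin API shape).
function setupExamplePlugin(metrics: MetricsSetup): void {
  metrics.registerCollector({
    id: 'example_plugin.pending_tasks',
    collect: async () => ({ pending_tasks: 42 }), // placeholder value
  });
}
```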

Phase 2

Introduces the ability for Kibana to scale vertically, and improves the observability of Kibana through additional stats. This improves our benchmarks, helps us better support Kibana, and sets us up to improve Stack monitoring of Kibana.
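For context, the usual way a Node.js service scales past a single CPU is by running multiple worker processes. The sketch below uses Node’s built-in cluster module to illustrate the general technique only; the port and restart policy are placeholders, and this is not a committed Kibana design:

```ts
// Hedged sketch: scale an HTTP server across CPUs with Node's built-in cluster module.
// The port and restart policy are placeholders; this shows the general technique only.
import cluster from 'cluster';
import { createServer } from 'http';
import { cpus } from 'os';

if (cluster.isPrimary) {
  // Fork one worker per CPU; the primary process only supervises.
  for (let i = 0; i < cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} exited, starting a replacement`);
    cluster.fork();
  });
} else {
  // Workers share the same listening port; incoming connections are distributed across them.
  createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(5601);
}
```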

Related Meta issues:

@peterschretlen peterschretlen added Team:Operations Team label for Operations Team Meta performance labels Dec 22, 2020
@elasticmachine

Pinging @elastic/kibana-operations (Team:Operations)

@tylersmalley tylersmalley added 1 and removed 1 labels Oct 11, 2021
@exalate-issue-sync exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Feb 16, 2022
@tylersmalley tylersmalley removed loe:small Small Level of Effort impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. EnableJiraSync labels Mar 16, 2022

lizozom commented Apr 18, 2022

Closing this issue due to inactivity.
Feel free to reopen if needed 🙏🏻

@lizozom lizozom closed this as completed Apr 18, 2022
@exalate-issue-sync exalate-issue-sync bot reopened this Apr 18, 2022
@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort labels Apr 18, 2022