
Kibana performance - tools, benchmarking, CI, optimizations #86833

Closed
3 of 9 tasks
peterschretlen opened this issue Dec 22, 2020 · 2 comments
Labels
  • impact:needs-assessment (Product and/or Engineering needs to evaluate the impact of the change)
  • loe:small (Small Level of Effort)
  • Meta
  • performance
  • Team:Operations (Team label for Operations Team)

Comments


peterschretlen commented Dec 22, 2020

There have been a number of performance initiatives lately, and the topic of measuring and improving performance has come up frequently as a priority. The purpose of this issue is to capture existing and planned efforts in the context of an overall plan/objectives.

Kibana performance space

Kibana as a system has many dimensions that can vary substantially, creating a large performance space to cover:

  • Different use cases (Security, Observability, BI/analytics, Geo)
  • The browser / JS engine used
  • Kibana configuration and Elasticsearch configuration
  • Data being queried / index configuration
  • Stack environment (cloud, on-prem, ECK)
  • Ingestion load

Kibana also has different sources of load:

  1. User load: This is the load we typically consider - the number of concurrent users and the types of tasks they perform
  2. Load from other clients: Kibana increasingly serves non-browser clients. External tools that use Kibana APIs, or components like Fleet, could put enough load on the Node.js server to cause problems.
  3. Background load: We’re adding more services and tasks server-side. Alerting, reporting, telemetry, and background search are all examples. Some of these can be computationally expensive, which risks disrupting the Node.js event loop. Since this load is not tied to a request, it requires system introspection to understand (see the event-loop monitoring sketch after this list).
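For illustration, one way to get that introspection is Node’s built-in event-loop delay monitor. The sketch below samples event-loop delay with perf_hooks; the resolution, reporting interval, and logging are placeholders, not an actual Kibana implementation:

```ts
// Hedged sketch: sample event-loop delay using Node's built-in perf_hooks API.
// The resolution and reporting interval are illustrative placeholders.
import { monitorEventLoopDelay } from 'perf_hooks';

const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
  // Histogram values are in nanoseconds; convert to milliseconds for readability.
  const meanMs = histogram.mean / 1e6;
  const p99Ms = histogram.percentile(99) / 1e6;
  console.log(`event loop delay: mean=${meanMs.toFixed(1)}ms p99=${p99Ms.toFixed(1)}ms`);
  histogram.reset();
}, 10_000);
```

A sustained p99 in the tens of milliseconds would be a signal that some background task is blocking the event loop.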


Where we are today

Where are the weak spots?

  • Front end: We lack any benchmarking or performance metrics on the front end
  • Single CPU: More work is happening on the server, including CPU-intensive work that has the potential to disrupt the JS event loop and affect all of Kibana.
  • Metrics APIs: We have the /api/stats and /api/task_manager/_health APIs today, but comprehensive metrics would give us a solid base for building performance tooling, autoscaling, monitoring, and diagnostics (a small query sketch follows this list).
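For reference, pulling those two existing endpoints is straightforward. In the sketch below the base URL, credentials, and the extended=true flag are assumptions for a local dev setup:

```ts
// Hedged sketch: poll the existing Kibana metrics endpoints mentioned above.
// KIBANA_URL and the basic-auth credentials are placeholders for a local dev setup.
const KIBANA_URL = 'http://localhost:5601';
const auth = Buffer.from('elastic:changeme').toString('base64');

async function fetchStats(): Promise<void> {
  for (const path of ['/api/stats?extended=true', '/api/task_manager/_health']) {
    const res = await fetch(`${KIBANA_URL}${path}`, {
      headers: { Authorization: `Basic ${auth}` },
    });
    console.log(path, res.status, JSON.stringify(await res.json(), null, 2));
  }
}

fetchStats().catch(console.error);
```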

Objectives

  1. Establish a set of benchmarks, focused on server/API performance initially.

    • HTTP load scenarios (measuring response latency and error rates; a minimal load-scenario sketch appears after this list)
    • Background load scenarios (measuring throughput)
    • Combinations of the above that include representative datasets or ingestion
  2. Prevent performance degradation on benchmarks

    • Each version of our software works as well or better than the previous one.
  3. A good benchmark developer experience

    • Encourage proactive use and make troubleshooting performance easier
  4. Rich stats APIs

    • Provide insight into Kibana similar to what Elasticsearch provides through its cluster/node/index stats APIs. This not only helps with benchmarking, but also provides better monitoring, signals for autoscaling, and data for diagnostic tooling used in support and troubleshooting.
  5. Kibana goes beyond single CPU limitation
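To make objective 1 concrete, here is a minimal sketch of an HTTP load scenario that measures response latency and error rate. The target endpoint, concurrency, and duration are placeholders; a real benchmark would use representative scenarios and a proper load tool.

```ts
// Hedged sketch of an HTTP load scenario: run N concurrent request loops against a
// Kibana endpoint for a fixed duration, then report latency percentiles and error rate.
// TARGET, CONCURRENCY, and DURATION_MS are illustrative placeholders.
const TARGET = 'http://localhost:5601/api/status';
const CONCURRENCY = 20;
const DURATION_MS = 30_000;

async function worker(latencies: number[], errors: { count: number }): Promise<void> {
  const end = Date.now() + DURATION_MS;
  while (Date.now() < end) {
    const start = performance.now();
    try {
      const res = await fetch(TARGET);
      if (!res.ok) errors.count++;
      await res.arrayBuffer(); // drain the body so the connection can be reused
    } catch {
      errors.count++;
    }
    latencies.push(performance.now() - start);
  }
}

async function run(): Promise<void> {
  const latencies: number[] = [];
  const errors = { count: 0 };
  await Promise.all(Array.from({ length: CONCURRENCY }, () => worker(latencies, errors)));
  latencies.sort((a, b) => a - b);
  const pct = (q: number) => latencies[Math.floor(q * (latencies.length - 1))].toFixed(1);
  console.log(`requests=${latencies.length} errors=${errors.count}`);
  console.log(`latency: p50=${pct(0.5)}ms p95=${pct(0.95)}ms p99=${pct(0.99)}ms`);
}

run().catch(console.error);
```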

Phases

MVP

The MVP has two benchmarks running at least daily, with results sent to a stats collector so we can verify that performance does not degrade over time.

Phase 1

Extends the MVP by making it easy for any developer to run a benchmark from a specific commit or PR, and adds more benchmarks informed by APM.

  • Developers can run benchmark jobs on specific commits, or in a PR using a GitHub bot integration
  • Extend the metrics from our stats or health APIs, or add extension points so plugins can include their own metrics (a hypothetical sketch of such an extension point follows this list)
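To illustrate the extension-point idea from the second bullet, here is a purely hypothetical TypeScript sketch; none of these interfaces exist in Kibana today, and they only show the shape a plugin-contributed metric could take:

```ts
// Purely hypothetical sketch of a metrics extension point. These interfaces are not
// part of Kibana; they only illustrate how plugins might register their own metrics.
interface MetricsCollector {
  /** Key the metric is reported under in the stats API, e.g. "reporting.queue_depth". */
  id: string;
  /** Called whenever the stats API is queried. */
  collect(): Promise<Record<string, number>>;
}

interface MetricsSetup {
  registerCollector(collector: MetricsCollector): void;
}

// A plugin registering its own metric during setup (hypothetical plugin API shape).
function setupExamplePlugin(metrics: MetricsSetup): void {
  metrics.registerCollector({
    id: 'example_plugin.pending_tasks',
    collect: async () => ({ pending_tasks: 42 }), // placeholder value
  });
}
```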

Phase 2

Introduces the ability for Kibana to scale vertically, and improves the observability of Kibana through additional stats. This improves our benchmarks, helps us better support Kibana, and sets us up to improve Stack monitoring of Kibana.
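For context, the usual way a Node.js service scales past a single CPU is by running multiple worker processes. The sketch below uses Node’s built-in cluster module to illustrate the general technique only; the port and restart policy are placeholders, and this is not a committed Kibana design:

```ts
// Hedged sketch: scale an HTTP server across CPUs with Node's built-in cluster module.
// The port and restart policy are placeholders; this shows the general technique only.
import cluster from 'cluster';
import { createServer } from 'http';
import { cpus } from 'os';

if (cluster.isPrimary) {
  // Fork one worker per CPU; the primary process only supervises.
  for (let i = 0; i < cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} exited, starting a replacement`);
    cluster.fork();
  });
} else {
  // Workers share the same listening port; incoming connections are distributed across them.
  createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(5601);
}
```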

Related Meta issues:

@peterschretlen peterschretlen added Team:Operations Team label for Operations Team Meta performance labels Dec 22, 2020
@elasticmachine

Pinging @elastic/kibana-operations (Team:Operations)

@tylersmalley tylersmalley added 1 and removed 1 labels Oct 11, 2021
@exalate-issue-sync exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Feb 16, 2022
@tylersmalley tylersmalley removed loe:small Small Level of Effort impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. EnableJiraSync labels Mar 16, 2022

lizozom commented Apr 18, 2022

Closing this issue due to inactivity.
Feel free to reopen if needed 🙏🏻

@lizozom lizozom closed this as completed Apr 18, 2022
@exalate-issue-sync exalate-issue-sync bot reopened this Apr 18, 2022
@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort labels Apr 18, 2022