perf: investigate moderate 50% CPU usage in stationary connected topology #505

Open
viniarck opened this issue Oct 18, 2024 · 2 comments
Labels
epic_high_cpu_usage future_release Planned for the next release priority_critical Critical priority

Comments

@viniarck
Member

viniarck commented Oct 18, 2024

@italovalcy, in his 2023.2 exploratory tests, identified moderate CPU spikes of around 50% with a stationary connected topology (no network event convergence happening). I'm capturing this here so it can be investigated in the future. In the meantime, since I was at it, I also managed to reproduce it.

I'm using a 3-switch ring topology with OvS on the master branch (which will become 2024.2, as of Oct 18, 2024). I collected process CPU and memory usage for the cases below (I'm running Linux on a 12th Gen Intel(R) Core(TM) i7-12700H CPU):

  1. Case 1 - Switches connected stationary with psrecord sampling every 1s

kytosd_stationary_2

On a 1-second interval CPU usage is fairly low as expected, no issues here.

  1. Case 2 - Switches connected stationary with psrecord sampling every 0.1s

kytosd_stationary_11

This case reflects something similar to what @italovalcy has seen and presented, so it might indeed be something at this 100 ms scale, in the basic periodic functionalities and tasks of the platform and its NApps, that is causing this. It needs further CPU profiling, instrumenting kytosd to see what's consuming the most CPU and causing the spikes. It'd be cool to see this at the method level too.
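As a starting point for the method-level view, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The functions `busy_periodic_task` and `run_for` are hypothetical stand-ins for a periodic task, not kytos code, and the iteration counts and period are arbitrary.

```python
import cProfile
import io
import pstats
import time


def busy_periodic_task(iterations: int = 20_000) -> int:
    """Hypothetical stand-in for a periodic platform/NApp task (not kytos code)."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def run_for(rounds: int = 5, period_s: float = 0.01) -> None:
    """Run the stand-in task a few times, sleeping between rounds."""
    for _ in range(rounds):
        busy_periodic_task()
        time.sleep(period_s)


def profile_top(func, top_n: int = 10) -> str:
    """Profile func() and return a report of the top_n entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top_n)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(run_for))
```

Note that `cProfile` only records the thread where it is enabled, so for a long-running multi-threaded daemon, a sampling profiler that attaches to the running process may be a better fit. That choice is still open here.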

  1. Case 3 - Switches NOT connected with psrecord sampling every 0.1s

kytosd_stationary_10

In this case, the only difference was that the switches were not connected, with psrecord still sampling every 0.1s, and no spikes were observed. This also confirms that it's indeed something related to periodic functionalities involving the connected switches and adjacent core parts.
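One possible reading of why the 1s and 0.1s samplings disagree is that short periodic bursts get averaged away in a wider window. The sketch below illustrates only that arithmetic. The 50 ms burst once per second is a hypothetical workload chosen for illustration, not something measured in kytosd, and `window_cpu_percent` is an illustrative helper.

```python
def window_cpu_percent(busy_intervals_ms, window_ms, total_ms):
    """Average CPU % per sampling window.

    busy_intervals_ms: list of (start_ms, end_ms) spans where one core is fully busy.
    """
    result = []
    for w_start in range(0, total_ms, window_ms):
        w_end = w_start + window_ms
        # Sum the overlap between each busy span and this window.
        busy = sum(
            max(0, min(end, w_end) - max(start, w_start))
            for start, end in busy_intervals_ms
        )
        result.append(100.0 * busy / window_ms)
    return result


# Hypothetical workload: a 50 ms burst at the start of each second, for 3 s.
bursts = [(t, t + 50) for t in (0, 1000, 2000)]

coarse = window_cpu_percent(bursts, window_ms=1000, total_ms=3000)
fine = window_cpu_percent(bursts, window_ms=100, total_ms=3000)

print(max(coarse))             # 5.0: the burst is diluted over the 1 s window
print(max(fine))               # 50.0: the same burst fills half of a 100 ms window
print(sum(fine) / len(fine))   # 5.0: the mean is the same at both resolutions
```

If something like this is what's happening, the spikes at 0.1s and the low readings at 1s would be two views of the same load, and the question becomes which periodic work produces the bursts.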

Related issues

Related issue #478 (but with network scalability convergence)

@viniarck viniarck added future_release Planned for the next release epic_high_cpu_usage labels Oct 18, 2024
@jab1982 jab1982 added the priority_critical Critical priority label Oct 18, 2024
@viniarck
Member Author

viniarck commented Oct 18, 2024

  1. Case 4 - 20 Switches connected stationary with psrecord sampling every 0.1s

If you also increase the number of switches, it'll consume more CPU and generate more spikes. It's correlated (I don't know yet by how much), and this is another piece of evidence to investigate and keep in mind in the future.

kytosd_stationary_13

@viniarck
Member Author

  1. Case 5 - 20 Switches connected stationary with psrecord sampling every 1s

kytosd_stationary_14

With 20 switches but psrecord sampling every 1s, there were no issues apart from one single spike (this needs to be measured more times).
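To compare repeated runs without eyeballing plots, the spikes could be counted from psrecord's text log. This is a rough sketch that assumes the log is whitespace-separated columns with elapsed time first and CPU % second, and header lines starting with `#`. The 40% threshold and the sample lines in the example are made up for illustration, not real measurements.

```python
from statistics import mean


def parse_psrecord_log(text):
    """Return a list of (elapsed_s, cpu_percent) tuples from a psrecord-style log.

    Assumed format: '#' comment/header lines, then whitespace-separated
    columns with elapsed time first and CPU % second.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        samples.append((float(fields[0]), float(fields[1])))
    return samples


def summarize_spikes(samples, threshold=40.0):
    """Summarize CPU samples; a 'spike' is any sample at or above threshold (%)."""
    cpu = [c for _, c in samples]
    spikes = [c for c in cpu if c >= threshold]
    return {
        "samples": len(cpu),
        "mean_cpu": mean(cpu),
        "max_cpu": max(cpu),
        "spikes": len(spikes),
        "spike_ratio": len(spikes) / len(cpu),
    }


if __name__ == "__main__":
    # Made-up lines, only to show the expected shape of the input.
    example = """# Elapsed time   CPU (%)     Real (MB)   Virtual (MB)
    0.000    2.000    100.000    500.000
    0.100   55.000    100.000    500.000
    0.200    3.000    100.000    500.000
    0.300    0.000    100.000    500.000
    """
    print(summarize_spikes(parse_psrecord_log(example)))
```

Running this over several logs per case would give a spike count and ratio per run, which is easier to compare across the 3-switch and 20-switch setups than single plots.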
