Move detailed per CPU kernel stats under a new flag `METRICS_V2_KERNEL_COUNTERS_PER_CPU` #2028

incertum · 2024-08-28T02:34:11Z

The new detailed kernel counters per CPU are great. At the same time, they can add a lot of metrics, especially on beefy servers with 128+ CPUs. I therefore propose moving them under a new flag, e.g. `METRICS_V2_KERNEL_COUNTERS_PER_CPU`, and exposing a sub-config key in Falco so that users can opt in to or out of these new counters.
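As a rough illustration of the proposal (not the libs implementation), the sketch below gates per-CPU counters behind their own bit in a flags mask. Only the name `METRICS_V2_KERNEL_COUNTERS_PER_CPU` comes from this issue; `METRICS_V2_KERNEL_COUNTERS`, the bit positions, the counter names, and the `collect_kernel_metrics` helper are assumptions made for the example.

```python
from enum import IntFlag


class MetricsFlags(IntFlag):
    # Bit positions are illustrative, not the values used in libs.
    METRICS_V2_KERNEL_COUNTERS = 1 << 0
    METRICS_V2_KERNEL_COUNTERS_PER_CPU = 1 << 1


# Hypothetical per-CPU counter names, for illustration only.
PER_CPU_COUNTERS = ("n_evts", "n_drops")


def collect_kernel_metrics(flags, per_cpu_stats):
    """Return a flat {name: value} dict, honoring the enabled categories.

    per_cpu_stats is a list with one {counter: value} dict per CPU.
    """
    metrics = {}
    if flags & MetricsFlags.METRICS_V2_KERNEL_COUNTERS:
        # Global counters: one field per counter, summed over all CPUs.
        for name in PER_CPU_COUNTERS:
            metrics[name] = sum(cpu[name] for cpu in per_cpu_stats)
    if flags & MetricsFlags.METRICS_V2_KERNEL_COUNTERS_PER_CPU:
        # Per-CPU counters: one field per counter per CPU.
        for cpu_id, cpu in enumerate(per_cpu_stats):
            for name in PER_CPU_COUNTERS:
                metrics[f"{name}_cpu_{cpu_id}"] = cpu[name]
    return metrics


# A 128-CPU machine where every CPU reports the same values.
stats = [{"n_evts": 10, "n_drops": 1} for _ in range(128)]

global_only = collect_kernel_metrics(MetricsFlags.METRICS_V2_KERNEL_COUNTERS, stats)
everything = collect_kernel_metrics(
    MetricsFlags.METRICS_V2_KERNEL_COUNTERS
    | MetricsFlags.METRICS_V2_KERNEL_COUNTERS_PER_CPU,
    stats,
)
print(len(global_only), len(everything))
```

With two counters and 128 CPUs, the opt-in category adds 256 fields on top of the 2 global ones, which is the kind of growth a separate flag would keep under user control.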

CC @Andreagit97 WDYT?

If we all agree, we should prioritize this for 0.18.0 so that we have a coherent UX.

incertum · 2024-08-28T02:34:23Z

/milestone 0.18.0

Andreagit97 · 2024-08-28T08:26:29Z

Yes, it makes sense. I will take care of it! Thank you for the suggestion.

incertum · 2024-08-28T16:47:33Z

Thank you Andrea!

A few more thoughts:

- So far, we have always enabled all metrics categories in Falco, while metrics themselves are disabled by default. For this metric category, I was thinking we should disable it by default. What are your thoughts?
- We could also consider exposing statistical metrics in the future, such as the kurtosis or skewness of these counters, whenever a snapshot is taken in libsinsp. In some use cases, you might prefer to opt into receiving only these statistical metrics instead of all the raw counter fields. Other use cases may still require the raw counters. Happy to help with this if we all agree it's useful. Mentioning it now in case it is useful to shape the design.
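To make the statistical-metrics idea concrete, here is a minimal sketch of summarizing one counter across CPUs at snapshot time. It uses the standard population definitions of skewness and excess kurtosis; the `moments_summary` helper and the sample drop counts are hypothetical and not part of libsinsp.

```python
import math


def moments_summary(values):
    """Population mean, standard deviation, skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # All CPUs report the same value: shape statistics are undefined,
        # so this sketch reports 0.0 by convention.
        return {"mean": mean, "stddev": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    return {
        "mean": mean,
        "stddev": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical per-CPU drop counts: one CPU drops far more than the others.
drops_per_cpu = [0, 0, 0, 0, 0, 0, 0, 80]
summary = moments_summary(drops_per_cpu)
print(summary)
```

A strongly positive skewness here signals that a few CPUs account for most of the drops, which is the kind of imbalance the raw per-CPU fields would otherwise be needed to spot.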

Andreagit97 · 2024-08-29T08:44:16Z

> So far, we have always enabled all metrics categories in Falco, while metrics themselves are disabled by default. For this metric category, I was thinking we should disable it by default. What are your thoughts?

Yep I agree

> We could also consider exposing statistical metrics in the future, such as the kurtosis or skewness of these counters, whenever a snapshot is taken in libsinsp. In some use cases, you might prefer to opt into receiving only these statistical metrics instead of all the raw counter fields. Other use cases may still require the raw counters. Happy to help with this if we all agree it's useful. Mentioning it now in case it is useful to shape the design.

Yeah, it seems like a great idea, but I'm not sure where the right place to obtain them is. I'm definitely not an expert in metrics representation, but isn't it possible to obtain this kind of data directly in Prometheus or something similar, starting from the metrics we expose today?

incertum · 2024-08-30T03:23:41Z

> > We could also consider exposing statistical metrics in the future, such as the kurtosis or skewness of these counters, whenever a snapshot is taken in libsinsp. In some use cases, you might prefer to opt into receiving only these statistical metrics instead of all the raw counter fields. Other use cases may still require the raw counters. Happy to help with this if we all agree it's useful. Mentioning it now in case it is useful to shape the design.
>
> Yeah, it seems like a great idea, but I'm not sure where the right place to obtain them is. I'm definitely not an expert in metrics representation, but isn't it possible to obtain this kind of data directly in Prometheus or something similar, starting from the metrics we expose today?

Off the top of my head, one pro could be saving the space otherwise spent on sending many raw metric fields for machines with many CPUs.
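A back-of-the-envelope sketch of that space argument, with made-up numbers: the counter count and the choice of four summary statistics per counter are assumptions for illustration, not figures from libs.

```python
def field_counts(n_cpus, n_counters, n_summary_stats):
    """Compare the number of emitted fields for raw vs. summarized output."""
    raw = n_cpus * n_counters                 # one field per counter per CPU
    summarized = n_summary_stats * n_counters  # a few statistics per counter
    return raw, summarized


# Hypothetical setup: 128 CPUs, 6 per-CPU counters, and 4 statistics
# per counter (e.g. mean, stddev, skewness, kurtosis).
raw, summarized = field_counts(n_cpus=128, n_counters=6, n_summary_stats=4)
print(raw, summarized)
```

The summarized count stays constant as the number of CPUs grows, while the raw count scales linearly with it.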

Andreagit97 · 2024-08-30T07:49:49Z

That's a good point. Let's say that I see these per-CPU metrics more as a debug option than something to run in production by default. If we end up using them, we are probably in a critical situation with performance issues, so the overhead they introduce could be acceptable since we are trying to debug. So yes, statistical metrics are for sure useful, but I'm not 100% sure this is the right case to apply them. WDYT?

incertum added the `kind/feature` (New feature or request) label Aug 28, 2024

poiana added this to the 0.18.0 milestone Aug 28, 2024

Andreagit97 mentioned this issue Aug 28, 2024

cleanup(engines): detach per-cpu kernel metrics from global kernel metrics #2031

Merged

poiana closed this as completed in #2031 Sep 5, 2024
