Document monitor threshold overrides via annotations #1423

aacevedoosorio · 2023-08-23T12:15:50Z

No description provided.

github-actions

Remaining comments that could not be posted as review comments, to avoid the GitHub rate limit

vale

use/alerting/kubernetes-monitors.md|25 col 911| [Microsoft.Contractions] Use 'isn't' instead of 'is not'.
use/alerting/kubernetes-monitors.md|47 col 5| [Microsoft.Headings] 'Node Disk Pressure' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|49 col 383| [Microsoft.Accessibility] Don't use language (such as 'normal') that defines people by their disability.
use/alerting/kubernetes-monitors.md|49 col 416| [Microsoft.Adverbs] Consider removing 'regularly'.
use/alerting/kubernetes-monitors.md|49 col 465| [Microsoft.Contractions] Use 'aren't' instead of 'are not'.
use/alerting/kubernetes-monitors.md|50 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|52 col 5| [Microsoft.Headings] 'Node Memory Pressure' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|54 col 376| [Vale.Spelling] Did you really mean 'misconfigured'?
use/alerting/kubernetes-monitors.md|54 col 524| [Microsoft.ComplexWords] Consider using 'assign' or 'divide' instead of 'allocate'.
use/alerting/kubernetes-monitors.md|54 col 581| [Microsoft.Contractions] Use 'aren't' instead of 'are not'.
use/alerting/kubernetes-monitors.md|54 col 903| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|54 col 931| [Microsoft.ComplexWords] Consider using 'many' instead of 'multiple'.
use/alerting/kubernetes-monitors.md|54 col 964| [Microsoft.ComplexWords] Consider using 'large' instead of 'substantial'.
use/alerting/kubernetes-monitors.md|54 col 1127| [Microsoft.Contractions] Use 'it's' instead of 'it is'.
use/alerting/kubernetes-monitors.md|54 col 1430| [Vale.Spelling] Did you really mean 'autoscaling'?
use/alerting/kubernetes-monitors.md|54 col 1498| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|54 col 1621| [Microsoft.ComplexWords] Consider using 'keep' or 'support' instead of 'maintain'.
use/alerting/kubernetes-monitors.md|54 col 1632| [Microsoft.Accessibility] Don't use language (such as 'healthy') that defines people by their disability.
use/alerting/kubernetes-monitors.md|54 col 1778| [Microsoft.ComplexWords] Consider using 'so' instead of 'accordingly'.
use/alerting/kubernetes-monitors.md|54 col 1821| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|55 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|57 col 5| [Microsoft.Headings] 'Node PID Pressure' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|57 col 10| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|57 col 10| [Microsoft.HeadingAcronyms] Avoid using acronyms in a title or heading.
use/alerting/kubernetes-monitors.md|59 col 6| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 69| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 162| [Vale.Spelling] Did you really mean 'misconfigured'?
use/alerting/kubernetes-monitors.md|59 col 278| [Microsoft.ComplexWords] Consider using 'right' or 'exact' instead of 'accurate'.
use/alerting/kubernetes-monitors.md|59 col 331| [Microsoft.ComplexWords] Consider using 'assign' or 'divide' instead of 'allocate'.
use/alerting/kubernetes-monitors.md|59 col 376| [Microsoft.Contractions] Use 'aren't' instead of 'are not'.
use/alerting/kubernetes-monitors.md|59 col 416| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 456| [Vale.Spelling] Did you really mean 'PIDs'?
use/alerting/kubernetes-monitors.md|59 col 494| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 545| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 686| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 690| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|59 col 718| [Microsoft.ComplexWords] Consider using 'many' instead of 'multiple'.
use/alerting/kubernetes-monitors.md|59 col 763| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 860| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 877| [Microsoft.ComplexWords] Consider using 'discuss' instead of 'address'.
use/alerting/kubernetes-monitors.md|59 col 890| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 904| [Microsoft.Contractions] Use 'it's' instead of 'it is'.
use/alerting/kubernetes-monitors.md|59 col 988| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|59 col 1022| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1079| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1145| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1203| [Vale.Spelling] Did you really mean 'autoscaling'?
use/alerting/kubernetes-monitors.md|59 col 1265| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1269| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|59 col 1314| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1363| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1407| [Microsoft.Accessibility] Don't use language (such as 'healthy') that defines people by their disability.
use/alerting/kubernetes-monitors.md|59 col 1424| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1455| [Microsoft.Contractions] Use 'it's' instead of 'It is'.
use/alerting/kubernetes-monitors.md|59 col 1560| [Microsoft.ComplexWords] Consider using 'so' instead of 'accordingly'.
use/alerting/kubernetes-monitors.md|59 col 1600| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|60 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|62 col 5| [Microsoft.Headings] 'Node Readiness' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|65 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|71 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|73 col 5| [Microsoft.Headings] 'Pod Ready State' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|107 col 5| [Vale.Spelling] Did you really mean 'Unschedulable'?
use/alerting/kubernetes-monitors.md|107 col 5| [Microsoft.Headings] 'Unschedulable Node' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|109 col 1| [Microsoft.SentenceLength] Try to keep sentences short (< 30 words).
use/alerting/kubernetes-monitors.md|109 col 8| [Microsoft.ComplexWords] Consider using 'meet' instead of 'encounter'.
use/alerting/kubernetes-monitors.md|109 col 232| [Microsoft.Contractions] Use 'can't' instead of 'cannot'.

github-actions · 2023-08-23T12:17:00Z

use/alerting/k8s-override-monitor-arguments.md

+description: StackState Kubernetes Troubleshooting
+---
+
+# Override monitor threshold arguments via kubernetes annotations


🚫 [vale] (reported by reviewdog 🐶)
[Vale.Spelling] Did you really mean 'kubernetes'?

github-actions · 2023-08-23T12:17:00Z

use/alerting/k8s-override-monitor-arguments.md

+
+## Overview
+
+StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes we need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.


📝 [vale] (reported by reviewdog 🐶)
[Microsoft.ComplexWords] Consider using 'give' or 'offer' instead of 'provide'.

github-actions · 2023-08-23T12:17:00Z

use/alerting/k8s-override-monitor-arguments.md

+
+## Overview
+
+StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes we need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.


📝 [vale] (reported by reviewdog 🐶)
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).

github-actions · 2023-08-23T12:17:00Z

use/alerting/k8s-override-monitor-arguments.md

+
+## Overview
+
+StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes we need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.


⚠️ [vale] (reported by reviewdog 🐶)
[Microsoft.We] Try to avoid using first-person plural like 'we'.

Like Vale says :)

github-actions · 2023-08-23T12:17:00Z

use/alerting/k8s-override-monitor-arguments.md

+## Overview
+
+StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes we need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.
+The mechanism to declare the overrides is via kubernetes resource annotations that denote to which monitor and component they should apply. For example we could override the `failureState` for the `Available service endpoints` monitor for a specific service where we want to signal a `CRITICAL` state when it fails rather than the default `DEVIATING`.


🚫 [vale] (reported by reviewdog 🐶)
[Vale.Spelling] Did you really mean 'kubernetes'?
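Conceptually, an override replaces only the selected default arguments of a monitor for one component and leaves the rest untouched. The sketch below models that as a dictionary merge. The argument names `threshold` and `failureState` come from the quoted diff; the default values and the `resolve_arguments` helper are hypothetical and do not describe how StackState applies overrides internally.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults for a monitor; only the argument names come from the docs.
DEFAULT_ARGUMENTS: Dict[str, Any] = {
    "failureState": "DEVIATING",
    "threshold": 1,
}


def resolve_arguments(
    defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return the effective arguments: overridden keys win, all others keep their defaults."""
    effective = dict(defaults)  # copy so the shared defaults are never mutated
    if overrides:
        effective.update(overrides)
    return effective


# A service that should signal CRITICAL instead of DEVIATING when it fails.
critical_service_args = resolve_arguments(DEFAULT_ARGUMENTS, {"failureState": "CRITICAL"})
# Every other service keeps the defaults.
regular_service_args = resolve_arguments(DEFAULT_ARGUMENTS)
```

Because the merge is per key, an override that only sets `failureState` still inherits the default `threshold`.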

github-actions · 2023-08-23T12:17:02Z

use/alerting/kubernetes-monitors.md

+
+### Daemonset desired replicas
+
+It is important that the desired number of replicas for a Daemonset is being met. Daemonsets are used to manage a set of pods that need to run on all or a subset of nodes in a cluster, ensuring that a copy of the pod is running on each node that meets the specified criteria. This is useful for tasks such as logging, monitoring, and other cluster-level tasks that need to be executed on every node in the cluster. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to DaemonSets that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the StatefulSet. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.


🚫 [vale] (reported by reviewdog 🐶)
[Vale.Spelling] Did you really mean 'Daemonsets'?
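The desired-replicas rules in the quoted DaemonSet and Deployment descriptions can be sketched as a small function. This is a minimal illustration of the stated rules, not the monitor definition itself; the function name is hypothetical, and returning `CLEAR` for the case where all replicas are available is an assumption, since the text only names DEVIATING and CRITICAL.

```python
from typing import Optional


def replicas_health(desired: int, available: int) -> Optional[str]:
    """Map replica counts to a health state following the rules in the description.

    Returns None when the check does not apply (desired replicas is not greater than zero).
    """
    if desired <= 0:
        return None  # the check is only applied when desired replicas > 0
    if available == 0:
        return "CRITICAL"  # nothing is running at all
    if available < desired:
        return "DEVIATING"  # some replicas are missing
    return "CLEAR"  # assumption: the healthy state is reported as CLEAR
```

The zero check comes before the less-than check so that a fully unavailable workload is reported as CRITICAL rather than DEVIATING.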

github-actions · 2023-08-23T12:17:02Z

use/alerting/kubernetes-monitors.md

+
+### Daemonset desired replicas
+
+It is important that the desired number of replicas for a Daemonset is being met. Daemonsets are used to manage a set of pods that need to run on all or a subset of nodes in a cluster, ensuring that a copy of the pod is running on each node that meets the specified criteria. This is useful for tasks such as logging, monitoring, and other cluster-level tasks that need to be executed on every node in the cluster. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to DaemonSets that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the StatefulSet. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.


🚫 [vale] (reported by reviewdog 🐶)
[Microsoft.Contractions] Use 'isn't' instead of 'is not'.

github-actions · 2023-08-23T12:17:02Z

use/alerting/kubernetes-monitors.md

+
+### Deployment desired replicas
+
+It is important that the desired number of replicas for a Deployments is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployments. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.


🚫 [vale] (reported by reviewdog 🐶)
[Microsoft.Contractions] Use 'it's' instead of 'It is'.

github-actions · 2023-08-23T12:17:02Z

use/alerting/kubernetes-monitors.md

+
+### Deployment desired replicas
+
+It is important that the desired number of replicas for a Deployments is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployments. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.


📝 [vale] (reported by reviewdog 🐶)
[Microsoft.ComplexWords] Consider using 'same' instead of 'identical'.

github-actions · 2023-08-23T12:17:02Z

use/alerting/kubernetes-monitors.md

+
+### Deployment desired replicas
+
+It is important that the desired number of replicas for a Deployments is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployments. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.


📝 [vale] (reported by reviewdog 🐶)
[Microsoft.ComplexWords] Consider using 'keep' or 'support' instead of 'maintain'.

craffit · 2023-08-23T13:00:41Z

use/alerting/k8s-override-monitor-arguments.md

+
+The override annotations keys for StackState monitors follow the following convention:
+```
+monitor.${owner}.stackstate.io/${monitorShorName}


`monitorShorName` should be `monitorShortName`
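A minimal sketch that builds an override annotation key from the convention quoted above, `monitor.${owner}.stackstate.io/${monitorShortName}`, and checks it against the Kubernetes syntax rules for prefixed annotation keys (prefix: DNS subdomain of at most 253 characters; name: at most 63 alphanumeric, `-`, `_` or `.` characters, starting and ending with an alphanumeric). The owner and monitor names in the usage line are placeholders, and the format of the annotation value is not covered here.

```python
import re

# Kubernetes rules for the two halves of a prefixed annotation key.
_NAME_RE = re.compile(r"[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?")
_DNS_SUBDOMAIN_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")


def override_annotation_key(owner: str, monitor_short_name: str) -> str:
    """Return monitor.<owner>.stackstate.io/<monitor_short_name>, validated for Kubernetes."""
    prefix = f"monitor.{owner}.stackstate.io"
    if len(prefix) > 253 or not _DNS_SUBDOMAIN_RE.fullmatch(prefix):
        raise ValueError(f"invalid annotation prefix: {prefix!r}")
    if len(monitor_short_name) > 63 or not _NAME_RE.fullmatch(monitor_short_name):
        raise ValueError(f"invalid annotation name: {monitor_short_name!r}")
    return f"{prefix}/{monitor_short_name}"


# Placeholder owner and monitor names, for illustration only.
example_key = override_annotation_key("example-owner", "example-monitor")
```

Validating early catches keys that the Kubernetes API server would reject anyway, such as an owner containing uppercase letters or underscores.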

use/alerting/k8s-add-monitors-cli.md

craffit · 2023-08-23T13:02:25Z

use/alerting/kubernetes-monitors.md

+
+### Node PID Pressure
+
+Node PID pressure occurs when the available process identification (PID) resources on a Kubernetes node are excessively strained. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on accurate resource requests and limits to effectively allocate and manage resources. If containers are not configured correctly with their PID requirements, they may consume more PIDs than expected, resulting in node PID pressure. The second reason is the presence of PID-intensive applications or processes. Some workloads or applications have higher demands for process identification, leading to increased PID utilization on the node. If multiple pods or containers with significant PID requirements are scheduled on the same node without proper resource allocation, it can cause PID pressure. To address node PID pressure, it is important to review and adjust resource requests and limits for containers to ensure they align with the actual PID needs of the applications. Monitoring and optimizing PID usage within the applications themselves can also help reduce PID consumption. Additionally, considering horizontal pod autoscaling can dynamically scale the number of pods based on PID utilization. Regular monitoring, analysis of PID-related metrics, and proactive allocation of PID resources are crucial for maintaining a healthy state of PID usage on Kubernetes nodes. It is essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent PID pressure and ensure optimal performance.


Is this copy-pasted from the descriptions? If so, no more review is needed.

Yup. I took it directly from the descriptions on master
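The node pressure monitors are named after the standard Kubernetes node conditions `DiskPressure`, `MemoryPressure` and `PIDPressure`. The sketch below reads those conditions from a Node object as returned by `kubectl get node <name> -o json`. Whether StackState's monitors use exactly this signal is an assumption; the helper and the trimmed sample node are illustrative only.

```python
import json
from typing import Any, Dict, List

PRESSURE_CONDITIONS = ("DiskPressure", "MemoryPressure", "PIDPressure")


def active_pressure_conditions(node: Dict[str, Any]) -> List[str]:
    """Return the pressure conditions whose status is "True" on a Node object."""
    conditions = node.get("status", {}).get("conditions", [])
    return [
        c["type"]
        for c in conditions
        if c.get("type") in PRESSURE_CONDITIONS and c.get("status") == "True"
    ]


# Trimmed, made-up example of `kubectl get node <name> -o json` output.
sample_node = json.loads("""
{
  "kind": "Node",
  "status": {
    "conditions": [
      {"type": "MemoryPressure", "status": "False"},
      {"type": "DiskPressure", "status": "False"},
      {"type": "PIDPressure", "status": "True"},
      {"type": "Ready", "status": "True"}
    ]
  }
}
""")
```

Condition statuses are the strings `"True"`, `"False"` or `"Unknown"`, so the comparison is against the string rather than a boolean.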

craffit · 2023-08-23T13:03:06Z

use/alerting/k8s-override-monitor-arguments.md

+* [Node Memory Pressure](/use/alerting/kubernetes-monitors.md#node-memory-pressure)
+* [Node PID Pressure](/use/alerting/kubernetes-monitors.md#node-pid-pressure)
+* [Node Readiness](/use/alerting/kubernetes-monitors.md#node-readiness)
+* [Out of memory for containers](/use/alerting/kubernetes-monitors.md#out-of-memory-for-containers)


Maybe add a short final section on how to override a custom monitor, so that example is covered as well.

craffit

Approved, looking good

github-actions

Remaining comments that could not be posted as review comments, to avoid the GitHub rate limit

vale

use/alerting/kubernetes-monitors.md|59 col 278| [Microsoft.ComplexWords] Consider using 'right' or 'exact' instead of 'accurate'.
use/alerting/kubernetes-monitors.md|59 col 331| [Microsoft.ComplexWords] Consider using 'assign' or 'divide' instead of 'allocate'.
use/alerting/kubernetes-monitors.md|59 col 376| [Microsoft.Contractions] Use 'aren't' instead of 'are not'.
use/alerting/kubernetes-monitors.md|59 col 416| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 456| [Vale.Spelling] Did you really mean 'PIDs'?
use/alerting/kubernetes-monitors.md|59 col 494| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 545| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 686| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 690| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|59 col 718| [Microsoft.ComplexWords] Consider using 'many' instead of 'multiple'.
use/alerting/kubernetes-monitors.md|59 col 763| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 860| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 877| [Microsoft.ComplexWords] Consider using 'discuss' instead of 'address'.
use/alerting/kubernetes-monitors.md|59 col 890| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 904| [Microsoft.Contractions] Use 'it's' instead of 'it is'.
use/alerting/kubernetes-monitors.md|59 col 988| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|59 col 1022| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1079| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1145| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1203| [Vale.Spelling] Did you really mean 'autoscaling'?
use/alerting/kubernetes-monitors.md|59 col 1265| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1269| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|59 col 1314| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1363| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1407| [Microsoft.Accessibility] Don't use language (such as 'healthy') that defines people by their disability.
use/alerting/kubernetes-monitors.md|59 col 1424| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1455| [Microsoft.Contractions] Use 'it's' instead of 'It is'.
use/alerting/kubernetes-monitors.md|59 col 1560| [Microsoft.ComplexWords] Consider using 'so' instead of 'accordingly'.
use/alerting/kubernetes-monitors.md|59 col 1600| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|60 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|62 col 5| [Microsoft.Headings] 'Node Readiness' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|65 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|71 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|73 col 5| [Microsoft.Headings] 'Pod Ready State' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|107 col 5| [Microsoft.Headings] 'Unschedulable Node' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|107 col 5| [Vale.Spelling] Did you really mean 'Unschedulable'?
use/alerting/kubernetes-monitors.md|109 col 1| [Microsoft.SentenceLength] Try to keep sentences short (< 30 words).
use/alerting/kubernetes-monitors.md|109 col 8| [Microsoft.ComplexWords] Consider using 'meet' instead of 'encounter'.
use/alerting/kubernetes-monitors.md|109 col 232| [Microsoft.Contractions] Use 'can't' instead of 'cannot'.

github-actions · 2023-08-24T07:54:42Z

use/alerting/k8s-override-monitor-arguments.md

+
+## Overview
+
+StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes there's need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.


📝 [vale] (reported by reviewdog 🐶)
[Microsoft.ComplexWords] Consider using 'give' or 'offer' instead of 'provide'.

github-actions · 2023-08-24T07:54:42Z

use/alerting/k8s-override-monitor-arguments.md

+
+## Overview
+
+StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes there's need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.


📝 [vale] (reported by reviewdog 🐶)
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).

github-actions · 2023-08-24T07:54:42Z

use/alerting/k8s-override-monitor-arguments.md

+
+## Build an override for a custom monitor
+
+Any custom threshold monitor created using the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md) is suitable to override arguments, as [the example shows](/use/alerting/k8s-add-monitors-cli.md#write-the-outline-of-the-monitor) an identifier for a custom monitor is structured as `urn:custom:monitor:{monitorShortName}`and the override annotation key for such an identifier is expected to be


📝 [vale] (reported by reviewdog 🐶)
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).

github-actions · 2023-08-24T07:54:43Z

use/alerting/kubernetes-monitors.md

+
+### Deployment desired replicas
+
+It is important that the desired number of replicas for a Deployments is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployments. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.


🚫 [vale] (reported by reviewdog 🐶)
[Microsoft.Contractions] Use 'isn't' instead of 'is not'.

github-actions · 2023-08-24T07:54:43Z

use/alerting/kubernetes-monitors.md

@@ -34,12 +44,33 @@ It is important to monitor the usage of Persistent Volume Claims (PVCs) in your
 It is important to monitor the usage of Persistent Volume Claims (PVCs) in your Kubernetes cluster over time. PVCs are used to store data that needs to persist beyond the lifetime of a container, and it's crucial to ensure that they have enough space to store the data.
 To track this, StackState set up a check that uses linear prediction to forecast the Kubernetes volume usage trend over a 4-day period. If the trend indicates that the PVCs will run out of space within this time frame, you will receive a notification, allowing you to take action to prevent data loss or downtime.

+### Node Disk Pressure


📝 [vale] (reported by reviewdog 🐶)
[Microsoft.Headings] 'Node Disk Pressure' should use sentence-style capitalization.
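The PVC description above says the check uses linear prediction to forecast volume usage over a 4-day period, but not how the line is fitted. A minimal sketch, assuming an ordinary least-squares line through recent usage samples extrapolated 4 days past the last sample; StackState's actual forecasting function, sample window and alert condition may differ, and the helper names are hypothetical.

```python
from typing import Sequence

FOUR_DAYS_SECONDS = 4 * 24 * 3600


def predict_usage(times: Sequence[float], used: Sequence[float], horizon: float) -> float:
    """Fit used = a + b * t by ordinary least squares and evaluate it at (last t + horizon)."""
    n = len(times)
    if n < 2 or n != len(used):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_u = sum(used) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, used))
    slope = cov / var_t
    intercept = mean_u - slope * mean_t
    return intercept + slope * (max(times) + horizon)


def will_fill_within(
    times: Sequence[float],
    used: Sequence[float],
    capacity: float,
    horizon: float = FOUR_DAYS_SECONDS,
) -> bool:
    """True if the extrapolated usage reaches the volume capacity within the horizon."""
    return predict_usage(times, used, horizon) >= capacity


# Usage in GiB sampled once a day: growing by 10 GiB per day.
sample_times = [0, 86400, 172800]
sample_used = [10.0, 20.0, 30.0]
```

With these samples the fitted line reaches about 70 GiB four days after the last sample, so a 100 GiB volume would not trigger, while a 60 GiB volume would.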

github-actions · 2023-08-24T07:54:46Z

use/alerting/kubernetes-monitors.md

+Node memory pressure refers to a situation where the memory resources on a Kubernetes node are excessively strained. While encountering node memory pressure is uncommon due to Kubernetes' built-in resource management mechanisms, it can still occur under specific circumstances. There are two primary reasons why node memory pressure may arise. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on resource requests and limits to allocate and manage resources effectively. If containers are not accurately configured with their memory requirements, they may consume more memory than expected, leading to node memory pressure. The second reason involves the presence of memory-intensive applications or processes. Certain workloads or applications may have higher memory demands, resulting in increased memory utilization on the node. If multiple pods or containers with substantial memory requirements are scheduled on the same node without proper resource allocation, it can cause memory pressure. To mitigate node memory pressure, it is crucial to review and adjust resource requests and limits for containers, ensuring they align with the actual memory needs of the applications. Monitoring and optimizing memory usage within the applications themselves can also help reduce memory consumption. Additionally, consider horizontal pod autoscaling to dynamically scale the number of pods based on memory utilization. Regular monitoring, analysis of memory-related metrics, and proactive allocation of memory resources can help maintain a healthy memory state on Kubernetes nodes. It's essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent memory pressure and ensure optimal performance.
+Allows [Override Monitor arguments](/use/alerting/k8s-override-monitor-arguments.md)
+
+### Node PID Pressure


⚠️ [vale] (reported by reviewdog 🐶)
[Microsoft.HeadingAcronyms] Avoid using acronyms in a title or heading.

github-actions · 2023-08-24T07:54:46Z

use/alerting/kubernetes-monitors.md

+Node memory pressure refers to a situation where the memory resources on a Kubernetes node are excessively strained. While encountering node memory pressure is uncommon due to Kubernetes' built-in resource management mechanisms, it can still occur under specific circumstances. There are two primary reasons why node memory pressure may arise. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on resource requests and limits to allocate and manage resources effectively. If containers are not accurately configured with their memory requirements, they may consume more memory than expected, leading to node memory pressure. The second reason involves the presence of memory-intensive applications or processes. Certain workloads or applications may have higher memory demands, resulting in increased memory utilization on the node. If multiple pods or containers with substantial memory requirements are scheduled on the same node without proper resource allocation, it can cause memory pressure. To mitigate node memory pressure, it is crucial to review and adjust resource requests and limits for containers, ensuring they align with the actual memory needs of the applications. Monitoring and optimizing memory usage within the applications themselves can also help reduce memory consumption. Additionally, consider horizontal pod autoscaling to dynamically scale the number of pods based on memory utilization. Regular monitoring, analysis of memory-related metrics, and proactive allocation of memory resources can help maintain a healthy memory state on Kubernetes nodes. It's essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent memory pressure and ensure optimal performance.
+Allows [Override Monitor arguments](/use/alerting/k8s-override-monitor-arguments.md)
+
+### Node PID Pressure


📝 [vale] _{reported by reviewdog 🐶}
[Microsoft.Acronyms] 'PID' has no definition.

github-actions · 2023-08-24T07:54:46Z

use/alerting/kubernetes-monitors.md

+
+### Node PID Pressure
+
+Node PID pressure occurs when the available process identification (PID) resources on a Kubernetes node are excessively strained. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on accurate resource requests and limits to effectively allocate and manage resources. If containers are not configured correctly with their PID requirements, they may consume more PIDs than expected, resulting in node PID pressure. The second reason is the presence of PID-intensive applications or processes. Some workloads or applications have higher demands for process identification, leading to increased PID utilization on the node. If multiple pods or containers with significant PID requirements are scheduled on the same node without proper resource allocation, it can cause PID pressure. To address node PID pressure, it is important to review and adjust resource requests and limits for containers to ensure they align with the actual PID needs of the applications. Monitoring and optimizing PID usage within the applications themselves can also help reduce PID consumption. Additionally, considering horizontal pod autoscaling can dynamically scale the number of pods based on PID utilization. Regular monitoring, analysis of PID-related metrics, and proactive allocation of PID resources are crucial for maintaining a healthy state of PID usage on Kubernetes nodes. It is essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent PID pressure and ensure optimal performance.


📝 [vale] _{reported by reviewdog 🐶}
[Microsoft.Acronyms] 'PID' has no definition.

github-actions · 2023-08-24T07:54:46Z

use/alerting/kubernetes-monitors.md

+
+### Node PID Pressure
+
+Node PID pressure occurs when the available process identification (PID) resources on a Kubernetes node are excessively strained. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on accurate resource requests and limits to effectively allocate and manage resources. If containers are not configured correctly with their PID requirements, they may consume more PIDs than expected, resulting in node PID pressure. The second reason is the presence of PID-intensive applications or processes. Some workloads or applications have higher demands for process identification, leading to increased PID utilization on the node. If multiple pods or containers with significant PID requirements are scheduled on the same node without proper resource allocation, it can cause PID pressure. To address node PID pressure, it is important to review and adjust resource requests and limits for containers to ensure they align with the actual PID needs of the applications. Monitoring and optimizing PID usage within the applications themselves can also help reduce PID consumption. Additionally, considering horizontal pod autoscaling can dynamically scale the number of pods based on PID utilization. Regular monitoring, analysis of PID-related metrics, and proactive allocation of PID resources are crucial for maintaining a healthy state of PID usage on Kubernetes nodes. It is essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent PID pressure and ensure optimal performance.


🚫 [vale] _{reported by reviewdog 🐶}
[Vale.Spelling] Did you really mean 'misconfigured'?

STAC-19922: Document monitor argument overrides

94aa48c

github-actions bot reviewed Aug 23, 2023

craffit reviewed Aug 23, 2023

use/alerting/k8s-add-monitors-cli.md

craffit reviewed Aug 23, 2023

craffit self-requested a review August 24, 2023 07:38

craffit approved these changes Aug 24, 2023

STAC-19922: Add section about annotation for custom monitors

3c7f553

github-actions bot reviewed Aug 24, 2023

aacevedoosorio merged commit a6cb4c1 into k8s-troubleshooting Aug 29, 2023
1 check passed

aacevedoosorio deleted the stac-19922 branch September 1, 2023 14:35

Document monitor threshold overrides via annotations #1423

aacevedoosorio commented Aug 23, 2023

		## Overview

		StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md) that cover common issues in a Kubernetes cluster. These monitors use default arguments that suit most use cases, but sometimes you need to adapt their behavior by overriding some of those defaults, such as `threshold` or `failureState`.
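Conceptually, an override replaces only the arguments it names and leaves every other default in place. The following Python sketch illustrates that merge. The default values, the `effective_arguments` helper, and the rejection of unknown keys are hypothetical and don't describe StackState's actual implementation or annotation format.

```python
from __future__ import annotations

from typing import Any, Mapping

# Hypothetical defaults for a threshold monitor; real defaults are part of
# each monitor definition.
DEFAULT_ARGUMENTS: dict[str, Any] = {
    "threshold": 1.0,
    "failureState": "DEVIATING",
}


def effective_arguments(
    defaults: Mapping[str, Any], overrides: Mapping[str, Any]
) -> dict[str, Any]:
    """Return a copy of the defaults with overridden arguments replaced.

    Unknown argument names are rejected so that a typo in an override is not
    silently ignored (a design choice of this sketch only).
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown monitor arguments: {sorted(unknown)}")
    merged = dict(defaults)  # copy, so the defaults themselves stay unchanged
    merged.update(overrides)
    return merged
```

For example, overriding only `threshold` keeps the default `failureState`: `effective_arguments(DEFAULT_ARGUMENTS, {"threshold": 0.9})` yields `{"threshold": 0.9, "failureState": "DEVIATING"}`.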


		### DaemonSet desired replicas

		It's important that the desired number of replicas for a DaemonSet is met. DaemonSets manage a set of pods that need to run on all or a subset of nodes in a cluster, ensuring that a copy of the pod runs on each node that meets the specified criteria. This is useful for logging, monitoring, and other cluster-level tasks that must run on every node in the cluster. To monitor this, StackState has set up a check that verifies whether the available replicas match the desired number of replicas. The check is only applied to DaemonSets with a desired number of replicas greater than zero.

		- If the number of available replicas is less than the desired number, the monitor signals a DEVIATING health state, indicating that there may be an issue with the DaemonSet.
		- If the number of available replicas is zero, the monitor signals a CRITICAL health state, indicating that the DaemonSet isn't functioning at all.

		To understand the full monitor definition, check the details.
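The rules above can be sketched in a few lines of Python. This assumes the check compares the DaemonSet status fields `desiredNumberScheduled` and `numberAvailable`; the `daemonset_health` function is hypothetical, and `CLEAR` as the name of the healthy state is an assumption.

```python
from enum import Enum
from typing import Optional


class HealthState(Enum):
    # CLEAR as the healthy state name is an assumption of this sketch.
    CLEAR = "CLEAR"
    DEVIATING = "DEVIATING"
    CRITICAL = "CRITICAL"


def daemonset_health(desired: int, available: int) -> Optional[HealthState]:
    """Apply the documented DaemonSet desired replicas rules.

    `desired` and `available` are assumed to come from the DaemonSet status
    fields desiredNumberScheduled and numberAvailable.
    """
    if desired <= 0:
        return None  # the check only applies when desired replicas > 0
    if available == 0:
        return HealthState.CRITICAL  # no replica is available at all
    if available < desired:
        return HealthState.DEVIATING  # some replicas are missing
    return HealthState.CLEAR
```

Returning `None` for DaemonSets with zero desired replicas keeps "not checked" distinct from "healthy".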


		### Deployment desired replicas

		It's important that the desired number of replicas for a Deployment is met. Deployments manage the rollout and scaling of a set of identical pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies whether the available replicas match the desired number of replicas. The check is only applied to Deployments with a desired number of replicas greater than zero.

		- If the number of available replicas is less than the desired number, the monitor signals a DEVIATING health state, indicating that there may be an issue with the Deployment.
		- If the number of available replicas is zero, the monitor signals a CRITICAL health state, indicating that the Deployment isn't functioning at all.

		To understand the full monitor definition, check the details.
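The same rules can be applied directly to a Deployment object as returned by the Kubernetes API (for example, `kubectl get deployment NAME -o json`). This hypothetical Python sketch assumes the check compares `spec.replicas` with `status.availableReplicas`. It also handles two API details: `spec.replicas` defaults to 1 when unset, and `status.availableReplicas` is omitted from the response when it's zero.

```python
from typing import Any, Mapping, Optional


def deployment_health(deployment: Mapping[str, Any]) -> Optional[str]:
    """Apply the documented desired replicas rules to a Deployment dict."""
    # spec.replicas defaults to 1 when it isn't set.
    desired = deployment.get("spec", {}).get("replicas", 1)
    # status.availableReplicas is left out of the API response when it's zero.
    available = deployment.get("status", {}).get("availableReplicas", 0)
    if desired <= 0:
        return None  # the check only applies when desired replicas > 0
    if available == 0:
        return "CRITICAL"
    if available < desired:
        return "DEVIATING"
    return "CLEAR"  # assumed name for the healthy state
```

Reading the fields with defaults means a freshly created Deployment whose status has no `availableReplicas` yet is reported as CRITICAL rather than raising an error.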


		### Node PID pressure

		Node PID pressure occurs when the process IDs (PIDs) available on a Kubernetes node are running low. There are two primary reasons why node PID pressure may arise. The first reason is missing or misconfigured PID limits. Unlike CPU and memory, PIDs can't be set as container resource requests; instead, the kubelet can cap the number of PIDs each pod may use (for example, with the `podPidsLimit` setting in the kubelet configuration). Without an appropriate limit, a pod can consume more PIDs than expected and put the whole node under PID pressure. The second reason is the presence of PID-intensive applications or processes: some workloads create many processes or threads, each of which uses a PID. If several such pods are scheduled on the same node, together they can exhaust the PIDs available on that node. To address node PID pressure, review the PID limits configured for your nodes and pods so that they match the actual needs of your applications, and investigate applications that create an unexpectedly high number of processes or threads. Regular monitoring and analysis of PID-related metrics help you detect growing PID usage before it affects the node.
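To make the idea of PID headroom concrete, here is a minimal Python sketch that compares the PIDs still available on a node with a threshold, similar in spirit to the kubelet's `pid.available` eviction signal. The function names and the threshold are hypothetical, and `read_linux_pid_usage` assumes a Linux node with `/proc` mounted.

```python
from __future__ import annotations

from pathlib import Path


def pids_available(pid_max: int, pids_in_use: int) -> int:
    """Number of PIDs that can still be allocated on the node."""
    return max(pid_max - pids_in_use, 0)


def under_pid_pressure(pid_max: int, pids_in_use: int, min_available: int) -> bool:
    """True when free PIDs drop below `min_available` (a hypothetical threshold)."""
    return pids_available(pid_max, pids_in_use) < min_available


def read_linux_pid_usage() -> tuple[int, int]:
    """Read (pid_max, pids_in_use) from /proc on a Linux node.

    The fourth field of /proc/loadavg has the form "runnable/total", where
    total counts the kernel scheduling entities (processes and threads)
    that currently exist on the system.
    """
    pid_max = int(Path("/proc/sys/kernel/pid_max").read_text().strip())
    total = int(Path("/proc/loadavg").read_text().split()[3].split("/")[1])
    return pid_max, total
```

On a node, `under_pid_pressure(*read_linux_pid_usage(), min_available=1000)` would report whether fewer than 1000 PIDs remain; pick a threshold that matches your own eviction settings.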


		## Build an override for a custom monitor

		You can override the arguments of any custom threshold monitor created using the guide [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md). As [the example shows](/use/alerting/k8s-add-monitors-cli.md#write-the-outline-of-the-monitor), the identifier of a custom monitor is structured as `urn:custom:monitor:{monitorShortName}`, and the override annotation key for such an identifier is expected to be
