Document monitor threshold overrides via annotations #1423

Merged: 2 commits, Aug 29, 2023

aacevedoosorio
Contributor

No description provided.

@github-actions github-actions bot left a comment

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

vale

use/alerting/kubernetes-monitors.md|25 col 911| [Microsoft.Contractions] Use 'isn't' instead of 'is not'.
use/alerting/kubernetes-monitors.md|47 col 5| [Microsoft.Headings] 'Node Disk Pressure' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|49 col 383| [Microsoft.Accessibility] Don't use language (such as 'normal') that defines people by their disability.
use/alerting/kubernetes-monitors.md|49 col 416| [Microsoft.Adverbs] Consider removing 'regularly'.
use/alerting/kubernetes-monitors.md|49 col 465| [Microsoft.Contractions] Use 'aren't' instead of 'are not'.
use/alerting/kubernetes-monitors.md|50 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|52 col 5| [Microsoft.Headings] 'Node Memory Pressure' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|54 col 376| [Vale.Spelling] Did you really mean 'misconfigured'?
use/alerting/kubernetes-monitors.md|54 col 524| [Microsoft.ComplexWords] Consider using 'assign' or 'divide' instead of 'allocate'.
use/alerting/kubernetes-monitors.md|54 col 581| [Microsoft.Contractions] Use 'aren't' instead of 'are not'.
use/alerting/kubernetes-monitors.md|54 col 903| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|54 col 931| [Microsoft.ComplexWords] Consider using 'many' instead of 'multiple'.
use/alerting/kubernetes-monitors.md|54 col 964| [Microsoft.ComplexWords] Consider using 'large' instead of 'substantial'.
use/alerting/kubernetes-monitors.md|54 col 1127| [Microsoft.Contractions] Use 'it's' instead of 'it is'.
use/alerting/kubernetes-monitors.md|54 col 1430| [Vale.Spelling] Did you really mean 'autoscaling'?
use/alerting/kubernetes-monitors.md|54 col 1498| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|54 col 1621| [Microsoft.ComplexWords] Consider using 'keep' or 'support' instead of 'maintain'.
use/alerting/kubernetes-monitors.md|54 col 1632| [Microsoft.Accessibility] Don't use language (such as 'healthy') that defines people by their disability.
use/alerting/kubernetes-monitors.md|54 col 1778| [Microsoft.ComplexWords] Consider using 'so' instead of 'accordingly'.
use/alerting/kubernetes-monitors.md|54 col 1821| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|55 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|57 col 5| [Microsoft.Headings] 'Node PID Pressure' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|57 col 10| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|57 col 10| [Microsoft.HeadingAcronyms] Avoid using acronyms in a title or heading.
use/alerting/kubernetes-monitors.md|59 col 6| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 69| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 162| [Vale.Spelling] Did you really mean 'misconfigured'?
use/alerting/kubernetes-monitors.md|59 col 278| [Microsoft.ComplexWords] Consider using 'right' or 'exact' instead of 'accurate'.
use/alerting/kubernetes-monitors.md|59 col 331| [Microsoft.ComplexWords] Consider using 'assign' or 'divide' instead of 'allocate'.
use/alerting/kubernetes-monitors.md|59 col 376| [Microsoft.Contractions] Use 'aren't' instead of 'are not'.
use/alerting/kubernetes-monitors.md|59 col 416| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 456| [Vale.Spelling] Did you really mean 'PIDs'?
use/alerting/kubernetes-monitors.md|59 col 494| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 545| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 686| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 690| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|59 col 718| [Microsoft.ComplexWords] Consider using 'many' instead of 'multiple'.
use/alerting/kubernetes-monitors.md|59 col 763| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 860| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 877| [Microsoft.ComplexWords] Consider using 'discuss' instead of 'address'.
use/alerting/kubernetes-monitors.md|59 col 890| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 904| [Microsoft.Contractions] Use 'it's' instead of 'it is'.
use/alerting/kubernetes-monitors.md|59 col 988| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|59 col 1022| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1079| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1145| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1203| [Vale.Spelling] Did you really mean 'autoscaling'?
use/alerting/kubernetes-monitors.md|59 col 1265| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1269| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|59 col 1314| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1363| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1407| [Microsoft.Accessibility] Don't use language (such as 'healthy') that defines people by their disability.
use/alerting/kubernetes-monitors.md|59 col 1424| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1455| [Microsoft.Contractions] Use 'it's' instead of 'It is'.
use/alerting/kubernetes-monitors.md|59 col 1560| [Microsoft.ComplexWords] Consider using 'so' instead of 'accordingly'.
use/alerting/kubernetes-monitors.md|59 col 1600| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|60 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|62 col 5| [Microsoft.Headings] 'Node Readiness' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|65 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|71 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|73 col 5| [Microsoft.Headings] 'Pod Ready State' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|107 col 5| [Vale.Spelling] Did you really mean 'Unschedulable'?
use/alerting/kubernetes-monitors.md|107 col 5| [Microsoft.Headings] 'Unschedulable Node' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|109 col 1| [Microsoft.SentenceLength] Try to keep sentences short (< 30 words).
use/alerting/kubernetes-monitors.md|109 col 8| [Microsoft.ComplexWords] Consider using 'meet' instead of 'encounter'.
use/alerting/kubernetes-monitors.md|109 col 232| [Microsoft.Contractions] Use 'can't' instead of 'cannot'.

description: StackState Kubernetes Troubleshooting
---

# Override monitor threshold arguments via kubernetes annotations

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'kubernetes'?


## Overview

StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes we need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'give' or 'offer' instead of 'provide'.


## Overview

StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes we need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.

📝 [vale] reported by reviewdog 🐶
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).


## Overview

StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes we need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.We] Try to avoid using first-person plural like 'we'.

Contributor

Like vale says :)

## Overview

StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes we need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.
The mechanism to declare the overrides is via kubernetes resource annotations that denote to which monitor and component they should apply. For example we could override the `failureState` for the `Available service endpoints` monitor for a specific service where we want to signal a `CRITICAL` state when it fails rather than the default `DEVIATING`.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'kubernetes'?
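
For readers following this thread, here is a minimal sketch of the annotation mechanism described in the quoted paragraph. It uses the key convention `monitor.${owner}.stackstate.io/${monitorShortName}` that is quoted further down in the review. Only that key pattern and the argument names `threshold` and `failureState` come from the PR. The owner `example-owner`, the short name `available-service-endpoints`, the helper names, and the JSON encoding of the annotation value are assumptions made for illustration.

```python
import json


def override_annotation_key(owner: str, monitor_short_name: str) -> str:
    """Build a key following monitor.${owner}.stackstate.io/${monitorShortName}."""
    if not owner or not monitor_short_name:
        raise ValueError("owner and monitor_short_name must be non-empty")
    return f"monitor.{owner}.stackstate.io/{monitor_short_name}"


def override_annotation(owner: str, monitor_short_name: str, **arguments) -> dict:
    """Return a one-entry annotations mapping for a Kubernetes resource.

    Assumption: the annotation value is a JSON object holding the
    overridden monitor arguments. The PR text does not show the value format.
    """
    key = override_annotation_key(owner, monitor_short_name)
    return {key: json.dumps(arguments, sort_keys=True)}


# Hypothetical owner and short name; signal CRITICAL instead of the default DEVIATING.
annotations = override_annotation(
    "example-owner",
    "available-service-endpoints",
    failureState="CRITICAL",
)
print(annotations)
```

The resulting mapping would go under `metadata.annotations` of the specific service whose monitor behaviour should change.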


### Daemonset desired replicas

It is important that the desired number of replicas for a Daemonset is being met. Daemonsets are used to manage a set of pods that need to run on all or a subset of nodes in a cluster, ensuring that a copy of the pod is running on each node that meets the specified criteria. This is useful for tasks such as logging, monitoring, and other cluster-level tasks that need to be executed on every node in the cluster. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to DaemonSets that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the StatefulSet. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Daemonsets'?


### Daemonset desired replicas

It is important that the desired number of replicas for a Daemonset is being met. Daemonsets are used to manage a set of pods that need to run on all or a subset of nodes in a cluster, ensuring that a copy of the pod is running on each node that meets the specified criteria. This is useful for tasks such as logging, monitoring, and other cluster-level tasks that need to be executed on every node in the cluster. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to DaemonSets that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the StatefulSet. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'isn't' instead of 'is not'.
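
The replica check described in the quoted paragraph can be read as a small decision rule. The sketch below is an illustration of that rule, not the monitor's actual definition. The function name and the `CLEAR` state for the passing case are assumptions. The quoted text only names `DEVIATING` and `CRITICAL`, and says the check applies only when the desired replica count is greater than zero.

```python
from typing import Optional


def replica_health_state(desired: int, available: int) -> Optional[str]:
    """Map desired/available replica counts to a health state.

    Returns None when the check does not apply (desired replicas is zero).
    """
    if desired <= 0:
        return None  # the check is only applied when desired replicas > 0
    if available == 0:
        return "CRITICAL"  # nothing is running at all
    if available < desired:
        return "DEVIATING"  # some, but not all, replicas are available
    return "CLEAR"  # assumed name for the passing state


for desired, available in [(3, 3), (3, 1), (3, 0), (0, 0)]:
    print(desired, available, replica_health_state(desired, available))
```

The zero-available case is tested first because it also satisfies "less than desired"; checking it second would hide the `CRITICAL` state.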


### Deployment desired replicas

It is important that the desired number of replicas for a Deployments is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployments. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'it's' instead of 'It is'.


### Deployment desired replicas

It is important that the desired number of replicas for a Deployments is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployments. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'same' instead of 'identical'.


### Deployment desired replicas

It is important that the desired number of replicas for a Deployments is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployments. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'keep' or 'support' instead of 'maintain'.


The override annotations keys for StackState monitors follow the following convention:
```
monitor.${owner}.stackstate.io/${monitorShorName}
```

Contributor

`monitorShorName` should be `monitorShortName`


### Node PID Pressure

Node PID pressure occurs when the available process identification (PID) resources on a Kubernetes node are excessively strained. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on accurate resource requests and limits to effectively allocate and manage resources. If containers are not configured correctly with their PID requirements, they may consume more PIDs than expected, resulting in node PID pressure. The second reason is the presence of PID-intensive applications or processes. Some workloads or applications have higher demands for process identification, leading to increased PID utilization on the node. If multiple pods or containers with significant PID requirements are scheduled on the same node without proper resource allocation, it can cause PID pressure. To address node PID pressure, it is important to review and adjust resource requests and limits for containers to ensure they align with the actual PID needs of the applications. Monitoring and optimizing PID usage within the applications themselves can also help reduce PID consumption. Additionally, considering horizontal pod autoscaling can dynamically scale the number of pods based on PID utilization. Regular monitoring, analysis of PID-related metrics, and proactive allocation of PID resources are crucial for maintaining a healthy state of PID usage on Kubernetes nodes. It is essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent PID pressure and ensure optimal performance.
Contributor

Is this copy-pasted from the descriptions? If so, no more review needed.

Contributor Author

Yup. I took it directly from the descriptions on master

* [Node Memory Pressure](/use/alerting/kubernetes-monitors.md#node-memory-pressure)
* [Node PID Pressure](/use/alerting/kubernetes-monitors.md#node-pid-pressure)
* [Node Readiness](/use/alerting/kubernetes-monitors.md#node-readiness)
* [Out of memory for containers](/use/alerting/kubernetes-monitors.md#out-of-memory-for-containers)
Contributor

Maybe add a short final section on how to do an override on a custom monitor, so that example is included as well.

@craffit craffit self-requested a review August 24, 2023 07:38
Contributor

@craffit craffit left a comment

Approved, looking good

@github-actions github-actions bot left a comment

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

vale

use/alerting/kubernetes-monitors.md|59 col 278| [Microsoft.ComplexWords] Consider using 'right' or 'exact' instead of 'accurate'.
use/alerting/kubernetes-monitors.md|59 col 331| [Microsoft.ComplexWords] Consider using 'assign' or 'divide' instead of 'allocate'.
use/alerting/kubernetes-monitors.md|59 col 376| [Microsoft.Contractions] Use 'aren't' instead of 'are not'.
use/alerting/kubernetes-monitors.md|59 col 416| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 456| [Vale.Spelling] Did you really mean 'PIDs'?
use/alerting/kubernetes-monitors.md|59 col 494| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 545| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 686| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 690| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|59 col 718| [Microsoft.ComplexWords] Consider using 'many' instead of 'multiple'.
use/alerting/kubernetes-monitors.md|59 col 763| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 860| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 877| [Microsoft.ComplexWords] Consider using 'discuss' instead of 'address'.
use/alerting/kubernetes-monitors.md|59 col 890| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 904| [Microsoft.Contractions] Use 'it's' instead of 'it is'.
use/alerting/kubernetes-monitors.md|59 col 988| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|59 col 1022| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1079| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1145| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1203| [Vale.Spelling] Did you really mean 'autoscaling'?
use/alerting/kubernetes-monitors.md|59 col 1265| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1269| [Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.
use/alerting/kubernetes-monitors.md|59 col 1314| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1363| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1407| [Microsoft.Accessibility] Don't use language (such as 'healthy') that defines people by their disability.
use/alerting/kubernetes-monitors.md|59 col 1424| [Microsoft.Acronyms] 'PID' has no definition.
use/alerting/kubernetes-monitors.md|59 col 1455| [Microsoft.Contractions] Use 'it's' instead of 'It is'.
use/alerting/kubernetes-monitors.md|59 col 1560| [Microsoft.ComplexWords] Consider using 'so' instead of 'accordingly'.
use/alerting/kubernetes-monitors.md|59 col 1600| [Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.
use/alerting/kubernetes-monitors.md|60 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|62 col 5| [Microsoft.Headings] 'Node Readiness' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|65 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|71 col 1| [Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.
use/alerting/kubernetes-monitors.md|73 col 5| [Microsoft.Headings] 'Pod Ready State' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|107 col 5| [Microsoft.Headings] 'Unschedulable Node' should use sentence-style capitalization.
use/alerting/kubernetes-monitors.md|107 col 5| [Vale.Spelling] Did you really mean 'Unschedulable'?
use/alerting/kubernetes-monitors.md|109 col 1| [Microsoft.SentenceLength] Try to keep sentences short (< 30 words).
use/alerting/kubernetes-monitors.md|109 col 8| [Microsoft.ComplexWords] Consider using 'meet' instead of 'encounter'.
use/alerting/kubernetes-monitors.md|109 col 232| [Microsoft.Contractions] Use 'can't' instead of 'cannot'.


## Overview

StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes there's need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'give' or 'offer' instead of 'provide'.


## Overview

StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md), which provide monitoring on common issues that can occur in a Kubernetes cluster. Those monitors work with certain default arguments that suit most of the use cases but sometimes there's need to adapt its behaviour by overriding some of such default arguments like `threshold` or `failureState`.

📝 [vale] reported by reviewdog 🐶
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).


## Build an override for a custom monitor

Any custom threshold monitor created using the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md) is suitable to override arguments, as [the example shows](/use/alerting/k8s-add-monitors-cli.md#write-the-outline-of-the-monitor) an identifier for a custom monitor is structured as `urn:custom:monitor:{monitorShortName}`and the override annotation key for such an identifier is expected to be

📝 [vale] reported by reviewdog 🐶
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).
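
As a small illustration of the identifier structure `urn:custom:monitor:{monitorShortName}` mentioned in the quoted paragraph, the sketch below extracts the short name from such an identifier. The helper name and the example identifier are hypothetical. The annotation key for custom monitors is cut off in the quoted hunk, so the sketch stops at the short name and does not guess the key.

```python
CUSTOM_MONITOR_PREFIX = "urn:custom:monitor:"


def custom_monitor_short_name(identifier: str) -> str:
    """Return the {monitorShortName} part of a custom monitor identifier."""
    if not identifier.startswith(CUSTOM_MONITOR_PREFIX):
        raise ValueError(f"not a custom monitor identifier: {identifier!r}")
    short_name = identifier[len(CUSTOM_MONITOR_PREFIX):]
    if not short_name:
        raise ValueError("identifier has an empty short name")
    return short_name


# Hypothetical identifier, used only to show the shape of the input.
print(custom_monitor_short_name("urn:custom:monitor:my-threshold-monitor"))
```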


### Deployment desired replicas

It is important that the desired number of replicas for a Deployments is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployments. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'isn't' instead of 'is not'.

@@ -34,12 +44,33 @@ It is important to monitor the usage of Persistent Volume Claims (PVCs) in your
It is important to monitor the usage of Persistent Volume Claims (PVCs) in your Kubernetes cluster over time. PVCs are used to store data that needs to persist beyond the lifetime of a container, and it's crucial to ensure that they have enough space to store the data.
To track this, StackState set up a check that uses linear prediction to forecast the Kubernetes volume usage trend over a 4-day period. If the trend indicates that the PVCs will run out of space within this time frame, you will receive a notification, allowing you to take action to prevent data loss or downtime.
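
To make the "linear prediction" idea in the quoted paragraph concrete, here is a rough sketch that fits a least-squares line through usage samples and checks whether the line reaches the volume capacity within 4 days (96 hours). It is an illustration under stated assumptions, not StackState's implementation. The function names, the sample format of `(hours, used_bytes)` pairs, and the choice of measuring the horizon from the last sample are all hypothetical.

```python
from typing import List, Optional, Tuple


def hours_until_full(samples: List[Tuple[float, float]], capacity: float) -> Optional[float]:
    """Extrapolate a least-squares line through (hours, used) samples.

    Returns the hours from the last sample until the fitted line reaches
    capacity, or None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # no growth trend, so the line never reaches capacity
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - samples[-1][0])


def will_fill_within(samples: List[Tuple[float, float]], capacity: float,
                     horizon_hours: float = 96.0) -> bool:
    """True when the fitted trend reaches capacity within the horizon."""
    remaining = hours_until_full(samples, capacity)
    return remaining is not None and remaining <= horizon_hours


# Usage grows by 10 units per day; a 50-unit volume fills two days after the last sample.
usage = [(0.0, 10.0), (24.0, 20.0), (48.0, 30.0)]
print(will_fill_within(usage, capacity=50.0))
```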

### Node Disk Pressure

📝 [vale] reported by reviewdog 🐶
[Microsoft.Headings] 'Node Disk Pressure' should use sentence-style capitalization.

Node memory pressure refers to a situation where the memory resources on a Kubernetes node are excessively strained. While encountering node memory pressure is uncommon due to Kubernetes' built-in resource management mechanisms, it can still occur under specific circumstances. There are two primary reasons why node memory pressure may arise. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on resource requests and limits to allocate and manage resources effectively. If containers are not accurately configured with their memory requirements, they may consume more memory than expected, leading to node memory pressure. The second reason involves the presence of memory-intensive applications or processes. Certain workloads or applications may have higher memory demands, resulting in increased memory utilization on the node. If multiple pods or containers with substantial memory requirements are scheduled on the same node without proper resource allocation, it can cause memory pressure. To mitigate node memory pressure, it is crucial to review and adjust resource requests and limits for containers, ensuring they align with the actual memory needs of the applications. Monitoring and optimizing memory usage within the applications themselves can also help reduce memory consumption. Additionally, consider horizontal pod autoscaling to dynamically scale the number of pods based on memory utilization. Regular monitoring, analysis of memory-related metrics, and proactive allocation of memory resources can help maintain a healthy memory state on Kubernetes nodes. It's essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent memory pressure and ensure optimal performance.
Allows [Override Monitor arguments](/use/alerting/k8s-override-monitor-arguments.md)

### Node PID Pressure

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.HeadingAcronyms] Avoid using acronyms in a title or heading.

Node memory pressure refers to a situation where the memory resources on a Kubernetes node are excessively strained. While encountering node memory pressure is uncommon due to Kubernetes' built-in resource management mechanisms, it can still occur under specific circumstances. There are two primary reasons why node memory pressure may arise. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on resource requests and limits to allocate and manage resources effectively. If containers are not accurately configured with their memory requirements, they may consume more memory than expected, leading to node memory pressure. The second reason involves the presence of memory-intensive applications or processes. Certain workloads or applications may have higher memory demands, resulting in increased memory utilization on the node. If multiple pods or containers with substantial memory requirements are scheduled on the same node without proper resource allocation, it can cause memory pressure. To mitigate node memory pressure, it is crucial to review and adjust resource requests and limits for containers, ensuring they align with the actual memory needs of the applications. Monitoring and optimizing memory usage within the applications themselves can also help reduce memory consumption. Additionally, consider horizontal pod autoscaling to dynamically scale the number of pods based on memory utilization. Regular monitoring, analysis of memory-related metrics, and proactive allocation of memory resources can help maintain a healthy memory state on Kubernetes nodes. It's essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent memory pressure and ensure optimal performance.
Allows [Override Monitor arguments](/use/alerting/k8s-override-monitor-arguments.md)

### Node PID Pressure

📝 [vale] reported by reviewdog 🐶
[Microsoft.Acronyms] 'PID' has no definition.


Node PID pressure occurs when the available process identification (PID) resources on a Kubernetes node are excessively strained. There are two primary reasons why node PID pressure may arise. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on accurate resource requests and limits to effectively allocate and manage resources. If containers are not configured correctly with their PID requirements, they may consume more PIDs than expected, resulting in node PID pressure. The second reason is the presence of PID-intensive applications or processes. Some workloads or applications have higher demands for process identification, leading to increased PID utilization on the node. If multiple pods or containers with significant PID requirements are scheduled on the same node without proper resource allocation, it can cause PID pressure. To address node PID pressure, it is important to review and adjust resource requests and limits for containers to ensure they align with the actual PID needs of the applications. Monitoring and optimizing PID usage within the applications themselves can also help reduce PID consumption. Additionally, consider horizontal pod autoscaling to dynamically scale the number of pods based on PID utilization. Regular monitoring, analysis of PID-related metrics, and proactive allocation of PID resources are crucial for maintaining a healthy state of PID usage on Kubernetes nodes. It is essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent PID pressure and ensure optimal performance.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Acronyms] 'PID' has no definition.



🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'misconfigured'?

@aacevedoosorio aacevedoosorio merged commit a6cb4c1 into k8s-troubleshooting Aug 29, 2023
1 check passed
@aacevedoosorio aacevedoosorio deleted the stac-19922 branch September 1, 2023 14:35