Document monitor threshold overrides via annotations #1423

Merged
merged 2 commits into from
Aug 29, 2023
1 change: 1 addition & 0 deletions SUMMARY.md
@@ -22,6 +22,7 @@
* [Alerts](use/alerting/event-handlers.md)
* [Advanced](use/metrics/k8s-advanced.md)
* [Add a monitor using the CLI](use/alerting/k8s-add-monitors-cli.md)
* [Override monitor arguments](use/alerting/k8s-override-monitor-arguments.md)

## 📈 Metrics

2 changes: 1 addition & 1 deletion use/alerting/k8s-add-monitors-cli.md
@@ -39,7 +39,7 @@ nodes:
urnTemplate: "urn:kubernetes:/${kube_cluster_name}:${namespace}:service/${endpoint}"
titleTemplate:
description:
- function: {{ get "urn:stackpack:kubernetes-v2:shared:monitor-function:threshold" }}
+ function: {{ get "urn:stackpack:common:monitor-function:threshold" }}
aacevedoosorio marked this conversation as resolved.
identifier: urn:custom:monitor:...
intervalSeconds: 30
name:
59 changes: 59 additions & 0 deletions use/alerting/k8s-override-monitor-arguments.md
@@ -0,0 +1,59 @@
---
description: StackState Kubernetes Troubleshooting
---

# Override monitor threshold arguments via Kubernetes annotations

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'kubernetes'?


## Overview

StackState provides [monitors out of the box](/use/alerting/k8s-monitors.md) that cover common issues in a Kubernetes cluster. These monitors run with default arguments that suit most use cases. Sometimes their behaviour needs to be adapted by overriding some of those defaults, such as `threshold` or `failureState`.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'give' or 'offer' instead of 'provide'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.We] Try to avoid using first-person plural like 'we'.

Contributor

like vale says:)

Overrides are declared through Kubernetes resource annotations that identify the monitor and the component they apply to. For example, you can override the `failureState` of the `Available service endpoints` monitor for a specific service, so that a failure signals a `CRITICAL` state rather than the default `DEVIATING`.
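
As a rough illustration of the override idea, the sketch below merges an annotation payload over a set of default arguments. The function name `effective_arguments` and the default values for `threshold` and `enabled` are assumptions made for this example; only the `DEVIATING` default for `failureState` comes from the text above. It is not StackState's implementation.

```python
import json

# Hypothetical defaults for illustration only. The text above states that
# DEVIATING is the default failureState for this monitor; the other two
# values are assumed.
DEFAULT_ARGUMENTS = {"threshold": 0.0, "failureState": "DEVIATING", "enabled": True}


def effective_arguments(annotations, key, defaults=DEFAULT_ARGUMENTS):
    """Return the defaults, overridden by the JSON payload stored under `key`."""
    raw = annotations.get(key)
    if raw is None:
        # No override annotation on this resource: the defaults apply unchanged.
        return dict(defaults)
    overrides = json.loads(raw)
    # Keys present in the annotation win over the defaults.
    return {**defaults, **overrides}


key = "monitor.kubernetes-v2.stackstate.io/service-available-endpoint"
service_annotations = {key: '{"failureState": "CRITICAL"}'}
print(effective_arguments(service_annotations, key))
```

Only the arguments named in the annotation change; everything else keeps its default, which matches the "optional arguments" wording used for the payload.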

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'kubernetes'?

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.We] Try to avoid using first-person plural like 'we'.

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.We] Try to avoid using first-person plural like 'we'.


## How to

* [How to build an override annotation](#how-to-build-my-annotation)
* [What monitors allow overriding arguments?](#what-monitors-allow-overriding-arguments)

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'allow' with the A-Z word list.


As an example, the steps below override the arguments of the `Available service endpoints` monitor for Kubernetes HTTP services.

## How to build my annotation

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.FirstPerson] Use first person (such as 'my') sparingly.


Override annotation keys for StackState monitors follow this convention:
```
monitor.${owner}.stackstate.io/${monitorShortName}
```

Contributor

`monitorShorName` should be `monitorShortName`.

The `owner` property represents who created the monitor; for the out-of-the-box monitors it is `kubernetes-v2`. The `monitorShortName` property represents the ID of the monitor. It can be extracted from the monitor's `identifier` property, which the CLI shows when listing or inspecting monitors:

📝 [vale] reported by reviewdog 🐶
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).

📝 [vale] reported by reviewdog 🐶
[StackState.AvoidThis] Use'ID' instead of 'id'.

🚫 [vale] reported by reviewdog 🐶
[Vale.Terms] Use 'CLI' instead of 'cli'.

```
sts monitor list

ID | STATUS | IDENTIFIER | NAME | FUNCTION ID | TAGS
8051105457030 | ENABLED | urn:stackpack:kubernetes-v2:shared:monitor:kubernetes-v2:service-available-endpoint | Available service endpoints | 233276809885571 | [services]
```

In this example the identifier is `urn:stackpack:kubernetes-v2:shared:monitor:kubernetes-v2:service-available-endpoint` and the `monitorShortName` is its last segment, `service-available-endpoint`. The annotation key is therefore:

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.We] Try to avoid using first-person plural like 'our'.

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.Adverbs] Consider removing 'very'.

```bash
monitor.kubernetes-v2.stackstate.io/service-available-endpoint
```
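
The derivation of the key from the identifier can be sketched in a few lines of Python. The helper name `annotation_key` is made up for this example; the key format and the `kubernetes-v2` owner are taken from the text above.

```python
def annotation_key(identifier: str, owner: str = "kubernetes-v2") -> str:
    """Build an override annotation key from a monitor identifier URN."""
    # The short name is the last colon-separated segment of the identifier.
    short_name = identifier.rsplit(":", 1)[-1]
    return f"monitor.{owner}.stackstate.io/{short_name}"


identifier = (
    "urn:stackpack:kubernetes-v2:shared:monitor:kubernetes-v2:service-available-endpoint"
)
print(annotation_key(identifier))
# prints: monitor.kubernetes-v2.stackstate.io/service-available-endpoint
```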

The annotation payload is a JSON object in which the following optional arguments can be defined:
* `threshold`: optional. A numeric threshold to compare against.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Spacing] 'l.A' should have one space.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'against' with the A-Z word list.

* `failureState`: optional. Either "CRITICAL" or "DEVIATING". "CRITICAL" shows as red in StackState and "DEVIATING" as orange, to denote different severity.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Quotes] Punctuation should be inside the quotes.

* `enabled`: optional. Boolean that determines whether the monitor produces a health state for that component.

The full annotation then looks like this:
```yaml
monitor.kubernetes-v2.stackstate.io/service-available-endpoint: |-
  {
    "threshold": 0.0,
    "failureState": "CRITICAL",
    "enabled": true
  }
```
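
Hand-written JSON is easy to get wrong (a missing comma is enough to break it), so it can help to generate the payload. The following is a minimal sketch; `build_override_payload` is a hypothetical helper, and the validation reflects only the three arguments listed above.

```python
import json

ALLOWED_FAILURE_STATES = {"CRITICAL", "DEVIATING"}


def build_override_payload(threshold=None, failure_state=None, enabled=None) -> str:
    """Return the annotation value as a JSON string; omitted arguments are left out."""
    payload = {}
    if threshold is not None:
        payload["threshold"] = float(threshold)
    if failure_state is not None:
        if failure_state not in ALLOWED_FAILURE_STATES:
            raise ValueError(f"failureState must be one of {sorted(ALLOWED_FAILURE_STATES)}")
        payload["failureState"] = failure_state
    if enabled is not None:
        payload["enabled"] = bool(enabled)
    # json.dumps guarantees syntactically valid JSON (commas, quoting, booleans).
    return json.dumps(payload, indent=2)


print(build_override_payload(threshold=0.0, failure_state="CRITICAL", enabled=True))
```

The returned string can be pasted as the value of the annotation key in the resource manifest.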

## What monitors allow overriding arguments?

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'allow' with the A-Z word list.

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.HeadingPunctuation] Don't use end punctuation in headings.

* [Available service endpoints](/use/alerting/kubernetes-monitors.md#available-service-endpoints)
* [Node Disk Pressure](/use/alerting/kubernetes-monitors.md#node-disk-pressure)
* [Node Memory Pressure](/use/alerting/kubernetes-monitors.md#node-memory-pressure)
* [Node PID Pressure](/use/alerting/kubernetes-monitors.md#node-pid-pressure)

📝 [vale] reported by reviewdog 🐶
[Microsoft.Acronyms] 'PID' has no definition.

* [Node Readiness](/use/alerting/kubernetes-monitors.md#node-readiness)
* [Out of memory for containers](/use/alerting/kubernetes-monitors.md#out-of-memory-for-containers)
Contributor

Maybe add a last short section on how to do an override on a custom monitor, to have that example in as well.

39 changes: 37 additions & 2 deletions use/alerting/kubernetes-monitors.md
@@ -10,10 +10,20 @@ This section describes the out-of-the-box monitors delivered with StackState. Mo

## Out of the box Kubernetes monitors

- ### Available service endpoints
+ ### Available service endpoints

It is important to ensure that your services are available and accessible to users. To monitor this, StackState has set up a check that verifies if a service has at least one endpoint available. Endpoints are network addresses that enable communication between different components in a distributed system, and they need to be available for the service to function properly.
If zero endpoints were available at any point within the last 10 minutes, the monitor remains in the DEVIATING state, indicating that there may be an issue with the service that needs to be addressed.
Supports [overriding monitor arguments](/use/alerting/k8s-override-monitor-arguments.md).

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.


### DaemonSet desired replicas

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Daemonset'?


It is important that the desired number of replicas for a DaemonSet is being met. DaemonSets are used to manage a set of pods that need to run on all or a subset of nodes in a cluster, ensuring that a copy of the pod is running on each node that meets the specified criteria. This is useful for tasks such as logging, monitoring, and other cluster-level tasks that need to be executed on every node in the cluster. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to DaemonSets that have a desired number of replicas greater than zero.
- If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the DaemonSet.
- If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the DaemonSet is not functioning at all.

To understand the full monitor definition, check the details.
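
The replica rules described above can be sketched as a small function. This is an illustration of the stated rules only, not the monitor's actual definition; the function name is hypothetical and `CLEAR` is assumed as the name of the state reported when all replicas are available.

```python
from typing import Optional


def replica_health(desired: int, available: int) -> Optional[str]:
    """Map desired/available replica counts to a health state."""
    if desired <= 0:
        # The check only applies when more than zero replicas are desired.
        return None
    if available == 0:
        return "CRITICAL"
    if available < desired:
        return "DEVIATING"
    return "CLEAR"  # assumed name for the state when nothing is wrong


for desired, available in [(3, 3), (3, 2), (3, 0), (0, 0)]:
    print(desired, available, replica_health(desired, available))
```

The zero-replica case is tested before the "less than desired" case because zero is also less than any positive desired count and must map to the more severe state.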

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'it's' instead of 'It is'.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Daemonset'?

📝 [vale] reported by reviewdog 🐶
[Microsoft.SentenceLength] Try to keep sentences short (< 30 words).

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Daemonsets'?

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'isn't' instead of 'is not'.


### Deployment desired replicas

It is important that the desired number of replicas for a Deployment is being met. Deployments are used to manage the deployment and scaling of a set of identical Pods in a Kubernetes cluster. By ensuring that the desired number of replicas is running and available, Deployments can help maintain the availability and reliability of a Kubernetes application or service. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to Deployments that have a desired number of replicas greater than zero.
- If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the Deployment.
- If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the Deployment is not functioning at all.

To understand the full monitor definition, check the details.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'it's' instead of 'It is'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'same' instead of 'identical'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'keep' or 'support' instead of 'maintain'.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'isn't' instead of 'is not'.



### HTTP - 5xx error ratio

@@ -34,12 +44,33 @@ It is important to monitor the usage of Persistent Volume Claims (PVCs) in your
It is important to monitor the usage of Persistent Volume Claims (PVCs) in your Kubernetes cluster over time. PVCs are used to store data that needs to persist beyond the lifetime of a container, and it's crucial to ensure that they have enough space to store the data.
To track this, StackState set up a check that uses linear prediction to forecast the Kubernetes volume usage trend over a 4-day period. If the trend indicates that the PVCs will run out of space within this time frame, you will receive a notification, allowing you to take action to prevent data loss or downtime.
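
To make the idea of a linear forecast concrete, here is a minimal sketch that fits a least-squares line to usage samples and checks whether the line reaches capacity within a 4-day (96-hour) horizon. The function names, the sample format, and the use of plain least squares are assumptions for illustration; the text above does not specify how StackState computes its prediction.

```python
def hours_until_full(samples, capacity):
    """Fit usage = slope * t + intercept and return hours from the last sample
    until the fitted line reaches `capacity`; None if usage is not growing.

    `samples` is a list of (hours, used) pairs in chronological order.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # a single timestamp gives no trend
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # flat or shrinking usage never fills the volume
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - samples[-1][0])


def will_fill_within(samples, capacity, horizon_hours=96.0):
    """True if the fitted trend reaches capacity within the horizon (4 days by default)."""
    remaining = hours_until_full(samples, capacity)
    return remaining is not None and remaining <= horizon_hours


# Usage grows by 10 units per day, sampled at 0, 24 and 48 hours.
samples = [(0.0, 10.0), (24.0, 20.0), (48.0, 30.0)]
print(will_fill_within(samples, capacity=50.0))
print(will_fill_within(samples, capacity=100.0))
```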

### Node Disk Pressure

📝 [vale] reported by reviewdog 🐶
[Microsoft.Headings] 'Node Disk Pressure' should use sentence-style capitalization.


Node disk pressure refers to a situation where the disks connected to a node experience excessive strain. While encountering node disk pressure is unlikely due to Kubernetes' built-in preventive measures, it can still occur sporadically. There are two primary reasons why node disk pressure may arise. The first reason relates to Kubernetes failing to clean up unused images. Under normal circumstances, Kubernetes regularly checks for and deletes any images that are not in use. Therefore, this is an uncommon cause of node disk pressure, but it should be acknowledged. The more probable issue involves the accumulation of logs. In Kubernetes, logs are typically saved in two scenarios: when containers are running and when the most recently exited container's logs are retained for troubleshooting purposes. This approach aims to strike a balance between preserving important logs and discarding unnecessary ones over time. However, if a long-running container generates an extensive volume of logs, they may accumulate to the point where they overload the node disk's capacity. To understand the full monitor definition check the details.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Accessibility] Don't use language (such as 'normal') that defines people by their disability.

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.Adverbs] Consider removing 'regularly'.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'aren't' instead of 'are not'.

Supports [overriding monitor arguments](/use/alerting/k8s-override-monitor-arguments.md).

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.


### Node Memory Pressure

📝 [vale] reported by reviewdog 🐶
[Microsoft.Headings] 'Node Memory Pressure' should use sentence-style capitalization.


Node memory pressure refers to a situation where the memory resources on a Kubernetes node are excessively strained. While encountering node memory pressure is uncommon due to Kubernetes' built-in resource management mechanisms, it can still occur under specific circumstances. There are two primary reasons why node memory pressure may arise. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on resource requests and limits to allocate and manage resources effectively. If containers are not accurately configured with their memory requirements, they may consume more memory than expected, leading to node memory pressure. The second reason involves the presence of memory-intensive applications or processes. Certain workloads or applications may have higher memory demands, resulting in increased memory utilization on the node. If multiple pods or containers with substantial memory requirements are scheduled on the same node without proper resource allocation, it can cause memory pressure. To mitigate node memory pressure, it is crucial to review and adjust resource requests and limits for containers, ensuring they align with the actual memory needs of the applications. Monitoring and optimizing memory usage within the applications themselves can also help reduce memory consumption. Additionally, consider horizontal pod autoscaling to dynamically scale the number of pods based on memory utilization. Regular monitoring, analysis of memory-related metrics, and proactive allocation of memory resources can help maintain a healthy memory state on Kubernetes nodes. It's essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent memory pressure and ensure optimal performance.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'misconfigured'?

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'assign' or 'divide' instead of 'allocate'.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'aren't' instead of 'are not'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'many' instead of 'multiple'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'large' instead of 'substantial'.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'it's' instead of 'it is'.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'autoscaling'?

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'use' instead of 'utilization'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'keep' or 'support' instead of 'maintain'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Accessibility] Don't use language (such as 'healthy') that defines people by their disability.

📝 [vale] reported by reviewdog 🐶
[Microsoft.ComplexWords] Consider using 'so' instead of 'accordingly'.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.

Supports [overriding monitor arguments](/use/alerting/k8s-override-monitor-arguments.md).

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'Allows' with the A-Z word list.


### Node PID Pressure

📝 [vale] reported by reviewdog 🐶
[Microsoft.Headings] 'Node PID Pressure' should use sentence-style capitalization.

⚠️ [vale] reported by reviewdog 🐶
[Microsoft.HeadingAcronyms] Avoid using acronyms in a title or heading.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Acronyms] 'PID' has no definition.


Node PID pressure occurs when the available process identification (PID) resources on a Kubernetes node are excessively strained. There are two primary reasons why node PID pressure may arise. The first reason is related to misconfigured or insufficient resource requests and limits for containers running on the node. Kubernetes relies on accurate resource requests and limits to effectively allocate and manage resources. If containers are not configured correctly with their PID requirements, they may consume more PIDs than expected, resulting in node PID pressure. The second reason is the presence of PID-intensive applications or processes. Some workloads or applications have higher demands for process identification, leading to increased PID utilization on the node. If multiple pods or containers with significant PID requirements are scheduled on the same node without proper resource allocation, it can cause PID pressure. To address node PID pressure, it is important to review and adjust resource requests and limits for containers to ensure they align with the actual PID needs of the applications. Monitoring and optimizing PID usage within the applications themselves can also help reduce PID consumption. Additionally, consider horizontal pod autoscaling to dynamically scale the number of pods based on PID utilization. Regular monitoring, analysis of PID-related metrics, and proactive allocation of PID resources are crucial for maintaining a healthy state of PID usage on Kubernetes nodes. It is essential to understand the specific requirements of your workloads and adjust resource allocation accordingly to prevent PID pressure and ensure optimal performance.
Contributor

Is this copy-pasted from the descriptions? If so, no more review needed.

Contributor Author

Yup. I took it directly from the descriptions on master

📝 [vale] reported by reviewdog 🐶
[Microsoft.Acronyms] 'PID' has no definition.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Acronyms] 'PID' has no definition.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'misconfigured'?

Supports [overriding monitor arguments](/use/alerting/k8s-override-monitor-arguments.md).

### Node Readiness

Checks whether the Node is up and running as expected.
Supports [overriding monitor arguments](/use/alerting/k8s-override-monitor-arguments.md).

### Out of memory for containers

It is important to ensure that the containers running in your Kubernetes cluster have enough memory to function properly. Out-of-memory (OOM) conditions can cause containers to crash or become unresponsive, leading to restarts and potential data loss.
To monitor for these conditions, StackState set up a check that detects and reports OOM events in the containers running in the cluster. This check will help you identify any containers that are running out of memory and allow you to take action to prevent issues before they occur.
Supports [overriding monitor arguments](/use/alerting/k8s-override-monitor-arguments.md).

- ### Pod Readiness
+ ### Pod Ready State

Checks if a Pod that has been scheduled is running and ready to receive traffic within the expected amount of time.

@@ -73,6 +104,10 @@ To monitor this, StackState has set up a check that verifies if the available re
- If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all.


### Unschedulable Node

If you encounter a "NodeNotSchedulable" event in Kubernetes, it means that the Kubernetes scheduler was unable to place a pod on a specific node due to some constraints or issues with the node. This event occurs when the scheduler cannot find a suitable node to run the pod according to its resource requirements and other constraints.

## See also

* [Monitors](/use/alerting/k8s-monitors.md)