Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stac 19900 #1425

Merged
merged 2 commits into from
Oct 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion use/alerting/k8s-override-monitor-arguments.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,13 +46,17 @@ The full annotation then would look like
monitor.kubernetes-v2.stackstate.io/service-available-endpoint: |-
{
"threshold": 0.0,
"failureState": "CRITICAL"
"failureState": "CRITICAL",
"enabled": true
}
```

## What monitors allow overriding arguments?
* [Available service endpoints](/use/alerting/kubernetes-monitors.md#available-service-endpoints)
* [Cpu limits resourcequota](/use/alerting/kubernetes-monitors.md#cpu-limits-resourcequota)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Cpu'?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resourcequota'?

* [Cpu requests resourcequota](/use/alerting/kubernetes-monitors.md#cpu-requests-resourcequota)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Cpu'?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resourcequota'?

* [Memory limits resourcequota](/use/alerting/kubernetes-monitors.md#memory-limits-resourcequota)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resourcequota'?

* [Memory requests resourcequota](/use/alerting/kubernetes-monitors.md#memory-requests-resourcequota)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resourcequota'?

* [Node Disk Pressure](/use/alerting/kubernetes-monitors.md#node-disk-pressure)
* [Node Memory Pressure](/use/alerting/kubernetes-monitors.md#node-memory-pressure)
* [Node PID Pressure](/use/alerting/kubernetes-monitors.md#node-pid-pressure)
Expand Down
13 changes: 13 additions & 0 deletions use/alerting/kubernetes-monitors.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,13 @@ It is important to ensure that your services are available and accessible to use
If there is an occurrence of zero endpoints available within the last 10 minutes, the monitor will remain deviating, indicating that there may be an issue with the service that needs to be addressed.
Allows [Override Monitor arguments](/use/alerting/k8s-override-monitor-arguments.md)

### Cpu limits resourcequota

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Cpu'?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resourcequota'?


Users create resources (pods, services, etc.) in the namespace, and the quota system tracks usage to ensure it does not exceed hard resource limits for Cpu defined in a ResourceQuota. The monitor will alert when the total Cpu limits in the namespace gets to 90% or more of the established by the quota. Each `resourcequota` in the namespace produces a monitor health state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'doesn't' instead of 'does not'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Cpu'?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'alert' with the A-Z word list.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Cpu'?


### Cpu requests resourcequota

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Cpu'?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resourcequota'?

Users create resources (pods, services, etc.) in the namespace, and the quota system tracks usage to ensure it does not exceed hard resource requests for Cpu defined in a ResourceQuota. The monitor will alert when the total Cpu requests in the namespace gets to 90% or more of the established by the quota. Each `resourcequota` in the namespace produces a monitor health state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'doesn't' instead of 'does not'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Cpu'?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'alert' with the A-Z word list.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'Cpu'?


### Daemonset desired replicas

It is important that the desired number of replicas for a Daemonset is being met. Daemonsets are used to manage a set of pods that need to run on all or a subset of nodes in a cluster, ensuring that a copy of the pod is running on each node that meets the specified criteria. This is useful for tasks such as logging, monitoring, and other cluster-level tasks that need to be executed on every node in the cluster. To monitor this, StackState has set up a check that verifies if the available replicas match the desired number of replicas. This check will only be applied to DaemonSets that have a desired number of replicas greater than zero. - If the number of available replicas is less than the desired number, the monitor will signal a DEVIATING health state, indicating that there may be an issue with the StatefulSet. - If the number of available replicas is zero, the monitor will signal a CRITICAL health state, indicating that the StatefulSet is not functioning at all. To understand the full monitor definition check the details.
Expand Down Expand Up @@ -44,6 +51,12 @@ It is important to monitor the usage of Persistent Volume Claims (PVCs) in your
It is important to monitor the usage of Persistent Volume Claims (PVCs) in your Kubernetes cluster over time. PVCs are used to store data that needs to persist beyond the lifetime of a container, and it's crucial to ensure that they have enough space to store the data.
To track this, StackState set up a check that uses linear prediction to forecast the Kubernetes volume usage trend over a 4-day period. If the trend indicates that the PVCs will run out of space within this time frame, you will receive a notification, allowing you to take action to prevent data loss or downtime.

### Memory limits resourcequota

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resourcequota'?

Users create resources (pods, services, etc.) in the namespace, and the quota system tracks usage to ensure it does not exceed hard resource limits for memory defined in a ResourceQuota. The monitor will alert when the total memory limits in the namespace gets to 90% or more of the established by the quota. Each `resourcequota` in the namespace produces a monitor health state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'doesn't' instead of 'does not'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'alert' with the A-Z word list.


### Memory requests resourcequota

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resourcequota'?

Users create resources (pods, services, etc.) in the namespace, and the quota system tracks usage to ensure it does not exceed hard resource requests for memory defined in a ResourceQuota. The monitor will alert when the total memory requests in the namespace gets to 90% or more of the established by the quota. Each `resourcequota` in the namespace produces a monitor health state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'ensure' with the A-Z word list.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Microsoft.Contractions] Use 'doesn't' instead of 'does not'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [vale] reported by reviewdog 🐶
[Microsoft.Vocab] Verify your use of 'alert' with the A-Z word list.


### Node Disk Pressure

Node disk pressure refers to a situation where the disks connected to a node experience excessive strain. While encountering node disk pressure is unlikely due to Kubernetes' built-in preventive measures, it can still occur sporadically. There are two primary reasons why node disk pressure may arise. The first reason relates to Kubernetes failing to clean up unused images. Under normal circumstances, Kubernetes regularly checks for and deletes any images that are not in use. Therefore, this is an uncommon cause of node disk pressure, but it should be acknowledged. The more probable issue involves the accumulation of logs. In Kubernetes, logs are typically saved in two scenarios: when containers are running and when the most recently exited container's logs are retained for troubleshooting purposes. This approach aims to strike a balance between preserving important logs and discarding unnecessary ones over time. However, if a long-running container generates an extensive volume of logs, they may accumulate to the point where they overload the node disk's capacity. To understand the full monitor definition check the details.
Expand Down