kubernetes-autoscaling
Overview
Jsonnet source code is available at github.com/adinhodovic/kubernetes-autoscaling-mixin
Alerts
Complete list of pregenerated alerts is available here.
karpenter
KarpenterCloudProviderErrors
alert: KarpenterCloudProviderErrors
annotations:
dashboard_url: https://grafana.com/d/kubernetes-autoscaling-mixin-kperf-jkwq/kubernetes-autoscaling-karpenter-performance
description: The Karpenter provider {{ $labels.provider }} with the controller {{
$labels.controller }} has errors with the method {{ $labels.method }}.
summary: Karpenter has Cloud Provider Errors.
expr: |
sum(
increase(
karpenter_cloudprovider_errors_total{
job=~"karpenter"
}[5m]
)
) by (namespace, job, provider, controller, method) > 0
for: 5m
labels:
severity: warning
KarpenterNodepoolNearCapacity
alert: KarpenterNodepoolNearCapacity
annotations:
dashboard_url: https://grafana.com/d/kubernetes-autoscaling-mixin-kover-jkwq/kubernetes-autoscaling-karpenter-overview
description: The resource {{ $labels.resource_type }} in the Karpenter node pool
{{ $labels.nodepool }} is nearing its limit. Consider scaling or adding resources.
summary: Karpenter Nodepool near capacity.
expr: |
sum (
karpenter_nodepools_usage{job=~"karpenter"}
) by (namespace, job, nodepool, resource_type)
/
sum (
karpenter_nodepools_limit{job=~"karpenter"}
) by (namespace, job, nodepool, resource_type)
* 100 > 75
for: 15m
labels:
severity: warning
cluster-autoscaler
ClusterAutoscalerNodeCountNearCapacity
alert: ClusterAutoscalerNodeCountNearCapacity
annotations:
dashboard_url: https://grafana.com/d/kubernetes-autoscaling-mixin-ca-jkwq/kubernetes-autoscaling-cluster-autoscaler
description: The node count for the cluster autoscaler job {{ $labels.job }} is
reaching max limit. Consider scaling node groups.
summary: Cluster Autoscaler Node Count near Capacity.
expr: |
sum (
cluster_autoscaler_nodes_count{job=~"cluster-autoscaler"}
) by (namespace, job)
/
sum (
cluster_autoscaler_max_nodes_count{job=~"cluster-autoscaler"}
) by (namespace, job)
* 100 > 75
for: 15m
labels:
severity: warning
ClusterAutoscalerUnschedulablePods
alert: ClusterAutoscalerUnschedulablePods
annotations:
dashboard_url: https://grafana.com/d/kubernetes-autoscaling-mixin-ca-jkwq/kubernetes-autoscaling-cluster-autoscaler
description: The cluster currently has unschedulable pods, indicating resource shortages.
Consider adding more nodes or increasing node group capacity.
summary: Pods Pending Scheduling - Cluster Node Group Scaling Required
expr: |
sum (
cluster_autoscaler_unschedulable_pods_count{job=~"cluster-autoscaler"}
) by (namespace, job)
> 0
for: 15m
labels:
severity: warning
Dashboards
Following dashboards are generated from mixins and hosted on github: