velero


Overview

The Jsonnet source code is available at github.com/grafana/jsonnet-libs.

Alerts

A complete list of pregenerated alerts is available here.

velero

VeleroBackupFailure

alert: VeleroBackupFailure
annotations:
  description: |
    Backup failures detected on {{ $labels.instance }}. This could lead to data loss or an inability to recover in the event of a disaster.
  summary: Velero backup failures detected.
expr: |
  increase(velero_backup_failure_total{job=~"integrations/velero"}[5m]) > 0
for: 5m
labels:
  severity: critical
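The expression relies on PromQL's `increase()` over a 5-minute window. The Python sketch below approximates that logic to show why a Velero pod restart does not hide a failure: like Prometheus, it treats any drop in the counter as a reset to zero, but it leaves out the window-edge extrapolation that the real function performs. The sample values and the helper names are hypothetical.

```python
from typing import Sequence


def simple_increase(samples: Sequence[float]) -> float:
    """Approximate PromQL increase() for the samples inside one range window.

    Counter resets are handled as in Prometheus (a drop in value means the
    counter restarted from zero), but the extrapolation to the window edges
    that the real function performs is deliberately left out.
    """
    if len(samples) < 2:
        # Prometheus returns no result with fewer than two samples;
        # this sketch returns 0.0 instead.
        return 0.0
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Reset, e.g. the Velero pod restarted: count up from zero.
            total += cur
    return total


def backup_failure_condition(samples: Sequence[float]) -> bool:
    """Sketch of `increase(velero_backup_failure_total[5m]) > 0`."""
    return simple_increase(samples) > 0


# Hypothetical velero_backup_failure_total samples from one 5m window.
windows = {
    "steady": [2, 2, 2, 2, 2],          # no new failures
    "new_failure": [2, 2, 3, 3, 3],     # one failure inside the window
    "after_restart": [7, 7, 0, 1, 1],   # pod restart, then one failure
}
for name, samples in windows.items():
    print(name, simple_increase(samples), backup_failure_condition(samples))
```

The `after_restart` window shows why `increase()` is preferred to a raw difference here: the last value minus the first is negative, yet the counter did record a new failure.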

VeleroHighBackupDuration

alert: VeleroHighBackupDuration
annotations:
  description: |
    Backup duration on {{ $labels.instance }} is higher than the average duration over the past 48 hours. This could indicate performance issues or network congestion. The current value is {{ $value | printf "%.2f" }} seconds.
  summary: Velero backups taking longer than usual.
expr: |
  histogram_quantile(0.5, sum(rate(velero_backup_duration_seconds_bucket{job=~"integrations/velero"}[5m])) by (le, schedule)) > 1.2 * avg_over_time(histogram_quantile(0.5, sum(rate(velero_backup_duration_seconds_bucket{job=~"integrations/velero"}[48h])) by (le, schedule))[5m:])
for: 5m
labels:
  severity: warning
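The alert compares a median estimated by `histogram_quantile()` against a multiple of a 48-hour baseline. The following Python sketch reproduces the linear interpolation that `histogram_quantile()` performs over cumulative buckets and then applies a threshold comparison; the multiplier of 1.2, the bucket boundaries, and the counts are assumptions chosen for illustration. In the real expression the bucket values are per-second rates rather than raw counts, which does not change the interpolation.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate quantile q from cumulative buckets, PromQL-style.

    The highest bucket must have le=+Inf; the result is interpolated
    linearly inside the bucket that contains rank q * total.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    buckets = sorted(buckets, key=lambda b: b[0])
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches rank.
    idx = next(
        (i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank),
        len(buckets) - 1,
    )
    if idx == len(buckets) - 1:
        # Rank lies in the +Inf bucket: fall back to the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)


def is_backup_slow(current_p50: float, baseline_p50: float, factor: float = 1.2) -> bool:
    """Sketch of the comparison: current median above factor x baseline."""
    return current_p50 > factor * baseline_p50


# Hypothetical bucket values for velero_backup_duration_seconds.
current = [(60, 1), (120, 3), (300, 8), (math.inf, 10)]
baseline = [(60, 4), (120, 9), (300, 10), (math.inf, 10)]
p50_now = histogram_quantile(0.5, current)
p50_base = histogram_quantile(0.5, baseline)
print(p50_now, p50_base, is_backup_slow(p50_now, p50_base))
```

Because the expression aggregates `by (le, schedule)`, the comparison is made per schedule, so one slow schedule can trigger the alert even when the others are healthy.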

VeleroHighRestoreFailureRate

alert: VeleroHighRestoreFailureRate
annotations:
  description: |
    Restore failures detected on {{ $labels.instance }}. This could delay data recovery and put business continuity at risk.
  summary: Velero restore failures detected.
expr: |
  increase(velero_restore_failed_total{job=~"integrations/velero"}[5m]) > 0
for: 5m
labels:
  severity: critical

VeleroUpStatus

alert: VeleroUpStatus
annotations:
  description: |
    Cannot find any metrics related to Velero on {{ $labels.instance }}. This may indicate further issues with Velero or the scraping agent.
  summary: Velero is down.
expr: |
  up{job=~"integrations/velero"} == 0
for: 5m
labels:
  severity: critical
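Every rule in this group uses `for: 5m`, so a single bad evaluation does not page anyone. The Python sketch below approximates how Prometheus moves an alert through the inactive, pending, and firing states for one series. The 1-minute evaluation interval, the sample results, and the `alert_states` helper are assumptions, and details such as `keep_firing_for` are left out.

```python
from typing import List, Optional, Sequence


def alert_states(
    results: Sequence[bool], interval_s: int = 60, for_s: int = 300
) -> List[str]:
    """Simplified model of Prometheus alert states for a single series.

    results[i] says whether the alert expression returned the series at
    evaluation i. Evaluations are interval_s seconds apart (1m assumed).
    """
    states: List[str] = []
    active_since: Optional[int] = None
    for i, is_true in enumerate(results):
        now = i * interval_s
        if not is_true:
            # Any false evaluation clears the pending timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # Fire only once the condition has held for at least for_s seconds.
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Hypothetical evaluation results: a brief blip, then a sustained outage.
flapping = [True, True, True, False] + [True] * 6
print(alert_states(flapping))
```

With a 1-minute interval the alert reaches the firing state on the sixth consecutive true evaluation, five minutes after the condition first appeared; the three-minute blip at the start never pages.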

Dashboards

The following dashboards are generated from mixins and hosted on GitHub: