apache-airflow
Overview
The Jsonnet source code is available at github.com/grafana/jsonnet-libs.
Alerts
A complete list of pregenerated alerts is available here.
apache-airflow
ApacheAirflowStarvingPoolTasks
```yaml
alert: ApacheAirflowStarvingPoolTasks
annotations:
  description: |
    The number of starved tasks is {{ printf "%.0f" $value }} over the last 5m on {{ $labels.instance }} - {{ $labels.pool_name }} which is above the threshold of 0.
  summary: There are starved tasks detected in the Apache Airflow pool.
expr: |
  airflow_pool_starving_tasks > 0
for: 5m
labels:
  severity: critical
```
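The `for: 5m` clause means the expression must keep matching on every evaluation for five minutes before the alert moves from pending to firing; a single evaluation where it does not match resets the timer. The Python sketch below models that behavior for one series in a simplified way. The helper `alert_states`, the 60-second evaluation interval, and the sample values are all hypothetical illustrations, not part of Prometheus or this mixin.

```python
from typing import List, Optional


def alert_states(samples: List[float], interval_s: int, for_s: int,
                 threshold: float = 0.0) -> List[str]:
    """Simplified model of a Prometheus rule's `for` clause for a single series.

    Returns the state ("inactive", "pending" or "firing") after each evaluation,
    assuming evaluations happen every `interval_s` seconds (an assumption).
    """
    states: List[str] = []
    active_since: Optional[int] = None  # time the condition first became true
    for i, value in enumerate(samples):
        now = i * interval_s
        if value > threshold:  # mirrors `airflow_pool_starving_tasks > 0`
            if active_since is None:
                active_since = now
            # Fires once the condition has held for at least `for_s` seconds.
            states.append("firing" if now - active_since >= for_s else "pending")
        else:
            active_since = None  # any non-matching evaluation resets the timer
            states.append("inactive")
    return states


# Hypothetical starved-task counts sampled once per minute.
print(alert_states([0, 2, 3, 1, 4, 2, 5, 0], interval_s=60, for_s=300))
```

With these samples the alert stays pending for five evaluations, fires on the sixth consecutive matching one, and drops back to inactive as soon as the count returns to 0.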
ApacheAirflowDAGScheduleDelayWarningLevel
```yaml
alert: ApacheAirflowDAGScheduleDelayWarningLevel
annotations:
  description: |
    The average delay in DAG schedule to run time is {{ printf "%.0f" $value }} over the last 1m on {{ $labels.instance }} - {{ $labels.dag_id }} which is above the threshold of 10.
  summary: The delay in DAG schedule time to DAG run time has reached the warning threshold.
expr: |
  increase(airflow_dagrun_schedule_delay_sum[5m]) / clamp_min(increase(airflow_dagrun_schedule_delay_count[5m]),1) > 10
for: 1m
labels:
  severity: warning
```
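The expression divides the growth of the cumulative `_sum` series by the growth of the matching `_count` series, which gives the average schedule delay per DAG run observed in the window. `clamp_min(..., 1)` keeps the denominator at least 1, so a window with no new runs yields 0 instead of a division by zero. The Python sketch below reproduces that arithmetic under simplifying assumptions: the hypothetical `average_delay` helper takes plain differences between two samples, whereas Prometheus's `increase()` also handles counter resets and extrapolates to the window edges. The sample values are illustrative only.

```python
def average_delay(sum_start: float, sum_end: float,
                  count_start: float, count_end: float) -> float:
    """Approximate increase(sum[5m]) / clamp_min(increase(count[5m]), 1).

    Simplified: uses plain differences (no counter-reset handling, no extrapolation).
    """
    delta_sum = sum_end - sum_start        # total delay added during the window
    delta_count = count_end - count_start  # number of DAG runs observed in the window
    return delta_sum / max(delta_count, 1)  # clamp_min(..., 1) guards against /0


# Hypothetical window: 6 new runs whose delays add up to 90.
print(average_delay(1200, 1290, 40, 46))  # 15.0, above the warning threshold of 10
# Window with no new runs: the clamp turns 0/0 into 0/1.
print(average_delay(500, 500, 10, 10))    # 0.0
```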
ApacheAirflowDAGScheduleDelayCriticalLevel
```yaml
alert: ApacheAirflowDAGScheduleDelayCriticalLevel
annotations:
  description: |
    The average delay in DAG schedule to run time is {{ printf "%.0f" $value }} over the last 1m for {{ $labels.instance }} - {{ $labels.dag_id }} which is above the threshold of 60.
  summary: The delay in DAG schedule time to DAG run time has reached the critical threshold.
expr: |
  increase(airflow_dagrun_schedule_delay_sum[5m]) / clamp_min(increase(airflow_dagrun_schedule_delay_count[5m]),1) > 60
for: 1m
labels:
  severity: critical
```
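The warning and critical rules share the same expression and differ only in their thresholds (10 and 60), and each rule is evaluated independently. As a result, an average delay above 60 matches both rules at once rather than replacing the warning with the critical alert. The Python sketch below, using a hypothetical `matching_alerts` helper, shows which rules match for a given average delay; it ignores the `for: 1m` hold period for brevity.

```python
from typing import List

WARNING_THRESHOLD = 10   # ApacheAirflowDAGScheduleDelayWarningLevel
CRITICAL_THRESHOLD = 60  # ApacheAirflowDAGScheduleDelayCriticalLevel


def matching_alerts(avg_delay: float) -> List[str]:
    """Return the schedule-delay rules whose condition matches (ignores `for: 1m`)."""
    alerts: List[str] = []
    if avg_delay > WARNING_THRESHOLD:  # strict ">" as in the rule expressions
        alerts.append("ApacheAirflowDAGScheduleDelayWarningLevel")
    if avg_delay > CRITICAL_THRESHOLD:
        alerts.append("ApacheAirflowDAGScheduleDelayCriticalLevel")
    return alerts


for delay in (5, 15, 75):
    print(delay, matching_alerts(delay))
```

If only the critical notification should be delivered when both match, Alertmanager inhibition rules are one common way to suppress the warning.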
ApacheAirflowDAGFailures
```yaml
alert: ApacheAirflowDAGFailures
annotations:
  description: |
    The number of DAG failures seen is {{ printf "%.0f" $value }} over the last 1m for {{ $labels.instance }} - {{ $labels.dag_id }} which is above the threshold of 0.
  summary: There have been DAG failures detected.
expr: |
  increase(airflow_dagrun_duration_failed_count[5m]) > 0
for: 1m
labels:
  severity: critical
```
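`airflow_dagrun_duration_failed_count` is a cumulative counter, so the rule relies on `increase()` to turn it into "failures in the last 5 minutes". One detail worth knowing is that `increase()` treats any drop in a counter as a reset (for example after an exporter restart) instead of a negative change. The Python sketch below, built around a hypothetical `simple_increase` helper, illustrates that reset handling on raw samples. It is a simplification: Prometheus additionally extrapolates to the window boundaries, so real results can be fractional.

```python
from typing import List


def simple_increase(samples: List[float]) -> float:
    """Total growth of a counter across its samples, treating drops as resets.

    Simplified: no extrapolation to the window edges, unlike Prometheus's increase().
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter reset: it restarted from 0, so `cur` is all it counted since.
            total += cur
    return total


print(simple_increase([3, 3, 4, 4]))  # 1.0 -> one new failure, the rule matches
print(simple_increase([5, 5, 1, 2]))  # 2.0 -> reset detected, growth still counted
print(simple_increase([7, 7, 7]))     # 0.0 -> no new failures
```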
Dashboards
The following dashboards are generated from mixins and hosted on GitHub: