influxdb

Overview

The Jsonnet source code is available at github.com/grafana/jsonnet-libs.

Alerts

A complete list of pregenerated alerts is available here.

influxdb

InfluxDBWarningTaskSchedulerHighFailureRate

alert: InfluxDBWarningTaskSchedulerHighFailureRate
annotations:
  description: Task scheduler task executions for instance {{$labels.instance}} on
    cluster {{$labels.influxdb_cluster}} are failing at a rate of {{ printf "%.0f"
    $value }} percent, which is above the threshold of 25 percent.
  summary: Automated data processing tasks are failing at a high rate.
expr: |
  100 * rate(task_scheduler_total_execute_failure[5m])/clamp_min(rate(task_scheduler_total_execution_calls[5m]), 1) >= 25
for: 5m
labels:
  severity: warning

InfluxDBCriticalTaskSchedulerHighFailureRate

alert: InfluxDBCriticalTaskSchedulerHighFailureRate
annotations:
  description: Task scheduler task executions for instance {{$labels.instance}} on
    cluster {{$labels.influxdb_cluster}} are failing at a rate of {{ printf "%.0f"
    $value }} percent, which is above the threshold of 50 percent.
  summary: Automated data processing tasks are failing at a critical rate.
expr: |
  100 * rate(task_scheduler_total_execute_failure[5m])/clamp_min(rate(task_scheduler_total_execution_calls[5m]), 1) >= 50
for: 5m
labels:
  severity: critical
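
Both failure-rate alerts share one expression pattern: the per-second failure rate divided by the per-second call rate, scaled to a percentage. The clamp_min call pins the denominator at a floor of 1, which prevents division by zero when no executions occur in the window (and damps the ratio when the call rate is below one per second). If the percentage is also wanted for dashboards, it can be precomputed with a recording rule; a minimal sketch, where the group and rule names are assumptions rather than part of the mixin:

# Sketch only; the group and rule names are hypothetical, not part of the mixin.
groups:
  - name: influxdb-task-scheduler
    rules:
      - record: influxdb:task_scheduler_failure_rate:percent
        expr: |
          100 * rate(task_scheduler_total_execute_failure[5m])
            / clamp_min(rate(task_scheduler_total_execution_calls[5m]), 1)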

InfluxDBHighBusyWorkerPercentage

alert: InfluxDBHighBusyWorkerPercentage
annotations:
  description: The busy worker percentage for instance {{$labels.instance}} on cluster
    {{$labels.influxdb_cluster}} is {{ printf "%.0f" $value }} percent, which is above
    the threshold of 80 percent.
  summary: There is a high percentage of busy workers.
expr: |
  task_executor_workers_busy >= 80
for: 5m
labels:
  severity: critical

InfluxDBHighHeapMemoryUsage

alert: InfluxDBHighHeapMemoryUsage
annotations:
  description: The heap memory usage for instance {{$labels.instance}} on cluster
    {{$labels.influxdb_cluster}} is {{ printf "%.0f" $value }} percent, which is above
    the threshold of 80 percent.
  summary: There is a high amount of heap memory being used.
expr: |
  100 * go_memstats_heap_alloc_bytes/clamp_min((go_memstats_heap_idle_bytes + go_memstats_heap_alloc_bytes), 1) >= 80
for: 5m
labels:
  severity: critical
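
To make the arithmetic concrete: with 4 GiB of heap allocated and 1 GiB idle, the expression evaluates to 100 * 4 / (4 + 1) = 80 percent, exactly at the threshold. As in the alerts above, clamp_min guards the denominator against division by zero.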

InfluxDBHighAverageAPIRequestLatency

alert: InfluxDBHighAverageAPIRequestLatency
annotations:
  description: The average API request latency for instance {{$labels.instance}} on
    cluster {{$labels.influxdb_cluster}} is {{ printf "%.2f" $value }} seconds, which
    is above the threshold of 0.3 seconds.
  summary: Average API request latency is too high. High latency will negatively affect
    system performance, degrading data availability and precision.
expr: |
  sum without(handler, method, path, response_code, status, user_agent) (increase(http_api_request_duration_seconds_sum[5m])/clamp_min(increase(http_api_requests_total[5m]), 1)) >= 0.3
for: 1m
labels:
  severity: critical
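
The expression follows the usual Prometheus _sum/_count pattern: the increase in total request seconds divided by the increase in request count over the window yields the average latency, and sum without(handler, method, path, response_code, status, user_agent) aggregates the per-endpoint series so the alert fires once per instance. As a worked example, 42 seconds of total request time across 120 requests in a 5-minute window averages 42 / 120 = 0.35 seconds, above the 0.3-second threshold.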

InfluxDBSlowAverageIQLExecutionTime

alert: InfluxDBSlowAverageIQLExecutionTime
annotations:
  description: The average InfluxQL query execution time for instance {{$labels.instance}}
    on cluster {{$labels.influxdb_cluster}} is {{ printf "%.2f" $value }} seconds,
    which is above the threshold of 0.1 seconds.
  summary: InfluxQL execution times are too slow. Slow query execution times will
    negatively affect system performance, degrading data availability and precision.
expr: |
  sum without(result) (increase(influxql_service_executing_duration_seconds_sum[5m])/clamp_min(increase(influxql_service_requests_total[5m]), 1)) >= 0.1
for: 5m
labels:
  severity: warning
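
Once the alerts above are rendered to a rules file (for example with a mixin tool such as mixtool), they can be loaded through the standard rule_files entry in the Prometheus configuration. A minimal sketch, where the file name is an assumption:

# Sketch only; the rules file name is hypothetical.
rule_files:
  - influxdb_alerts.yaml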

Dashboards

The following dashboards are generated from the mixin and hosted on GitHub: