gitlab


Overview

Jsonnet source code is available at github.com/grafana/jsonnet-libs

Alerts

Complete list of pregenerated alerts is available here.

GitLabAlerts

GitLabHighJobRegistrationFailures

alert: GitLabHighJobRegistrationFailures
annotations:
  description: '{{ printf "%.2f" $value }}% of job registrations have failed on {{$labels.instance}},
    which is above threshold of 10%.'
  summary: Large percentage of failed attempts to register a job.
expr: "100 * rate(job_register_attempts_failed_total{}[5m]) / rate(job_register_attempts_total{}[5m])
  
> 10
"
for: 5m
labels:
  severity: warning

GitLabHighRunnerAuthFailure

alert: GitLabHighRunnerAuthFailure
annotations:
  description: '{{ printf "%.2f" $value }}% of GitLab runner authentication attempts
    are failing on {{$labels.instance}}, which is above the threshold of 10%.'
  summary: Large percentage of runner authentication failures.
expr: "100 * sum by (instance) (rate(gitlab_ci_runner_authentication_failure_total{}[5m]))
  \ / 
(sum by (instance) (rate(gitlab_ci_runner_authentication_success_total{}[5m]))
  \ + sum by (instance) (rate(gitlab_ci_runner_authentication_failure_total{}[5m])))
>
  10
"
for: 5m
labels:
  severity: warning

GitLabHigh5xxResponses

alert: GitLabHigh5xxResponses
annotations:
  description: '{{ printf "%.2f" $value }}% of all requests returned 5XX HTTP responses,
    which is above the threshold 10%, indicating a system issue on {{$labels.instance}}.'
  summary: Large rate of HTTP 5XX errors.
expr: "100 * sum by (instance) (rate(http_requests_total{status=~\"^5.*\"}[5m])) /
  sum by (instance) (rate(http_requests_total{}[5m])) 
> 10
"
for: 5m
labels:
  severity: critical

GitLabHigh4xxResponses

alert: GitLabHigh4xxResponses
annotations:
  description: '{{ printf "%.2f" $value }}% of all requests returned 4XX HTTP responses,
    which is above the threshold 10%, indicating many failed requests on {{$labels.instance}}.'
  summary: Large rate of HTTP 4XX errors.
expr: |
  100 * sum by (instance) (rate(http_requests_total{status=~"^4.*"}[5m])) / sum by (instance) (rate(http_requests_total{}[5m]))
  > 10
for: 5m
labels:
  severity: warning

Dashboards

Following dashboards are generated from mixins and hosted on github: