coredns
Overview
The CoreDNS mixin provides a Grafana dashboard and Prometheus alerts for monitoring CoreDNS. It was introduced in the Kubernetes Node Local DNS Cache blog post to help users monitor CoreDNS in Kubernetes. The mixin can also be used to monitor a standalone CoreDNS instance without an orchestrator.
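Every alert below selects series with job="kube-dns", so a standalone instance has to be scraped under that job name for the rules to match. The following is a minimal sketch of such a Prometheus scrape configuration, not part of the mixin itself. It assumes the CoreDNS prometheus plugin is enabled and serving metrics on its default port 9153; the target hostname is hypothetical.

```yaml
# prometheus.yml fragment (illustrative; adjust to your environment)
scrape_configs:
  # The job name must match the job="kube-dns" selector used by the alerts.
  - job_name: kube-dns
    static_configs:
      - targets:
          # Hypothetical host; 9153 is the default port of the CoreDNS prometheus plugin.
          - coredns.example.internal:9153
```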
Alerts
coredns
CoreDNSDown
https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsdown
alert: CoreDNSDown
annotations:
  message: CoreDNS has disappeared from Prometheus target discovery.
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsdown
expr: |
  absent(up{job="kube-dns"} == 1)
for: 15m
labels:
  severity: critical
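Each rule in this section is a single entry of a Prometheus rule group. As a rough sketch of how the rule above would sit in a rule file loaded by Prometheus, assuming the group is named coredns as in the heading above (the file name in the comment is hypothetical):

```yaml
# coredns_alerts.yaml (hypothetical file name), referenced from rule_files in prometheus.yml
groups:
  - name: coredns
    rules:
      - alert: CoreDNSDown
        annotations:
          message: CoreDNS has disappeared from Prometheus target discovery.
          runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsdown
        # Fires when no target with job="kube-dns" reports up == 1.
        expr: |
          absent(up{job="kube-dns"} == 1)
        for: 15m
        labels:
          severity: critical
```

A file in this shape can be validated with promtool check rules before it is loaded.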
CoreDNSLatencyHigh
https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednslatencyhigh
alert: CoreDNSLatencyHigh
annotations:
  message: CoreDNS has 99th percentile latency of {{ $value }} seconds for server
    {{ $labels.server }} zone {{ $labels.zone }} .
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednslatencyhigh
expr: |
  histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket{job="kube-dns"}[5m])) by(server, zone, le)) > 4
for: 10m
labels:
  severity: critical
CoreDNSErrorsHigh
https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednserrorshigh
alert: CoreDNSErrorsHigh
annotations:
  message: CoreDNS is returning SERVFAIL for {{ $value | humanizePercentage }} of
    requests.
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednserrorshigh
expr: |
  sum(rate(coredns_dns_responses_total{job="kube-dns",rcode="SERVFAIL"}[5m]))
    /
  sum(rate(coredns_dns_responses_total{job="kube-dns"}[5m])) > 0.03
for: 10m
labels:
  severity: critical
CoreDNSErrorsHigh
https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednserrorshigh
alert: CoreDNSErrorsHigh
annotations:
  message: CoreDNS is returning SERVFAIL for {{ $value | humanizePercentage }} of
    requests.
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednserrorshigh
expr: |
  sum(rate(coredns_dns_responses_total{job="kube-dns",rcode="SERVFAIL"}[5m]))
    /
  sum(rate(coredns_dns_responses_total{job="kube-dns"}[5m])) > 0.01
for: 10m
labels:
  severity: warning
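The two CoreDNSErrorsHigh rules share one expression and differ only in threshold and severity: above 0.01 for warning, above 0.03 for critical. As a worked illustration with made-up rates (2 SERVFAIL responses per second out of 100 responses per second in total, both figures assumed for the example):

```latex
\frac{\text{SERVFAIL rate}}{\text{total response rate}}
  = \frac{2\ \text{responses/s}}{100\ \text{responses/s}}
  = 0.02,
\qquad 0.01 < 0.02 < 0.03
```

With these numbers the warning rule would fire once the condition has held for 10 minutes, while the critical rule would stay inactive.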
coredns_forward
CoreDNSForwardLatencyHigh
alert: CoreDNSForwardLatencyHigh
annotations:
  message: CoreDNS has 99th percentile latency of {{ $value }} seconds forwarding
    requests to {{ $labels.to }}.
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsforwardlatencyhigh
expr: |
  histogram_quantile(0.99, sum(rate(coredns_forward_request_duration_seconds_bucket{job="kube-dns"}[5m])) by(to, le)) > 4
for: 10m
labels:
  severity: critical
CoreDNSForwardErrorsHigh
https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsforwarderrorshigh
alert: CoreDNSForwardErrorsHigh
annotations:
  message: CoreDNS is returning SERVFAIL for {{ $value | humanizePercentage }} of
    forward requests to {{ $labels.to }}.
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsforwarderrorshigh
expr: |
  sum(rate(coredns_forward_responses_total{job="kube-dns",rcode="SERVFAIL"}[5m]))
    /
  sum(rate(coredns_forward_responses_total{job="kube-dns"}[5m])) > 0.03
for: 10m
labels:
  severity: critical
CoreDNSForwardErrorsHigh
https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsforwarderrorshigh
alert: CoreDNSForwardErrorsHigh
annotations:
  message: CoreDNS is returning SERVFAIL for {{ $value | humanizePercentage }} of
    forward requests to {{ $labels.to }}.
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsforwarderrorshigh
expr: |
  sum(rate(coredns_forward_responses_total{job="kube-dns",rcode="SERVFAIL"}[5m]))
    /
  sum(rate(coredns_forward_responses_total{job="kube-dns"}[5m])) > 0.01
for: 10m
labels:
  severity: warning
CoreDNSForwardHealthcheckFailureCount
alert: CoreDNSForwardHealthcheckFailureCount
annotations:
  message: CoreDNS health checks have failed to upstream server {{ $labels.to }}.
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsforwardhealthcheckfailurecount
expr: |
  sum(rate(coredns_forward_healthcheck_failures_total{job="kube-dns"}[5m])) by (to) > 0
for: 10m
labels:
  severity: warning
CoreDNSForwardHealthcheckBrokenCount
alert: CoreDNSForwardHealthcheckBrokenCount
annotations:
  message: CoreDNS health checks have failed for all upstream servers.
  runbook_url: https://github.com/povilasv/coredns-mixin/tree/master/runbook.md#alert-name-corednsforwardhealthcheckbrokencount
expr: |
  sum(rate(coredns_forward_healthcheck_broken_total{job="kube-dns"}[5m])) > 0
for: 10m
labels:
  severity: warning
Dashboards
The following dashboards are generated from the mixin and hosted on GitHub: