apache-hbase

Overview

The Jsonnet source code is available at github.com/grafana/jsonnet-libs.

Alerts

The complete list of pregenerated alerts is available here.

apache-hbase-alerts

HBaseHighHeapMemUsage

alert: HBaseHighHeapMemUsage
annotations:
  description: The heap memory usage for the JVM on instance {{$labels.instance}}
    in cluster {{$labels.hbase_cluster}} is {{printf "%.0f" $value}} percent, which
    is above the threshold of 80 percent
  summary: There is a limited amount of heap memory available to the JVM.
expr: |
  100 * sum without(context, hostname, processname) (jvm_metrics_mem_heap_used_m{job=~"integrations/apache-hbase"} / clamp_min(jvm_metrics_mem_heap_committed_m{job=~"integrations/apache-hbase"}, 1))  > 80
for: 5m
labels:
  severity: warning
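
The arithmetic behind the `expr` above can be sketched in Python. This is a minimal illustration, not the mixin's implementation: the helper names `clamp_min`, `heap_usage_percent`, and `should_fire` are hypothetical, the inputs stand in for single samples of `jvm_metrics_mem_heap_used_m` and `jvm_metrics_mem_heap_committed_m`, and the `sum without(...)` aggregation and the `for: 5m` hold period are not modeled.

```python
def clamp_min(value: float, minimum: float) -> float:
    """Mimic PromQL's clamp_min: never return less than `minimum`."""
    return max(value, minimum)


def heap_usage_percent(used_mb: float, committed_mb: float) -> float:
    """Heap used as a percentage of heap committed.

    The clamp keeps the denominator at 1 or above, so a committed
    value of 0 does not cause a division by zero.
    """
    return 100 * used_mb / clamp_min(committed_mb, 1)


def should_fire(used_mb: float, committed_mb: float, threshold: float = 80.0) -> bool:
    """True when the usage is strictly above the threshold, as in `> 80`."""
    return heap_usage_percent(used_mb, committed_mb) > threshold


if __name__ == "__main__":
    # Illustrative sample values (assumed, not taken from a real cluster).
    print(heap_usage_percent(850, 1000), should_fire(850, 1000))
    print(heap_usage_percent(800, 1000), should_fire(800, 1000))
```

Because the comparison is strict, a usage of exactly 80 percent does not satisfy the condition.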

HBaseDeadRegionServer

alert: HBaseDeadRegionServer
annotations:
  description: '{{$value}} RegionServer(s) in cluster {{$labels.hbase_cluster}} are
    unresponsive, which is above the threshold of 0. The name(s) of the dead RegionServer(s)
    are {{$labels.deadregionservers}}'
  summary: One or more RegionServer(s) has become unresponsive.
expr: |
  server_num_dead_region_servers > 0
for: 5m
labels:
  severity: warning
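
Every rule on this page sets `for: 5m`, so in Prometheus the alert stays pending until the expression has held continuously for five minutes. The Python sketch below models that behavior in simplified form for a series such as `server_num_dead_region_servers`. The function name `alert_states` and the 60-second evaluation interval are assumptions made for illustration; the actual interval depends on the Prometheus configuration.

```python
from typing import List, Sequence


def alert_states(
    values: Sequence[float],
    threshold: float = 0.0,
    for_seconds: int = 300,
    interval_seconds: int = 60,
) -> List[str]:
    """Return the alert state at each evaluation of `value > threshold`.

    One value is consumed per evaluation. The alert is "pending" from the
    first evaluation where the condition holds and becomes "firing" once
    it has held continuously for at least `for_seconds`.
    """
    states: List[str] = []
    active_since = None  # evaluation time at which the condition first held
    for i, value in enumerate(values):
        now = i * interval_seconds
        if value > threshold:
            if active_since is None:
                active_since = now
            held_for = now - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            # Any evaluation where the condition fails resets the hold timer.
            active_since = None
            states.append("inactive")
    return states


if __name__ == "__main__":
    # One dead RegionServer reported for six consecutive evaluations.
    print(alert_states([0, 1, 1, 1, 1, 1, 1, 0]))
```

In this model, a single evaluation where the value drops back to 0 resets the timer, so a short-lived blip never reaches the firing state.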

HBaseOldRegionsInTransition

alert: HBaseOldRegionsInTransition
annotations:
  description: '{{printf "%.0f" $value}} percent of RegionServers in transition in
    cluster {{$labels.hbase_cluster}} are transitioning for longer than expected,
    which is above the threshold of 50 percent'
  summary: RegionServers are in transition for longer than expected.
expr: |
  100 * assignment_manager_rit_count_over_threshold / clamp_min(assignment_manager_rit_count, 1) > 50
for: 5m
labels:
  severity: warning

HBaseHighMasterAuthFailRate

alert: HBaseHighMasterAuthFailRate
annotations:
  description: '{{printf "%.0f" $value}} percent of authentication attempts to the
    master are failing in cluster {{$labels.hbase_cluster}}, which is above the threshold
    of 35 percent'
  summary: A high percentage of authentication attempts to the master are failing.
expr: |
  100 * rate(master_authentication_failures[5m]) / (clamp_min(rate(master_authentication_successes[5m]), 1) + clamp_min(rate(master_authentication_failures[5m]), 1)) > 35
for: 5m
labels:
  severity: warning
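
The authentication failure expression clamps both the success rate and the failure rate to a minimum of 1 in the denominator, which changes the result at low request rates. The Python sketch below reproduces only that arithmetic; `auth_fail_percent` is a hypothetical helper, and its inputs stand in for the per-second values that `rate(...[5m])` would return.

```python
def clamp_min(value: float, minimum: float) -> float:
    """Mimic PromQL's clamp_min: never return less than `minimum`."""
    return max(value, minimum)


def auth_fail_percent(fail_rate: float, success_rate: float) -> float:
    """Evaluate 100 * failures / (clamp(successes) + clamp(failures)).

    Both terms of the denominator are clamped to at least 1, so the
    denominator is never smaller than 2.
    """
    denominator = clamp_min(success_rate, 1) + clamp_min(fail_rate, 1)
    return 100 * fail_rate / denominator


if __name__ == "__main__":
    # Rates at or above 1/s: the clamps have no effect.
    print(auth_fail_percent(4, 4))    # half of all attempts fail
    # Rates below 1/s: the clamps inflate the denominator.
    print(auth_fail_percent(0.5, 0))  # every attempt fails, at a low rate
```

Under these assumptions, a failure rate of 0.5 per second with no successes evaluates to 25 percent, below the 35 percent threshold, even though every attempt is failing. The same arithmetic applies to the RegionServer variant of this rule.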

HBaseHighRSAuthFailRate

alert: HBaseHighRSAuthFailRate
annotations:
  description: '{{printf "%.0f" $value}} percent of authentication attempts to the
    RegionServer {{$labels.instance}} are failing in cluster {{$labels.hbase_cluster}},
    which is above the threshold of 35 percent'
  summary: A high percentage of authentication attempts to a RegionServer are failing.
expr: |
  100 * rate(region_server_authentication_failures[5m]) / (clamp_min(rate(region_server_authentication_successes[5m]), 1) + clamp_min(rate(region_server_authentication_failures[5m]), 1)) > 35
for: 5m
labels:
  severity: warning

Dashboards

The following dashboards are generated from mixins and hosted on GitHub: