tensorflow

Overview

The Jsonnet source code is available at github.com/grafana/jsonnet-libs.

Alerts

The complete list of pregenerated alerts is available here.

TensorFlowServingAlerts

TensorFlowModelRequestHighErrorRate


alert: TensorFlowModelRequestHighErrorRate
annotations:
  description: '{{ printf "%.2f" $value }}% of all model requests are not successful,
    which is above the threshold 30%, indicating a potentially larger issue for {{$labels.instance}}'
  summary: More than 30% of all model requests are not successful.
expr: |
  100 * sum(rate(:tensorflow:serving:request_count{status!="OK"}[5m])) by (instance) / sum(rate(:tensorflow:serving:request_count[5m])) by (instance) > 30
for: 5m
labels:
  severity: critical
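
To make the arithmetic in the expression above concrete, here is a minimal Python sketch of the same ratio for a single instance. The function names and the sample status values are hypothetical and chosen only for illustration; the mixin itself evaluates this in PromQL, not Python.

```python
import math


def error_percentage(rates_by_status: dict[str, float]) -> float:
    """Percentage of requests per second whose status is not "OK".

    `rates_by_status` maps a status label to its per-second request rate,
    standing in for rate(:tensorflow:serving:request_count[5m]) on one instance.
    """
    total = sum(rates_by_status.values())
    if total == 0:
        # No traffic: the ratio is undefined, so report NaN.
        return math.nan
    failed = sum(rate for status, rate in rates_by_status.items() if status != "OK")
    return 100 * failed / total


def exceeds_threshold(rates_by_status: dict[str, float], threshold: float = 30.0) -> bool:
    """True when the error percentage is above the alert threshold (30 by default)."""
    # Any comparison against NaN is False, so an idle instance never matches.
    return error_percentage(rates_by_status) > threshold


# Hypothetical sample: 6 req/s succeed, 4 req/s fail with two different statuses.
sample = {"OK": 6.0, "UNAVAILABLE": 3.0, "INVALID_ARGUMENT": 1.0}
print(error_percentage(sample))   # 40.0
print(exceeds_threshold(sample))  # True
```

Note that the rule also carries `for: 5m`, so the condition has to hold continuously for five minutes before the alert fires; the sketch only covers a single evaluation.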

TensorFlowServingHighBatchQueuingLatency


alert: TensorFlowServingHighBatchQueuingLatency
annotations:
  description: Batch queuing latency greater than {{ printf "%.2f" $value }}µs, which
    is above the threshold 5000000µs, indicating a potentially larger issue for {{$labels.instance}}
  summary: Batch queuing latency more than 5000000µs.
expr: |
  increase(:tensorflow:serving:batching_session:queuing_latency_sum[2m]) / increase(:tensorflow:serving:batching_session:queuing_latency_count[2m]) > 5000000
for: 5m
labels:
  severity: warning
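
The expression above divides the growth of the latency sum by the growth of the latency count, which gives the mean queuing latency per batch over the 2-minute window. The following Python sketch works through that division under the assumption that both increases are already known; the function names and sample numbers are hypothetical.

```python
import math

THRESHOLD_US = 5_000_000  # alert threshold from the rule: 5,000,000 µs, i.e. 5 seconds


def mean_queuing_latency_us(sum_increase_us: float, count_increase: float) -> float:
    """Mean queuing latency in microseconds over one window.

    `sum_increase_us` stands in for increase(..._queuing_latency_sum[2m]) and
    `count_increase` for increase(..._queuing_latency_count[2m]).
    """
    if count_increase == 0:
        # No batches were queued in the window, so the mean is undefined.
        return math.nan
    return sum_increase_us / count_increase


def exceeds_threshold(sum_increase_us: float, count_increase: float) -> bool:
    """True when the mean latency is above the 5,000,000 µs threshold."""
    # Comparisons against NaN are False, so an empty window never matches.
    return mean_queuing_latency_us(sum_increase_us, count_increase) > THRESHOLD_US


# Hypothetical sample: 20 batches accumulated 120,000,000 µs of queuing time.
print(mean_queuing_latency_us(120_000_000, 20))  # 6000000.0
print(exceeds_threshold(120_000_000, 20))        # True
```

As with the error-rate alert, `for: 5m` means the mean has to stay above the threshold for five minutes before the alert fires.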

Dashboards

The following dashboards are generated from mixins and hosted on GitHub:

tensorflow-overview