Monday, August 19, 2024

Dropped packet metrics with Prometheus and Grafana

Dropped packets due to black hole routes, buffer exhaustion, expired TTLs, MTU mismatches, etc. can cause insidious connection failures that are time-consuming and difficult to diagnose. The articles "Dropped packet notifications with Arista Networks", "VyOS dropped packet notifications" and "Using sFlow to monitor dropped packets" describe implementations of the sFlow Dropped Packet Notification Structures extension for Arista Networks switches, VyOS routers, and Linux servers respectively. Together they provide end-to-end visibility into packet drop events, including the switch port, drop reason, and packet header for each dropped packet.

The article "Flow metrics with Prometheus and Grafana" describes how to define flow metrics and create dashboards that trend them over time. This article shows how the same setup can be used to define and trend metrics based on dropped packet notifications.

scrape_configs:
  - job_name: sflow-rt-drops
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['sflow-rt:8008']
    params:
      metric: ['dropped_packets']
      key:
        - 'node:inputifindex'
        - 'ifname:inputifindex'
        - 'reason'
        - 'stack'
        - 'macsource'
        - 'macdestination'
        - 'null:vlan:untagged'
        - 'null:[or:ipsource:ip6source]:none'
        - 'null:[or:ipdestination:ip6destination]:none'
        - 'null:[or:icmptype:icmp6type:ipprotocol:ip6nexthdr]:none'
      label:
        - 'switch'
        - 'port'
        - 'reason'
        - 'stack'
        - 'macsource'
        - 'macdestination'
        - 'vlan'
        - 'src'
        - 'dst'
        - 'protocol'
      value: ['frames']
      dropped: ['true']
      maxFlows: ['20']
      minValue: ['0.001']

The Prometheus scrape configuration above keeps track of drop notifications. The dropped: ['true'] setting selects drop notifications as the source for the metric. The default, dropped: ['false'], builds flow metrics from packet samples and is used to trend normal traffic.
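To make the scrape job easier to inspect, here is a small Python sketch that rebuilds the HTTP request Prometheus sends to sFlow-RT. It also pairs each key expression with the label it fills. The helper `scrape_url` is hypothetical and is not part of sFlow-RT or Prometheus. It assumes the `sflow-rt:8008` target and the parameter values from the configuration above. The only thing it varies is the `dropped` setting.

```python
from urllib.parse import urlencode

# Hypothetical helper: reconstructs the request made by the sflow-rt-drops job.
# Target, path and parameter values are copied from the scrape configuration.
BASE = "http://sflow-rt:8008/app/prometheus/scripts/export.js/flows/ALL/txt"

KEYS = [
    "node:inputifindex",
    "ifname:inputifindex",
    "reason",
    "stack",
    "macsource",
    "macdestination",
    "null:vlan:untagged",
    "null:[or:ipsource:ip6source]:none",
    "null:[or:ipdestination:ip6destination]:none",
    "null:[or:icmptype:icmp6type:ipprotocol:ip6nexthdr]:none",
]
LABELS = ["switch", "port", "reason", "stack", "macsource",
          "macdestination", "vlan", "src", "dst", "protocol"]


def scrape_url(dropped: bool = True) -> str:
    """Return the export URL; dropped=True selects drop notifications."""
    params = {
        "metric": ["dropped_packets"],
        "key": KEYS,
        "label": LABELS,
        "value": ["frames"],
        "dropped": ["true" if dropped else "false"],
        "maxFlows": ["20"],
        "minValue": ["0.001"],
    }
    # doseq=True emits one key=... / label=... pair per list element,
    # matching how Prometheus encodes list-valued params.
    return BASE + "?" + urlencode(params, doseq=True)


# Keys and labels are matched by position: the Nth key fills the Nth label.
for key, label in zip(KEYS, LABELS):
    print(f"{label:15} <- {key}")

print(scrape_url(dropped=True))
```

Running it prints the ten label/key pairs followed by the encoded URL. Against a running sFlow-RT instance, that URL should return the same Prometheus text that the scrape job collects. Switching to `dropped=False` produces the equivalent request for sampled traffic.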

The article "Deploy real-time network dashboards using Docker compose" describes the simplest way to deploy an sFlow-RT, Prometheus, and Grafana stack with some basic dashboards. Install the sFlow-RT Dropped Packets dashboard (Grafana dashboard ID 21721) to see the dashboard shown at the top of this page. It displays Drop Locations, Drop Reasons, and Dropped Packet Details.