Monday, August 19, 2024

Dropped packet metrics with Prometheus and Grafana

Dropped packets due to black hole routes, buffer exhaustion, expired TTLs, MTU mismatches, etc. can result in insidious connection failures that are time-consuming and difficult to diagnose. Dropped packet notifications with Arista Networks, VyOS dropped packet notifications and Using sFlow to monitor dropped packets describe implementations of the sFlow Dropped Packet Notification Structures extension for Arista Networks switches, VyOS routers, and Linux servers respectively, providing end-to-end visibility into packet drop events (including switch port, drop reason and packet header for each dropped packet).

Flow metrics with Prometheus and Grafana describes how to define flow metrics and create dashboards to trend the flow metrics over time. This article describes how the same setup can be used to define and trend metrics based on dropped packet notifications.

  - job_name: sflow-rt-drops
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['sflow-rt:8008']
    params:
      metric: ['dropped_packets']
      key:
        - 'node:inputifindex'
        - 'ifname:inputifindex'
        - 'reason'
        - 'stack'
        - 'macsource'
        - 'macdestination'
        - 'null:vlan:untagged'
        - 'null:[or:ipsource:ip6source]:none'
        - 'null:[or:ipdestination:ip6destination]:none'
        - 'null:[or:icmptype:icmp6type:ipprotocol:ip6nexthdr]:none'
      label:
        - 'switch'
        - 'port'
        - 'reason'
        - 'stack'
        - 'macsource'
        - 'macdestination'
        - 'vlan'
        - 'src'
        - 'dst'
        - 'protocol'
      value: ['frames']
      dropped: ['true']
      maxFlows: ['20']
      minValue: ['0.001']

The Prometheus scrape configuration above is used to keep track of drop notifications. The highlighted dropped setting is used to select drop notifications for the metric (the default dropped:['false'] creates flow metrics based on packet samples and is used to trend normal traffic).
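When a scrape job returns nothing, it can help to request the same URL by hand. Prometheus sends each entry under params as a URL query parameter, repeating the parameter name once per list item. The following Python sketch rebuilds that URL from the settings above. The scrape_url helper is hypothetical, and localhost:8008 is an assumption that the sFlow-RT port is published on the local machine (the sflow-rt hostname in the configuration only resolves inside the Docker network).

```python
from urllib.parse import urlencode

def scrape_url(host, metrics_path, params):
    """Build the URL that a Prometheus scrape job with these params requests."""
    # doseq=True repeats a parameter once per list entry (key=a&key=b&...)
    return f"http://{host}{metrics_path}?{urlencode(params, doseq=True)}"

# Values copied from the sflow-rt-drops job above
params = {
    "metric": ["dropped_packets"],
    "key": [
        "node:inputifindex",
        "ifname:inputifindex",
        "reason",
        "stack",
        "macsource",
        "macdestination",
        "null:vlan:untagged",
        "null:[or:ipsource:ip6source]:none",
        "null:[or:ipdestination:ip6destination]:none",
        "null:[or:icmptype:icmp6type:ipprotocol:ip6nexthdr]:none",
    ],
    "label": [
        "switch", "port", "reason", "stack", "macsource",
        "macdestination", "vlan", "src", "dst", "protocol",
    ],
    "value": ["frames"],
    "dropped": ["true"],
    "maxFlows": ["20"],
    "minValue": ["0.001"],
}

# Assumption: sFlow-RT is reachable from this machine on localhost:8008
url = scrape_url(
    "localhost:8008",
    "/app/prometheus/scripts/export.js/flows/ALL/txt",
    params,
)
print(url)
```

Opening the printed URL in a browser or with curl shows what the exporter returns to Prometheus for this job, which separates problems in sFlow-RT from problems in the Prometheus configuration.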

Deploy real-time network dashboards using Docker compose describes the simplest way to deploy an sFlow-RT, Prometheus, and Grafana stack with some basic dashboards. Install the sFlow-RT Dropped Packets dashboard, code 21721, in Grafana to see the dashboard shown at the top of this page, displaying Drop Locations, Drop Reasons and Dropped Packet Details.

5 comments:

  1. Hello, I enjoyed reading your blog. Thanks to the detailed explanation, I was able to verify most of the features mentioned in the blog. However, only the dropped packet metric is not being displayed in Grafana. When I tried this on my laptop with Arista EOS, the dropped packets are visible in sFlow-RT, but it seems that the data is not being exported properly to Prometheus. Is there anything specific I should check?

    1. The first thing I would check is that dropped packets show up in the Discard Browser application (included with the Prometheus docker image). Start with 'stack' as the key (to display the protocol stack) and fps as the value.

      Next, check that the Prometheus scrape task is succeeding: go to http://localhost:9090 and check the Status>Targets page. Then check that the dropped_packets metric shows up in the Prometheus Graph tab.

      Finally, make sure that minValue in your scrape task is set to 0.001 or smaller (as in the example above). A single drop event results in a small value in terms of packets per second, so the default (intended for traffic flows) would likely clip the discards.

    2. Thank you for taking the time to respond to my question. I have checked the points you mentioned, but I am still unsure whether the issue lies with the behavior of the export.js script in sFlow-RT or with the Prometheus configuration (possibly the metric path). I have also initiated a conversation in the Google Group for more details.

      I have shared a few screenshots to help illustrate the situation, and I would greatly appreciate any further assistance you can provide.

    3. Thank you for the detailed information on the sFlow-RT mailing list! I think I see what is going on: the 'node:inputifindex' and 'ifname:inputifindex' keys require either that SNMP is enabled for the sFlow-RT container (-Dsnmp.ifname=yes) or that a topology is installed (https://sflow-rt.com/topology.php). The easiest fix for now would be to change 'node:inputifindex' to 'agent' and 'ifname:inputifindex' to 'inputifindex' in your Prometheus scrape configuration (and restart the Prometheus container).

    4. Thanks for your answer! With your help, I can finally see the dropped packet information in Grafana!
