Friday, July 31, 2020

Using sFlow to monitor dropped packets

Visibility into dropped packets describes instrumentation, recently added to the Linux kernel, that provides visibility into packets dropped by the kernel data path on a host, or dropped by a switch ASIC when packets are forwarded in hardware. This article describes integration of drop monitoring in the open source Host sFlow agent and inclusion of drop reporting as part of industry standard sFlow telemetry.

Extending sFlow to provide visibility into dropped packets offers significant benefits for network troubleshooting, providing real-time network-wide visibility into the specific packets that were dropped as well the reason the packet was dropped. This visibility instantly reveals the root cause of drops and the impacted connections.

Packet discard monitoring complements sFlow's existing counter polling and packet sampling mechanisms and shares a common data model so that all three sources of data can be correlated.  For example, if packets are being discarded because of buffer exhaustion, the discard records don't necessarily tell the whole story. The discarded packets may represent mice flows that are victims of an elephant flow. Packet samples will reveal the traffic that isn't being dropped and provide a more complete picture. Counter data adds additional information such as CPU load, interface speed, link utilization, packet and discard rates that further completes the picture.

The following steps build the Host sFlow agent with drop monitoring on an Ubuntu 20 system (a Linux 5.4 kernel or newer is required):
git clone https://github.com/sflow/host-sflow
cd host-sflow
make FEATURES="HOST PCAP DROPMON"
sudo make install
sudo make schedule
Next, edit the /etc/hsflowd.conf file, in this example, directing sFlow to be sent to a collector at 192.168.1.242, enabling packet sampling on host adapter enp0s3, and enabling drop monitoring:
sflow {
  collector { ip=192.168.1.242 }
  pcap { dev = enp0s3 }
  dropmon { group = 1 start = on }
}
Start the agent:
sudu systemctl enable hsflowd
sudo systemctl start hsflowd
Build the latest version of sflowtool on the collector host (192.168.1.242):
git clone https://github.com/sflow/sflowtool
cd sflowtool
./boot.sh
./configure
make
sudo make install
Now run sflowtool to receive and decode the sFlow telemetry stream:
sflowtool
The following example shows the output for a discarded TCP packet:
startSample ----------------------
sampleType_tag 0:5
sampleType DISCARD
sampleSequenceNo 20
sourceId 0:1
dropEvents 0
inputPort 1
outputPort 0
discardCode 289
discardReason unknown_l4
discarded_flowBlock_tag 0:1
discarded_flowSampleType HEADER
discarded_headerProtocol 1
discarded_sampledPacketSize 54
discarded_strippedBytes 0
discarded_headerLen 54
discarded_headerBytes 00-00-00-00-00-00-00-00-00-00-00-00-08-00-45-00-00-28-00-00-40-00-40-06-3C-CE-7F-00-00-01-7F-00-00-01-04-05-04-39-00-00-00-00-14-51-E2-8A-50-14-00-00-B2-B4-00-00
discarded_dstMAC 000000000000
discarded_srcMAC 000000000000
discarded_IPSize 40
discarded_ip.tot_len 40
discarded_srcIP 127.0.0.1
discarded_dstIP 127.0.0.1
discarded_IPProtocol 6
discarded_IPTOS 0
discarded_IPTTL 64
discarded_IPID 0
discarded_TCPSrcPort 1029
discarded_TCPDstPort 1081
discarded_TCPFlags 20
endSample   ----------------------
The sflowtool -T option converts the discarded packet records into PCAP format so that they can be decoded by packet analysis tools such as Wireshark and tcpdump:
sflowtool -T | tshark -r -
   12  22.000000 192.168.1.242 → 192.168.1.87 TCP 78 65527 → 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=1324841769 TSecr=0 SACK_PERM=1
The article sFlow to JSON uses Python examples to demonstrate how sflowtool's ability to convert sFlow records into JSON can be used for further analysis.

Thursday, July 16, 2020

Visibility into dropped packets

Dropped packets have a profound impact on network performance and availability. Packet discards due to congestion can significantly impact application performance. Dropped packets due to black hole routes, expired TTLs, MTU mismatches, etc. can result in insidious connection failures that are time consuming and difficult to diagnose.

Devlink Trap describes recent changes to the Linux drop monitor service that provide visibility into packets dropped by switch ASIC hardware. When a packet is dropped by the ASIC, an event is generated that includes the header of the dropped packet and the reason why it was dropped. A hardware policer is used to limit the number of events generated by the ASIC to a rate that can be handled by the Linux kernel. The events are delivered to userspace applications using the Linux netlink service.

Running the dropwatch command line tool on an Ubuntu 20 system demonstrates the instrumentation:
pp@ubuntu20:~$ sudo dropwatch
Initializing null lookup method
dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: __udp4_lib_rcv+0xae5/0xbb0 (0xffffffffb05ead95)
origin: software
input port ifindex: 2
timestamp: Wed Jul 15 23:57:36 2020 223253465 nsec
protocol: 0x800
length: 128
original length: 243

drop at: __netif_receive_skb_core+0x14f/0xc90 (0xffffffffb05351af)
origin: software
input port ifindex: 2
timestamp: Wed Jul 15 23:57:36 2020 530212717 nsec
protocol: 0x4
length: 52
original length: 52
The dropwatch package also includes the dwdump command, allowing the dropped packets to be analyzed using packet decoding tools such as tshark:
pp@ubuntu20:~$ sudo dwdump | tshark -r -
   12   0.777386 BelkinIn_a8:8a:7d → Spanning-tree-(for-bridges)_00 STP 196 Conf. Root = 61440/4095/24:f5:a2:a8:8a:7b  Cost = 0  Port = 0x8003
   13   2.825301 BelkinIn_a8:8a:7d → Spanning-tree-(for-bridges)_00 STP 196 Conf. Root = 61440/4095/24:f5:a2:a8:8a:7b  Cost = 0  Port = 0x8003
Linux is the base for many open source and commercial network operating systems used today (e.g. SONiC, DENT, Cumulus, EOS, NX-OS, and IOS-XR). A standard interface for monitoring packet drops is an exciting development, allowing Linux based network monitoring agents to collect, export, and analyze drop events to rapidly identify the root cause of dropped packets.