Friday, February 2, 2018

OpenSwitch

OpenSwitch is a Linux Foundation project providing an open source white box control plane running on a standard Linux distribution. The diagram above shows the OpenSwitch architecture.

This article describes how to enable industry standard sFlow telemetry using the open source Host sFlow agent. The Host sFlow agent uses Control Plane Services (CPS) to configure sFlow instrumentation in the hardware and gather metrics. CPS in turn uses the Open Compute Project (OCP) Switch Abstraction Interface (SAI) as a vendor independent method of configuring the hardware. Hardware support for sFlow is a standard feature supported by Network Processing Unit (NPU) vendors (Barefoot, Broadcom, Cavium, Innovium, Intel, Marvell, Mellanox, etc.) and vendor neutral sFlow configuration is part of the SAI.

Installing and configuring Host sFlow agent

Installing the software is simple. Log into the switch and type the following commands:
wget --no-check-certificate https://github.com/sflow/host-sflow/releases/download/v2.0.13-3/hsflowd-opx_2.0.13-3_amd64.deb
sudo dpkg -i hsflowd-opx_2.0.13-3_amd64.deb
The sFlow agent requires very little configuration, automatically monitoring all switch ports using the following default settings:

Link SpeedSampling RatePolling Interval
1 Gbit/s1-in-1,00030 seconds
10 Gbit/s1-in-10,00030 seconds
25 Gbit/s1-in-25,00030 seconds
40 Gbit/s1-in-40,00030 seconds
50 Gbit/s1-in-50,00030 seconds
100 Gbit/s1-in-100,00030 seconds

Note: The default settings ensure that large flows (defined as consuming 10% of link bandwidth) are detected within approximately 1 second - see Large flow detection

Edit the /etc/hsflowd.conf file to specify the address of an sFlow analyzer (10.0.0.1):
sflow {
  collector { ip = 10.0.0.1 }
}
Monitoring Linux services describes how configure Host sFlow to include detailed telemetry for all services running on OpenSwitch:
  • bind9.service
  • cron.service
  • dbus.service
  • getty@tty1.service
  • getty@tty2.service
  • getty@tty3.service
  • getty@tty4.service
  • getty@tty5.service
  • getty@tty6.service
  • hsflowd.service
  • lldpd.service
  • networking.service
  • opx-alms.service
  • opx-cps.service
  • opx-front-panel-ports.service
  • opx-ip.service
  • opx-monitor-phy-media.service
  • opx-nas-shell.service
  • opx-nas.service
  • opx-nbmgr.service
  • opx-phy-media-config.service
  • opx-tmpctl.service
  • polkitd.service
  • redis-server.service
  • rsyslog.service
  • snmpd.service
  • ssh.service
  • systemd-journald.service
  • systemd-logind.service
  • systemd-udevd.service

Finally, start the Host sFlow agent:
sudo systemctl enable hsflowd
sudo systemctl start hsflowd
Using the Host sFlow agent to monitor Linux servers and switches provides a consistent set of measurements end-to-end, particularly for cloud infrastructure such as OpenStack and Docker where the network extends into the servers in the form of virtual switches and routers.

Collecting and analyzing sFlow

Visibility and the software defined data center describes the general architecture of sFlow monitoring. Standard sFlow agents embedded within the elements of the infrastructure stream essential performance metrics to management tools, ensuring that every resource in a dynamic cloud infrastructure is immediately detected and continuously monitored.

The Host sFlow agent on OpenSwitch streams standard Linux performance statistics in addition to the interface counters and packet samples that you would typically get from a networking device.
Note: Enhanced visibility into host performance is particularly important on open switch platforms since they may be running a number of user installed services that can stress the limited CPU, memory and IO resources.
For example, the following sflowtool output shows the raw data contained in an sFlow datagram:
startDatagram =================================
datagramSourceIP 10.0.0.59
datagramSize 1332
unixSecondsUTC 1516946395
datagramVersion 5
agentSubId 100000
agent 10.0.0.59
packetSequenceNo 340132
sysUpTime 17479000
samplesInPacket 7
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 876
sourceId 2:1
counterBlock_tag 0:2001
counterBlock_tag 0:2005
disk_total 8102721536
disk_free 5178248192
disk_partition_max_used 37.77
disk_reads 25339
disk_bytes_read 562041856
disk_read_time 25380
disk_writes 3192551
disk_bytes_written 28776890368
disk_write_time 1043712
counterBlock_tag 0:2004
mem_total 2107891712
mem_free 142082048
mem_shared 0
mem_buffers 155873280
mem_cached 1611935744
swap_total 0
swap_free 0
page_in 184268
page_out 9367478
swap_in 0
swap_out 0
counterBlock_tag 0:2003
cpu_load_one 0.010
cpu_load_five 0.030
cpu_load_fifteen 0.000
cpu_proc_run 2
cpu_proc_total 167
cpu_num 2
cpu_speed 2699
cpu_uptime 3541814
cpu_user 3336490
cpu_nice 0
cpu_system 5479320
cpu_idle 2754958964
cpu_wio 168960
cpuintr 160
cpu_sintr 2717250
cpuinterrupts 656232310
cpu_contexts 1704273704
cpu_steal 0
cpu_guest 0
cpu_guest_nice 0
counterBlock_tag 0:2006
nio_bytes_in 267777
nio_pkts_in 4210
nio_errs_in 0
nio_drops_in 0
nio_bytes_out 2104528
nio_pkts_out 2227
nio_errs_out 0
nio_drops_out 0
counterBlock_tag 0:2000
hostname opx2_vm
UUID 40-d4-8b-d5-6b-29-4e-4a-be-48-d6-55-8d-f6-81-73
machine_type 3
os_name 2
os_release 3.16.0-4-amd64
endSample   ----------------------
startSample ----------------------
sampleType_tag 0:2
sampleType COUNTERSSAMPLE
sampleSequenceNo 876
sourceId 0:44
counterBlock_tag 0:1005
ifName e101-001-0
counterBlock_tag 0:1
ifIndex 3
networkType 6
ifSpeed 0
ifDirection 2
ifStatus 0
ifInOctets 0
ifInUcastPkts 0
ifInMulticastPkts 0
ifInBroadcastPkts 0
ifInDiscards 0
ifInErrors 0
ifInUnknownProtos 4294967295
ifOutOctets 0
ifOutUcastPkts 0
ifOutMulticastPkts 0
ifOutBroadcastPkts 0
ifOutDiscards 0
ifOutErrors 0
ifPromiscuousMode 0
endSample   ----------------------
startSample ----------------------
sampleType_tag 0:1
sampleType FLOWSAMPLE
sampleSequenceNo 1022129
sourceId 0:7
meanSkipCount 128
samplePool 130832512
dropEvents 0
inputPort 7
outputPort 10
flowBlock_tag 0:1
flowSampleType HEADER
headerProtocol 1
sampledPacketSize 1518
strippedBytes 4
headerLen 128
headerBytes 6C-64-1A-00-04-5E-E8-E7-32-77-E2-B5-08-00-45-00-05-DC-63-06-40-00-40-06-9E-21-0A-64-0A-97-0A-64-14-96-9A-6D-13-89-4A-0C-4A-42-EA-3C-14-B5-80-10-00-2E-AB-45-00-00-01-01-08-0A-5D-B2-EB-A5-15-ED-48-B7-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35-36-37-38-39-30-31-32-33-34-35
dstMAC 6c641a00045e
srcMAC e8e73277e2b5
IPSize 1500
ip.tot_len 1500
srcIP 10.100.10.151
dstIP 10.100.20.150
IPProtocol 6
IPTOS 0
IPTTL 64
TCPSrcPort 39533
TCPDstPort 5001
TCPFlags 16
endSample   ----------------------
Note: The Linux host metrics (red), network interface counters (green), and packet sample information (blue) have been highlighted.

sflowtool has a number of additional uses:
  • Verifying that sFlow is being received correctly at the destination
  • Converting binary sFlow data into ASCII for scripted analysis (Python, Perl etc.)
  • Converting sFlow into IPFIX/NetFlow
  • Converting sFlow into PCAP format for use with tcpdump, Wireshark, etc.
  • Replicate sFlow streams for multiple collectors
  • Source code for sFlow decoder that can be used to build custom sFlow analyzer
In addition to sflowtool, there are many other open source and commercial sFlow collectors listed on sFlow.org.

A key feature of sFlow telemetry is the low latency network-wide visibility that is possible because of the stateless nature of the measurements. Comprehensive real-time visibility is an essential building block that provides feedback for operations, automation, and control. Articles on this blog use the sFlow-RT analyzer to demonstrate use cases for real-time telemetry, including:
The programmability of an open Linux network operating system combined with real-time visibility is transformative, providing the foundation services necessary for solutions that automatically adapt the network to changing demands.

Friday, January 26, 2018

Intranet DDoS attacks

As on a Darkling Plain: Network Survival in an Age of Pervasive DDoS talk by Steinthor Bjarnason at the recent NANOG 71 conference. The talk discusses the threat that the proliferation of network connected devices in enterprises create when they are used to launch denial of service attacks. Last year's Mirai attacks are described, demonstrating the threat posed by mixed mode attacks where a compromised host is used to infect large numbers devices on the corporate network.
The first slide from the talk shows a denial attack launched against an external target, launched from infected video surveillance cameras scattered throughout the the enterprise network. The large volume of traffic fills up external WAN link and overwhelms stateful firewalls.
The second slide shows an attack targeting critical internal services that can have been identified by reconnaissance from the compromised devices. In addition, scanning activity associated with reconnaissance for additional devices can itself overload internal resources and cause outages.

In both cases, most of the critical activity occurs behind the corporate firewall, making it extremely challenging to detect and mitigate these threats.

The talk discusses a number of techniques that service providers use to secure their networks that enterprises will need to adopt in order to meet this challenge. In particular, "utilizing flow telemetry to analyze external and internal traffic. This is necessary for attack detection, classification and traceback."

Instrumentation needs to be built into every network device in order to provide the comprehensive visibility required to address these challenges. sFlow is a scaleable streaming telemetry solution built into a wide variety of devices, from low cost edge switches to high end chassis routers. Network vendors that support sFlow include: A10, Aerohive, AlexalA, ALUe, Allied Telesis, Arista, Aruba, Big Switch, Brocade, Cisco, Cumulus, DCN, Dell, D-Link, Edge-Core, Enterasys, Extreme, F5, Fortinet, HPE, Hitachi, Huawei, IBM, IP Infusion, Juniper, NEC, Netgear, OpenSwitch, Open vSwitch, Oracle, Pica8, Plexxi, Pluribus, Proxim, Quanta, Silicom, SMC, ZTE, and ZyXEL.

Selecting devices that support sFlow simplifies operations by ensuring that the visibility needed to effectively manage the network is integrated into the fabric and deployed pervasively. Attempting to add visibility later is complex, expensive, and results in limited coverage.

There are a number of examples of DDoS mitigation using sFlow on this blog. While many of the examples focus on external DDoS attacks, the techniques are equally applicable to the internal network.