Thursday, March 14, 2013

DDoS

A new class of damaging DDoS attacks was launched against U.S. banks in the second half of 2012, sometimes adding up to 70 Gbps of noisy network traffic blasting at the banks through their Internet pipes. Until this recent spate of attacks, most network-level DDoS attacks consumed only five Gbps of bandwidth, but more recent levels made it impossible for bank customers and others using the same pipes to get to their websites. - Gartner
Figure 1: Components of a DDoS attack (credit Wikipedia)
Figure 1 shows the components of a Distributed Denial of Service (DDoS) attack. The attacker uses a command and control network to instruct large numbers of compromised systems to send traffic to a designated target with the aim of overwhelming the target infrastructure and denying access to legitimate users.

This article shows how the standard sFlow monitoring built into most vendors' network equipment can be used to rapidly detect DDoS attacks and drive automated controls to mitigate their effect. The case study is based on a data center network consisting of approximately 500 switches and 30,000 switch ports, and the charts show production traffic. This network was used as a testbed for developing the sFlow-RT analytics engine, and the resulting solution is now used in production.
Figure 2: Uncontrolled DDoS attack
Figure 2 shows a typical DDoS attack, consisting of sustained traffic levels of over 5M packets per second (30 Gigabits per second) lasting for many hours. The attacks are intended to saturate the links to the data center and deny access to the servers hosted there.
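
As a quick sanity check on the units, the packet and bit rates quoted for this attack imply an average packet size. The short Python sketch below does the arithmetic; the function name is purely illustrative, and the inputs are the rounded figures from the text rather than exact measurements.

```python
def average_packet_bytes(bits_per_second, packets_per_second):
    """Average packet size in bytes implied by a bit rate and a packet rate."""
    bits_per_packet = bits_per_second / packets_per_second
    return bits_per_packet / 8  # 8 bits per byte


# Rounded figures quoted in the text: 30 Gigabits per second, 5M packets per second.
size = average_packet_bytes(30e9, 5e6)
print(f"average packet size: {size:.0f} bytes")
```

With these rounded inputs the result is 750 bytes per packet, so the flood is made up of fairly large packets rather than minimum-size ones.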

Note: This chart is from an early sFlow-RT prototype and the drop outs are spurious.

Figure 3: Performance aware software defined networking
Figure 3, from Performance aware software defined networking, shows the basic elements of the DDoS mitigation system. The sFlow measurements from all the switches are sent to the sFlow-RT analytics engine, which provides real-time notification of denial of service attacks, along with information about the attackers and targets, to the DDoS protection application (a variant of the Python script shown in that article). The DDoS protection application issues commands to the controller, which communicates with the switches to eliminate the DDoS traffic. In this specific example, the controller doesn't actually use OpenFlow to communicate with the switches - instead, scripts automatically log in to the switch CLI and issue configuration commands that cause upstream routers to drop the traffic (see null route).
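
The decision step in this control loop can be sketched in a few lines of Python. This is a minimal illustration, not the script used in production: the threshold value, the function names, the sample addresses and the null route command syntax are all assumptions made for the example, and the step that would fetch measurements from sFlow-RT and log in to the switch CLI is deliberately left out.

```python
from ipaddress import ip_address

# Hypothetical threshold: packets per second towards one target that we treat as an attack.
ATTACK_THRESHOLD_PPS = 1_000_000


def find_attack_targets(pps_by_target, threshold=ATTACK_THRESHOLD_PPS):
    """Return targets whose inbound packet rate meets the threshold, busiest first."""
    hits = [(target, pps) for target, pps in pps_by_target.items() if pps >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [target for target, _ in hits]


def null_route_command(target):
    """Build an illustrative command that discards traffic to one IPv4 host.

    The syntax is an assumption modelled on common router CLIs; real
    equipment needs its own vendor-specific command.
    """
    address = ip_address(target)  # raises ValueError on a malformed address
    if address.version != 4:
        raise ValueError("this sketch only handles IPv4 targets")
    return f"ip route {address} 255.255.255.255 null0"


def plan_mitigation(pps_by_target, already_blocked=frozenset()):
    """One pass of the loop: detect attack targets and return commands for new ones."""
    return [
        null_route_command(target)
        for target in find_attack_targets(pps_by_target)
        if target not in already_blocked
    ]


# Made-up measurements (documentation address ranges), packets per second per target.
sample = {"192.0.2.10": 3_200_000, "192.0.2.20": 1_200, "198.51.100.7": 1_500_000}
for command in plan_mitigation(sample):
    print(command)
```

Keeping detection and command generation as pure functions makes the logic easy to test offline; in a real deployment the measurements would come from the analytics engine and the commands would be handed to the controller scripts.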
Figure 4: Five DDoS attacks within three minutes
Figure 4 shows results from an early prototype controller. The chart is interesting because it shows five separate DDoS attacks occurring within the span of three minutes. Each attack is stopped in under 30 seconds - fast enough that the attacks don't fully develop, peaking at 3 million packets per second rather than the typical 5+ million packets per second.

Note: It takes the attacker some time to fully mobilize their network of compromised hosts - if the defense actions can be deployed faster than the attacker can deploy their resources then the effect of the attack is largely eliminated.
Figure 5: Elements of controller delay
Figure 5, from SDN and delay, describes the components of response time in the control loop. Further tuning to reduce the measurement delay and configuration delay significantly improved the effectiveness of the controller.
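
To make the relationship between response time and attack size concrete, here is a toy Python model. Everything in it is an assumption for illustration: the split of each delay budget between components, the "other" component, the 60 second ramp time, and above all the idea that attack traffic ramps up linearly. It does not try to reproduce the measured charts; it only shows why shaving seconds off the control loop lowers the peak the attack can reach.

```python
def response_time(delays):
    """Total control-loop response time in seconds: the sum of the component delays."""
    return sum(delays.values())


def peak_before_mitigation(full_peak_pps, ramp_seconds, response_seconds):
    """Peak rate reached before the control takes effect, assuming a linear ramp."""
    if ramp_seconds <= 0:
        return float(full_peak_pps)  # attack is at full strength immediately
    fraction = min(1.0, response_seconds / ramp_seconds)
    return full_peak_pps * fraction


# Hypothetical delay budgets in seconds; only the component names
# "measurement" and "configuration" come from the article.
slow_controller = {"measurement": 20.0, "configuration": 9.0, "other": 1.0}
fast_controller = {"measurement": 1.0, "configuration": 2.0, "other": 1.0}

FULL_PEAK_PPS = 5_000_000  # typical uncontrolled attack level quoted in the text
RAMP_SECONDS = 60.0        # assumed time for the attacker to reach full strength

for name, delays in (("slow", slow_controller), ("fast", fast_controller)):
    total = response_time(delays)
    peak = peak_before_mitigation(FULL_PEAK_PPS, RAMP_SECONDS, total)
    print(f"{name}: response {total:.0f} s, modelled peak {peak / 1e6:.2f}M pps")
```

Under these assumptions the peak scales directly with the response time, so any delay component that can be reduced translates into a smaller attack reaching the target.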
Figure 6: Mitigating DDoS attack using fast controller
Figure 6 shows the performance of the improved controller: the response time to detect an attack and implement a control is around 4 seconds, the peak traffic is cut by two thirds, and all the traffic is eliminated in approximately 10 seconds.

This denial of service mitigation example demonstrates sFlow's unique suitability for control applications. More broadly, sFlow provides the comprehensive measurements needed to drive a variety of resource allocation and load balancing applications, including: SDN and large flows, ECMP load balancing, Load balancing LAG/ECMP groups, and cloud orchestration.

In future, expect to see sFlow-based performance awareness incorporated in a wide range of orchestration platforms, leveraging existing infrastructure to increase performance, reduce costs and ensure quality of service - ask vendors about their plans.
