Tuesday, September 10, 2024

Emulating congestion with Containerlab

The Containerlab dashboard above shows variation in throughput due to large "Elephant" flow collisions in an emulated leaf and spine network. See Leaf and spine traffic engineering using segment routing and SDN for a demonstration of the same issue using physical switches.

This article describes the steps needed to emulate realistic network performance problems using Containerlab. First, the topology is built using the FRRouting (FRR) open source router, a lightweight, high-performance routing implementation that forwards packets using the native Linux dataplane and so can efficiently emulate large numbers of routers. Second, the containerlab tools netem set command is used to introduce packet loss, delay, and jitter, or to restrict the bandwidth of ports.

The netem tool makes use of the Linux tc (traffic control) module. Unfortunately, if you are using Docker Desktop, the minimal virtual machine it uses to run containers does not include the tc module.
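As a sketch of how the netem settings map onto a command line, the helper below builds a containerlab tools netem set invocation from the impairment parameters. The flag names (--node, --interface, --delay, --jitter, --loss, --rate) match the containerlab netem documentation at the time of writing, and the node/interface names in the usage example are hypothetical; check containerlab tools netem set --help for your version.

```python
def netem_set_cmd(node, interface, delay=None, jitter=None,
                  loss=None, rate=None):
    """Build a `containerlab tools netem set` command line.

    delay/jitter are durations such as "5ms", loss is a percentage,
    and rate is in kbit/s (assumed units; verify for your version).
    """
    cmd = ["containerlab", "tools", "netem", "set",
           "--node", node, "--interface", interface]
    if delay is not None:
        cmd += ["--delay", delay]
    if jitter is not None:
        cmd += ["--jitter", jitter]
    if loss is not None:
        cmd += ["--loss", str(loss)]
    if rate is not None:
        cmd += ["--rate", str(rate)]
    return cmd

# e.g. add 5ms +/-1ms delay and 1% loss to a (hypothetical) leaf uplink
print(" ".join(netem_set_cmd("clab-clos3-leaf1", "eth1",
                             delay="5ms", jitter="1ms", loss=1)))
```

Note that tc netem only applies jitter when a delay is also set, so the two are specified together here.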

multipass launch docker
Instead, use Multipass as a convenient way to create and start an Ubuntu virtual machine with Docker support on your laptop. If you are already on a Linux system with Docker installed, skip forward to the git clone step.
multipass ls
List the multipass virtual machines.
Name                    State             IPv4             Image
docker                  Running           192.168.65.3     Ubuntu 22.04 LTS
                                          172.17.0.1
Make a note of the IP address(es) of the docker virtual machine.
multipass shell docker
Run a shell inside the docker virtual machine.
git clone https://github.com/sflow-rt/containerlab.git
Install the sflow-rt/containerlab project.
cd containerlab
./run-clab

Run Containerlab using Docker.

In this example we will be using the 3 stage Clos topology shown above.
env SAMPLING=10 containerlab deploy -t clos3.yml
Start a leaf and spine topology emulation, but use a sampling rate of 1-in-10 rather than the default of 1-in-1000. See Large flow detection for a discussion of scaling sampling rates with link speed to get consistent results between the emulation and a physical network.
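The scaling rule can be sketched as follows: setting the 1-in-N sampling rate proportional to link speed (N equal to the link speed in Mbps is one such rule, an assumption consistent with the 1-in-10 setting used here for 10Mbps links; see the Large flow detection article for the reasoning).

```python
def sampling_rate(link_speed_bps):
    """Return a 1-in-N sampling rate proportional to link speed.

    Uses N = link speed in Mbps, so the 10Mbps emulated links here
    (1-in-10) scale consistently to, say, 1-in-10000 on a physical
    10Gbps link. Illustrative rule of thumb only.
    """
    return max(1, int(link_speed_bps / 1_000_000))

print(sampling_rate(10_000_000))      # 10Mbps emulated link -> 10
print(sampling_rate(10_000_000_000))  # 10Gbps physical link -> 10000
```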
./bw.py clab-clos3
Rate limit the links in the topology to 10Mbps.
./topo.py clab-clos3
Post the topology to the sFlow-RT real-time analytics container. Access the Containerlab Dashboard shown at the top of this page using a web browser to connect to http://192.168.65.3:8008/ (where 192.168.65.3 is the IP address of the docker virtual machine noted earlier).
docker exec -it clab-clos3-h1 iperf3 -c 172.16.2.2 --parallel 2
Run a series of iperf3 tests to create pairs of large flows between h1 and h2. When the flows take different paths across the fabric, the total available bandwidth is 20Mbps. If the flows hash onto the same path, they share the 10Mbps link bandwidth and throughput is halved.
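The collision behavior can be quantified with a simplified model, assuming the two flows are hashed independently and uniformly onto the equal-cost uplinks (real ECMP hashing is deterministic per five-tuple, so this models the variation across repeated tests with different port numbers):

```python
from fractions import Fraction

def expected_throughput_mbps(link_mbps=10, paths=2):
    # Probability both flows hash onto the same uplink.
    p_collide = Fraction(1, paths)
    # Collision: the flows share one link (link_mbps aggregate).
    # No collision: each flow gets its own link (2 * link_mbps aggregate).
    return p_collide * link_mbps + (1 - p_collide) * 2 * link_mbps

# Two spine paths: aggregate throughput averages 15Mbps across runs,
# alternating between the 10Mbps and 20Mbps cases seen in the dashboard.
print(expected_throughput_mbps())  # prints 15
```

More equal-cost paths reduce the collision probability, which is why flow collisions hurt most on fabrics with few uplinks relative to the number of large flows.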

This example demonstrates that Containerlab is not restricted to emulating and validating configurations, but can also be used to emulate performance problems. The effects of large flow collisions are particularly relevant to data center fabrics handling AI/ML workloads, where collisions can significantly limit performance; see RoCE networks for distributed AI training at scale.