= Linux Port Mirroring =
Sometimes a span/mirror port isn't available on a switch when you need one. However, if you just need to monitor traffic from/to a Linux server, you can have the Linux server mirror its own traffic toward a collector. You just need a spare NIC on the Linux system.
For example, if you want to mirror all traffic going in/out of eth0, and send it out eth1 toward a collector, you can do that with the 'tc' subsystem. Some examples of this exist online. However, what they don't tell you is that if eth1 goes down (unplugged, etc), that results in eth0 blocking all traffic, which is not good! I came up with a workaround for that problem using bridge and dummy interfaces. It's kludgy, but it works!
See below for all the required config files / scripts. (Don't forget to chmod the scripts so root can run them!)
/etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
    address 192.168.1.123
    netmask 255.255.255.0
    gateway 192.168.1.254
    dns-nameservers 8.8.8.8

# physical mirror port toward collector
auto eth1
iface eth1 inet static
    address 127.1.1.1
    netmask 255.255.255.255
    up ip addr del 127.1.1.1/32 dev eth1;:

# always-up mirror port toward dev null
auto dummy0
iface dummy0 inet static
    address 127.4.4.4
    netmask 255.255.255.255
    pre-up modprobe dummy;:
    up ip addr del 127.4.4.4/32 dev dummy0;:

# bridge of eth1 and dummy0
# this is so that it always stays up, even if eth1 is down
auto br0
iface br0 inet static
    bridge_ports eth1 dummy0
    address 127.5.5.5
    netmask 255.255.255.255
    up ip addr del 127.5.5.5/32 dev br0;:
    post-up /etc/network/mirror-up.sh;:
    pre-down /etc/network/mirror-down.sh;:
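Once the interfaces are up, it is worth confirming that br0 really does stay up when eth1 is unplugged, since that is the whole point of the dummy0 workaround. A quick sanity check (a sketch; brctl comes from the bridge-utils package that the bridge_ports option relies on):

brctl show br0      # should list both eth1 and dummy0 as bridge members
ip link show br0    # br0 should stay UP even with the eth1 cable pulled
ip link show dummy0 # the always-up member keeping the bridge alive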
/etc/network/mirror-up.sh

#!/bin/sh
# ALK 2014-10-23
# Send all 'source_if' traffic (ingress & egress) to collector box on 'dest_if'
# Normally called by boot process or ifup. i.e.
# --/etc/network/interfaces--
# iface eth0 inet static
#   address X.X.X.X
#   netmask Y.Y.Y.Y
#   post-up /etc/network/mirror-up.sh;:

source_if=eth0
dest_if=br0

# enable the destination port
ifconfig $dest_if up;:

# mirror ingress traffic
tc qdisc add dev $source_if ingress;:
tc filter add dev $source_if parent ffff: \
    protocol all \
    u32 match u8 0 0 \
    action mirred egress mirror dev $dest_if;:

# mirror egress traffic
tc qdisc add dev $source_if handle 1: root prio;:
tc filter add dev $source_if parent 1: \
    protocol all \
    u32 match u8 0 0 \
    action mirred egress mirror dev $dest_if;:
/etc/network/mirror-down.sh

#!/bin/sh
# ALK 2014-10-23
# De-provision mirroring config (See mirror-up.sh for provisioning)
# Normally called by boot process or ifdown. i.e.
# --/etc/network/interfaces--
# iface eth0 inet static
#   address X.X.X.X
#   netmask Y.Y.Y.Y
#   pre-down /etc/network/mirror-down.sh;:

source_if=eth0

# de-provision ingress mirroring
tc qdisc del dev $source_if ingress;:

# de-provision egress mirroring
tc qdisc del dev $source_if root;:
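After mirror-up.sh has run, a couple of quick checks confirm that mirroring is actually in place. This is a hedged sketch using the interface names and addresses from the examples above; the collector-side NIC name is whatever receives the cable from eth1:

# on the mirroring host: both qdiscs and their mirred filters should be listed
tc qdisc show dev eth0
tc -s filter show dev eth0 parent ffff:
tc -s filter show dev eth0 parent 1:

# on the collector: mirrored frames from/to 192.168.1.123 should appear on the capture NIC
tcpdump -ni <capture NIC> host 192.168.1.123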
= Traffic Mirroring: Theory and Practice (with tc and Tunneling) =
Traffic mirroring (also called port mirroring in some cases) is a technique to send a copy of the network traffic to a backend system. It is commonly used by network developers and administrators/operators to monitor network traffic or diagnose problems. Many systems rely on traffic mirroring, such as intrusion detection systems (IDS), passive probes, real user monitoring (RUM), traffic replay systems, etc. [1]
Traditional traffic mirroring solutions are based on hardware switches and are vendor-specific; they are often expensive, hard to use and difficult to evolve. With the rise of network virtualization techniques, software-based and vendor-agnostic solutions have emerged, which are more flexible and easier to upgrade.
This post first discusses the technical requirements for traffic mirroring, then investigates a specific solution based on tc and tunneling. To make this a hands-on guide, we will set up a container-based virtual environment, but the approach also fits physical environments (servers and networks).
= 1 Theory =
== 1.1 Technical requirements ==
[[Image:Bild1.png|top]]
Fig 1.1. Mirroring the ingress traffic of eth0@node1 to a remote node
Traffic mirroring involves making a copy of some specific traffic and sending that copy to a destination for further processing. The destination could be one of the following:
# '''network device''': e.g. a physical NIC/port directly connected via cable
# '''remote node''': usually connected via the network, as shown in the figure above.
Fig 1.1 mirrors only the ingress packets, but egress traffic could also be mirrored; besides, you could also filter the packets (e.g. select only TCP SYN packets) before copying them.
Another thing to consider: '''should we copy each packet of interest exactly as it is, or only part of it?''' This depends on your needs. For example:
* If you intend to analyze '''L7 message contents''', e.g. HTTP requests, you need to copy the intact packet,
* If you intend to analyze '''L4 flow events''', e.g. TCP flow stats, you could keep only the L2-L4 headers and drop the remaining bytes to save bandwidth and storage space,
* If you intend to analyze '''L2 flow events''', then maybe only the L2 header needs to be preserved, further reducing bandwidth and storage.
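Truncation can also simply happen at capture time on the collector side; for instance, tcpdump's snapshot length option keeps only the first N bytes of each mirrored packet. A hedged illustration (the NIC name is just an example):

tcpdump -i eth1 -s 96 -w headers-only.pcap   # keep roughly the L2-L4 headers, drop the payloads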
== 1.2 Solutions ==
With the above requirements in mind, let's break them down and try to propose a solution.
=== Capture, filter and copy packets ===
This is the first problem we need to solve. Actually, it depends on the networking infrastructure we are running on. For example:
# With the OVS stack: the port mirroring function may be used, see [https://arthurchiao.art/blog/traffic-mirror-with-ovs/ Traffic Mirroring with OVS] as an example;
# With the kernel stack: tools like tc may be suitable;
# If none of the existing tools works for you, you may end up writing your own mirroring code based on libraries such as libpcap.
=== Send copied packets to the destination ===
This is what we care about more in this post. With the mirrored packets already at hand, how should we correctly send them to the remote node?
==== Challenges ====
The copied packets cannot simply be put on the wire unchanged, as both the dst_mac and dst_ip of the packets are destined to the local node rather than the remote node. This results in:
# Switches will not forward them to the remote node; even worse, they may forward them back to the local node, which '''leads to a loop''';
# Even if you managed to get those packets to the destination node (e.g. via direct cables), they may still get dropped before reaching the final network device. For example, if the device responsible for processing those packets is a virtual network device behind a physical NIC, the packets will get dropped at the NIC because their destination IP (and MAC) do not match the NIC.
[[Image:Bild2.png|top]]
Fig 1.2. Mirrored traffic won't arrive at the destination node if it is put on the wire directly

Also, you couldn't simply rewrite the packets so that they route to the remote node, as altered packets are not "mirrored packets" anymore - which makes them useless to the receiving side (e.g. an IDS).

Then, how could we send them to the remote node without changing them? Here comes tunneling.
==== Sending via tunneling ====
One feasible way to send the mirrored traffic to the remote node is: treat each copied packet as an opaque payload, encapsulate it into another packet and send that to the remote node; on the receiving side, extract the mirrored packet by decapsulating the outer header.
There are many encapsulation formats that achieve this purpose, such as VxLAN, which puts the original packet into a new UDP packet, as shown below:

Fig 1.3. VxLAN header, with the "Original L2 Frame" field storing a mirrored packet. Image from @ethansmithron
== 1.3 A specific solution: tc + VxLAN ==
With all the above analysis, we are now ready to implement a sample traffic mirroring system.
And of course, we won't make everything from scratch: Linux already provides lots of tools that can be used to fulfill this task, including:
# tc: capture and filter traffic
# VxLAN: encapsulate the mirrored packets
= 2 Practice: tc + VxLAN =
== 2.1 Set up playground ==
Let's set up our container-based playground.
=== 2.1.1 Prerequisites ===
Make sure docker is installed before proceeding. This post uses docker's default settings, which means:
# There will be a Linux bridge docker0 created on the host,
# docker0 serves as the gateway of the default docker network 10.0.108.0/23 (configured in /etc/docker/daemon.json; a quick way to check your own subnet is shown below),
# When starting a container with docker run and no special networking parameters, it will be allocated an IP address from 10.0.108.0/23 and attached to the docker0 bridge with a veth pair. We will see this in the following.
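The default subnet on your machine may well differ from the 10.0.108.0/23 used throughout this post. A hedged way to check what your local docker install actually uses:

$ docker network inspect -f '{{json .IPAM.Config}}' bridge
[{"Subnet":"172.17.0.0/16","Gateway":"172.17.0.1"}]   # example output on a stock install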
Now, we're ready to march on.
=== 2.1.2 Create two containers ===
$ docker run -d --name ctn-1 alpine:3.11.6 sleep 60d
$ docker run -d --name ctn-2 alpine:3.11.6 sleep 60d

$ docker ps | grep ctn
f0322e340e03   alpine:3.11.6   "sleep 60d"   ..   ctn-2
44f9a9804e77   alpine:3.11.6   "sleep 60d"   ..   ctn-1
Get container netns:
$ NETNS1=$(docker inspect --format "{{.State.Pid}}" ctn-1)
$ NETNS2=$(docker inspect --format "{{.State.Pid}}" ctn-2)
$ echo $NETNS1 $NETNS2
41243 361417
In the following, we will use nsenter -t <netns> -n <command> to execute commands in a container's network namespace; this is equivalent to executing those commands inside the container.
$ nsenter -t $NETNS1 -n ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 ...
    inet 127.0.0.1/8 scope host lo
258: eth0@if259: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
    inet 10.0.108.2/23 brd 10.0.109.255 scope global eth0

$ nsenter -t $NETNS2 -n ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 ...
    inet 127.0.0.1/8 scope host lo
273: eth0@if274: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
    inet 10.0.108.3/23 brd 10.0.109.255 scope global eth0
Now the network topology:
Fig 2.1. Default network topology of docker containers

See the Appendix if you are curious about how this topology comes about.
=== 2.1.3 Connect containers with another network ===
Now,
# Add a new Linux bridge docker1
# Add a new NIC to each of the containers, and configure an IP for it
# Connect the containers to docker1 via the new NICs.
The topology will be:
Fig 2.2. Network topology after adding a bridge and two vNICs to containers
==== Add bridge ====
$ ip link add docker1 type bridge           # Add a new Linux bridge `docker1`
$ ip addr add 10.0.110.1/24 dev docker1     # Configure a management IP address for this bridge
$ ip link set docker1 up
==== Add NIC to ctn-1 ====
Add a veth pair, with one end serving as a vNIC for ctn-1 and the other end attached to docker1:

$ ip link add veth1 type veth peer name peer1                 # Add a veth pair
$ ip link set peer1 netns $NETNS1                             # Move one end to the container's netns

$ nsenter -t $NETNS1 -n ip link set peer1 name eth1           # Rename it to eth1
$ nsenter -t $NETNS1 -n ip addr add 10.0.110.2/23 dev eth1    # Configure IP address
$ nsenter -t $NETNS1 -n ip link set eth1 up

$ ip link set veth1 master docker1                            # Attach the host side to bridge docker1
$ ip link set veth1 up
==== Add NIC to ctn-2 ====
Same for ctn-2:

$ ip link add veth2 type veth peer name peer2
$ ip link set peer2 netns $NETNS2

$ nsenter -t $NETNS2 -n ip link set peer2 name eth1
$ nsenter -t $NETNS2 -n ip addr add 10.0.110.3/23 dev eth1
$ nsenter -t $NETNS2 -n ip link set eth1 up

$ ip link set veth2 master docker1
$ ip link set veth2 up
=== 2.1.4 Test reachability ===
First, take a look at the routing table inside ctn-1:

$ nsenter -t $NETNS1 -n ip route
default via 10.0.108.1 dev eth0
10.0.108.0/23 dev eth0 proto kernel scope link src 10.0.108.2
10.0.110.0/23 dev eth1 proto kernel scope link src 10.0.110.2
The last route entry says that network 10.0.110.0/23 (docker1) is reachable via eth1. Verify it:

$ nsenter -t $NETNS1 -n ping 10.0.110.3
64 bytes from 10.0.110.3: icmp_seq=1 ttl=64 time=0.053 ms
^C

$ nsenter -t $NETNS1 -n ping 10.0.108.3
64 bytes from 10.0.108.3: icmp_seq=1 ttl=64 time=0.053 ms
^C
Fig 2.3. Traffic paths when pinging each IP address of ctn-2 from ctn-1
As shown in the above picture, we can now reach ctn-2 via a distinct path (via eth1), which is independent of the default one (via eth0). We will send our mirrored packets over this path next.
== 2.2 Set up VxLAN tunnel ==
Now we are ready to set up a VxLAN tunnel on top of the eth1 NICs between ctn-1 and ctn-2.
=== 2.2.1 Setup VxLAN tunnel on ctn-1 ===
$ nsenter -t $NETNS1 -n ip link add vxlan0 type vxlan id 100 local 10.0.110.2 remote 10.0.110.3 dev eth1 dstport 4789
$ nsenter -t $NETNS1 -n ip link set vxlan0 up
$ nsenter -t $NETNS1 -n ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 ...
258: eth0@if259: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
    inet 10.0.108.2/23 brd 10.0.109.255 scope global eth0
9: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc ...
    link/ether 4e:b6:9d:1e:ac:bd brd ff:ff:ff:ff:ff:ff
277: eth1@if278: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
    inet 10.0.110.2/23 scope global eth1
=== 2.2.2 Setup VxLAN tunnel on ctn-2 ===
$ nsenter -t $NETNS2 -n ip link add vxlan0 type vxlan id 100 local 10.0.110.3 remote 10.0.110.2 dev eth1 dstport 4789
$ nsenter -t $NETNS2 -n ip link set vxlan0 up
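Before moving on, it does not hurt to double-check that both tunnel endpoints agree on the VNI, the local/remote addresses and the UDP port. A small sanity check (not part of the original walkthrough):

$ nsenter -t $NETNS1 -n ip -d link show vxlan0   # expect: vxlan id 100 remote 10.0.110.3 local 10.0.110.2 dev eth1 ... dstport 4789
$ nsenter -t $NETNS2 -n ip -d link show vxlan0   # expect: vxlan id 100 remote 10.0.110.2 local 10.0.110.3 dev eth1 ... dstport 4789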
=== 2.2.3 Topology ===
Now the topology:
Fig 2.4. Network topology after adding a VxLAN tunnel
Traffic path between vxlan0@ctn-1 and vxlan0@ctn-2:
* Logical (overlay) path: vxlan0@ctn-1 -> tunnel -> vxlan0@ctn-2
* Actual (underlay) path: vxlan0@ctn-1 -> eth1@ctn-1 -> veth1 -> docker1 -> veth2 -> eth1@ctn-2 -> vxlan0@ctn-2
== 2.3 Filter & mirror traffic with tc ==
The Linux kernel ships with an excellent traffic control (tc) subsystem, which can filter and shape the bandwidth of network devices by first queuing the packets on the devices and then transmitting them according to specific algorithms. These algorithms are called qdiscs (queuing disciplines).
Qdiscs in the kernel [5]:
Userspace programs
                     ^
                     |
     +---------------+------------------------------------------+
     |               Y                                           |
     |    -------> IP Stack                                      |
     |   |              |                                        |
     |   |              Y                                        |
     |   |  / ----------> Forwarding ->                          |
     |   ^ /                           |                         |
     |   | |                           |                         |
     |   ^ |                           Y          /-qdisc1-\     |
     |   | |                        Egress       /--qdisc2--\    |
  --->->Ingress                     Classifier ---qdisc3----     | ->
     |   Qdisc                                   \__qdisc4__/    |
     |                                            \-qdiscN_/     |
     +----------------------------------------------------------+

Fig 2.5. tc qdisc in the kernel, credit to Jamal Hadi Salim [5]
It needs to be noted that '''traffic shaping only works on the egress side''' - you can only control your own transmitting behavior, while ingress bandwidth is determined by other nodes, which are out of our control most of the time. The good news is that there is still a qdisc for ingress, which can be used for filtering (and dropping, if you like) the ingress packets, as shown in the above picture.
For more information on Linux's traffic control subsystem, refer to [5, 2], and to a Chinese translation if you can read Chinese.
For simplicity, we'll only mirror the ingress packets in what follows.
=== 2.3.1 Add the ingress qdisc ===
Add a qdisc on eth0@ctn-1's ingress path:

$ nsenter -t $NETNS1 -n tc qdisc add dev eth0 handle ffff: ingress
$ nsenter -t $NETNS1 -n tc -s qdisc ls dev eth0
qdisc noqueue 0: root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc ingress ffff: parent ffff:fff1 ----------------
 Sent 4284 bytes 68 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
All ingress packets on eth0 will be sent to this qdisc.
=== 2.3.2 Add a packet filter to qdisc ===
Add a filter to this qdisc:

# match all packets, mirror them to the vxlan0 device
$ nsenter -t $NETNS1 -n tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 action mirred egress mirror dev vxlan0
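As an aside, on reasonably recent kernels the matchall classifier does the same catch-all job with less typing; a hedged equivalent of the u32 rule above would be:

$ nsenter -t $NETNS1 -n tc filter add dev eth0 parent ffff: matchall action mirred egress mirror dev vxlan0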
Check the filter:

# -s: show statistics
# -p: pretty-print filter info
$ nsenter -t $NETNS1 -n tc -s -p filter ls dev eth0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw (rule hit 0 success 0)
  match IP protocol 1 (success 0 )
        action order 1: mirred (Egress Mirror to device vxlan0) pipe
        index 1 ref 1 bind 1 installed 18 sec used 18 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
[[Image:Bild8.png|top]]
Fig 2.6. Network topology after adding tc qdisc & filter
=== 2.3.3 Generate traffic and test mirroring ===
Generate test traffic: ping eth0@ctn-1 from the host,

$ ping 10.0.108.2
PING 10.0.108.2 (10.0.108.2) 56(84) bytes of data.
64 bytes from 10.0.108.2: icmp_seq=1 ttl=64 time=0.063 ms
64 bytes from 10.0.108.2: icmp_seq=2 ttl=64 time=0.053 ms
...

Capture the traffic at eth1@ctn-2:

$ nsenter -t $NETNS2 -n tcpdump -nn -i eth1
19:03:20.767998 IP 10.0.110.2.35710 > 10.0.110.3.4789: VXLAN, flags [I] (0x08), vni 100
IP 10.0.108.1 > 10.0.108.2: ICMP echo request, id 30513, seq 1, length 64
19:03:21.767482 IP 10.0.110.2.35710 > 10.0.110.3.4789: VXLAN, flags [I] (0x08), vni 100
IP 10.0.108.1 > 10.0.108.2: ICMP echo request, id 30513, seq 2, length 64

As the output shows, the ingress traffic has been mirrored to this NIC using VxLAN encapsulation. Traffic path:

Fig 2.7. Traffic path of the mirror test
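If you would rather see the inner (decapsulated) packets directly instead of the VxLAN-encapsulated ones, you can capture on ctn-2's vxlan0 device instead of eth1; this is a small variation on the step above, using the same namespaces:

$ nsenter -t $NETNS2 -n tcpdump -nn -i vxlan0 icmp

The ICMP echo requests should then show up without the outer IP/UDP/VxLAN headers.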
== 2.4 Clean up ==
Remove filter and qdisc:
$ nsenter -t $NETNS1 -n tc filter del dev eth0 ingress
$ nsenter -t $NETNS1 -n tc qdisc del dev eth0 ingress

Remove network devices:

$ nsenter -t $NETNS1 -n ip link del vxlan0
$ nsenter -t $NETNS2 -n ip link del vxlan0

$ ip link del veth1
$ ip link del veth2

Remove bridge:

$ ip link del docker1

Remove containers:

$ docker rm -f ctn-1
$ docker rm -f ctn-2
= 3 Remarks =
== 3.1 MTU ==
Encapsulation adds additional headers to the original packets, so eth1 should have a larger MTU than eth0. Otherwise, big packets will not be mirrored but dropped.
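For VxLAN the extra headers add up to roughly 50 bytes (outer Ethernet 14 + IP 20 + UDP 8 + VxLAN 8), so if eth0 runs at the default 1500, the underlay path needs at least 1550. A hedged example for the playground above (the veth and bridge devices along the path would need the same treatment):

$ nsenter -t $NETNS1 -n ip link set eth1 mtu 1550
$ nsenter -t $NETNS2 -n ip link set eth1 mtu 1550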
== 3.2 Filter specific packets ==
You could choose what type of packets to filter, e.g. ICMP packets or UDP packets; refer to [2, 5] for more examples.
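For example, a hedged u32 variant of the rule from section 2.3.2 that mirrors only ICMP packets instead of all traffic could look like:

$ nsenter -t $NETNS1 -n tc filter add dev eth0 parent ffff: protocol ip u32 match ip protocol 1 0xff action mirred egress mirror dev vxlan0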
== 3.3 Other tunneling alternatives ==
As has been said, VxLAN is only one of the possible encapsulation formats. You could try other formats, e.g. GRE [2].
== 3.4 Performance ==
Software encapsulation/decapsulation takes a nontrivial amount of CPU resources, but this is beyond the scope of this post.
= References =
# [https://en.wikipedia.org/wiki/Port_mirroring Wikipedia: Port Mirroring]
# [https://medium.com/swlh/traffic-mirroring-with-linux-tc-df4d36116119 Traffic Mirroring with Linux Tc]
# [https://developers.redhat.com/blog/2019/05/17/an-introduction-to-linux-virtual-interfaces-tunnels/ An introduction to Linux virtual interfaces: Tunnels]
# [https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking/ Introduction to Linux interfaces for virtual networking]
# [https://lartc.org/howto/lartc.qdisc.html Chapter 9: Queueing Disciplines for Bandwidth Management], Linux Advanced Routing & Traffic Control HOWTO
# [https://arthurchiao.art/blog/traffic-mirror-with-ovs/ Traffic Mirroring with OVS]
= Appendix: Topology discovery of docker bridge network =
This appendix explores how the containers connect to the bridge docker0 (i.e. how the following picture was derived):

[[Image:Bild10.png|top]]
Fig 2.1. Default network topology of docker containers
First, get detailed information about the network device docker0:

# -d/--details
$ ip -d link show docker0
9: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:bd:e4:7a:85 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768
    vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.2:42:bd:e4:7a:85 ...
In the above output,
* bridge indicates this is a '''bridge''' type device
* the parameters after bridge (forward_delay, hello_time, etc.) are its configuration parameters
In the same way, get the detailed information about eth0 inside ctn-1:
$ nsenter-ctn ctn-1 -n ip -d link show dev eth0
258: eth0@if259: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
It says,
* eth0 is one end of a veth pair device, with interface index 258; note that this index is unique within the '''node''' (not just within the container)
* the other end of the veth pair has interface index 259
Find the device with index 259:

$ ip link | grep 259
259: veth159e062@if258: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...

Then show the device details:
$ ip -d link show veth159e062
259: veth159e062@if258: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 ...
    veth
    bridge_slave state forwarding priority 32 ...

* bridge_slave means this device is attached to a bridge
* master docker0 further gives us the bridge name: docker0
At this point, we know that ctn-1 connects to docker0 via the path eth0 -> veth159e062 -> docker0.
Repeat the same procedure for ctn-2 and we arrive at the picture shown above.
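As a shortcut (not part of the original walkthrough), the kernel can also list every port attached to a bridge directly, which gives the same picture without tracing individual veth indexes:

$ ip link show master docker0    # lists all devices whose master is docker0, i.e. the containers' veth peers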
== Port mirroring ==
BISDN Linux supports the configuration of mirror ports. Mirrored ports replicate incoming/outgoing traffic, according to their configuration flag. The supported port mirroring flags are the following:
* OFDPA_MIRROR_PORT_TYPE_INGRESS: Only ingress traffic is mirrored
* OFDPA_MIRROR_PORT_TYPE_EGRESS: Only egress traffic is mirrored
* OFDPA_MIRROR_PORT_TYPE_INGRESS_EGRESS: Both ingress and egress traffic are mirrored
Please note that when mirroring ports with different maximum link speeds (e.g. a 10G port mirrored to a 1G port), the highest common link speed (1G for the aforementioned example) will be used for both ports.
The following example shows how to mirror ingress traffic from port 2 to port 8 in a switch, as shown in this figure:
[[Image:Bild11.png|top|alt="Port mirroring example"]]
=== Adding mirror ports ===
Add port 8 as a mirror port:

grpc_cli call <IP>:50051 ofdpaMirrorPortCreate "port_num: 8"

Where <IP> is the IP of the whitebox switch (localhost when logged in locally to the switch).
Then, set port 2 as the mirror source and configure the port type to only mirror ingress traffic:
grpc_cli call <IP>:50051 ofdpaMirrorSourcePortAdd "mirror_dst_port_num: { port_num: 8 }, mirror_src_port_num: { port_num: 2 }, config: { ofdpa_mirror_port_type: OFDPA_MIRROR_PORT_TYPE_INGRESS}"
=== Verifying mirror port configuration ===
See the mirror port configuration by running the following command on the whitebox switch:

client_mirror_port_dump

Mirrored traffic cannot be captured on the switch mirror ports. Hence, to verify that traffic is being mirrored, we need to capture traffic on the server port that is connected to the mirror switch port. In the example from the figure, the following command should be executed on server2:

sudo tcpdump -i eno8
=== Deleting mirror ports ===
The port mirror configuration can be deleted with the following commands:
grpc_cli call <IP>:50051 ofdpaMirrorPortDelete "port_num: 8"
=== Port mirroring of bonded interfaces ===
Port mirroring works with physical ports, not logical ports, so to mirror the full traffic of a bonded interface all the individual bond members need to be mirrored. They can either be mirrored 1:1 to additional ports, or all bond members can be mirrored to one port.
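For instance, assuming a bond made of ports 1 and 2 that should be fully mirrored to port 8 (hypothetical port numbers), the same gRPC calls shown above would simply be repeated per bond member:

grpc_cli call <IP>:50051 ofdpaMirrorPortCreate "port_num: 8"
grpc_cli call <IP>:50051 ofdpaMirrorSourcePortAdd "mirror_dst_port_num: { port_num: 8 }, mirror_src_port_num: { port_num: 1 }, config: { ofdpa_mirror_port_type: OFDPA_MIRROR_PORT_TYPE_INGRESS_EGRESS}"
grpc_cli call <IP>:50051 ofdpaMirrorSourcePortAdd "mirror_dst_port_num: { port_num: 8 }, mirror_src_port_num: { port_num: 2 }, config: { ofdpa_mirror_port_type: OFDPA_MIRROR_PORT_TYPE_INGRESS_EGRESS}"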
= Chapter 14. Port mirroring =
Network administrators can use port mirroring to replicate inbound and outbound network traffic being communicated from one network device to another. Administrators use port mirroring to monitor network traffic and collect network data to:
* Debug networking issues and tune the network flow
* Inspect and analyze the network traffic to troubleshoot networking problems
* Detect an intrusion
== 14.1. Mirroring a network interface using nmcli ==
You can configure port mirroring using NetworkManager. The following procedure mirrors the network traffic from enp1s0 to enp7s0 by adding Traffic Control (tc) rules and filters to the enp1s0 network interface.
'''Prerequisites'''
* A network interface to mirror the network traffic to.
'''Procedure'''
# Add a network connection profile that you want to mirror the network traffic from: <br/># '''nmcli connection add type ethernet ifname ''enp1s0'' con-name ''enp1s0'' autoconnect ''no'''''
# Attach a prio qdisc to enp1s0 for the egress (outgoing) traffic with the 10: handle: <br/># '''nmcli connection modify ''enp1s0'' +tc.qdisc "root prio handle 10:"''' <br/> The prio qdisc attached without children allows attaching filters.
# Add a qdisc for the ingress traffic, with the ffff: handle: <br/># '''nmcli connection modify ''enp1s0'' +tc.qdisc "ingress handle ffff:"'''
# Add the following filters to match packets on the ingress and egress qdiscs, and to mirror them to enp7s0: <br/># '''nmcli connection modify ''enp1s0'' +tc.tfilter "parent ffff: matchall action mirred egress mirror dev ''enp7s0''"''' <br/># '''nmcli connection modify ''enp1s0'' +tc.tfilter "parent 10: matchall action mirred egress mirror dev ''enp7s0''"''' <br/> The matchall filter matches all packets, and the mirred action redirects packets to the destination.
# Activate the connection: <br/># '''nmcli connection up ''enp1s0'''''
'''Verification steps'''
# Install the tcpdump utility: <br/># '''yum install tcpdump'''
# Display the traffic mirrored on the target device (enp7s0): <br/># '''tcpdump -i ''enp7s0'''''
'''Additional resources'''
* [https://access.redhat.com/solutions/8787 How to capture network packets using tcpdump]
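To undo the mirroring later, the added tc properties can be removed from the connection profile again. A hedged sketch (the '-' prefix removes a previously added value, mirroring the '+' used above):

# nmcli connection modify enp1s0 -tc.tfilter "parent ffff: matchall action mirred egress mirror dev enp7s0"
# nmcli connection modify enp1s0 -tc.tfilter "parent 10: matchall action mirred egress mirror dev enp7s0"
# nmcli connection modify enp1s0 -tc.qdisc "ingress handle ffff:"
# nmcli connection modify enp1s0 -tc.qdisc "root prio handle 10:"
# nmcli connection up enp1s0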
Version vom 10. März 2023, 16:52 Uhr
Linux Port Mirroring
Sometimes a span/mirror port isn’t available on a switch when you need one. However, if you just need to monitor traffic from/to a Linux server, you can have the Linux server mirror it’s own traffic toward a collector. You just need a spare NIC on the Linux system.
For example, if you want to mirror all traffic going in/out of eth0, and send it out eth1 toward a collector, you can do that with the ‘tc’ subsystem. Some examples of this exist online. However, what they don’t tell you is that if eth1 goes down (unplugged, etc), that results in eth0 blocking all traffic — not good! I came up with a workaround for that problem using bridge and dummy interfaces. It’s kludgy, but it works!
See below for all the required config files / scripts. (don’t forget to chmod the scripts so root can run them!)
/etc/network/interfaces
# The loopback network interface auto lo iface lo inet loopback # The primary network interface auto eth0 iface eth0 inet static address 192.168.1.123 netmask 255.255.255.20 gateway 192.168.1.254 dns-nameservers 8.8.8.8 # physical mirror port toward collector auto eth1 iface eth1 inet static address 127.1.1.1 netmask 255.255.255.255 up ip addr del 127.1.1.1/32 dev eth1;: # always-up mirror port toward dev null auto dummy0 iface dummy0 inet static address 127.4.4.4 netmask 255.255.255.255 pre-up modprobe dummy;: up ip addr del 127.4.4.4/32 dev dummy0;: # bridge of eth1 and dummy0 # this is so that it always stays up, even if eth1 is down auto br0 iface br0 inet static bridge_ports eth1 dummy0 address 127.5.5.5 netmask 255.255.255.255 up ip addr del 127.5.5.5/32 dev br0;: post-up /etc/network/mirror-up.sh;: pre-down /etc/network/mirror-down.sh;:
/etc/network/mirror-up.sh
#!/bin/sh # ALK 2014-10-23 # Send all 'source_if' traffic (ingress & egress) to collector box on 'dest_if' # Normally called by boot process or ifup. i.e. # --/etc/network/interfaces-- # iface eth0 inet static # address X.X.X.X # netmask Y.Y.Y.Y # post-up /etc/network/mirror-up.sh;: source_if=eth0 dest_if=br0 # enable the destination port ifconfig $dest_if up;: # mirror ingress traffic tc qdisc add dev $source_if ingress;: tc filter add dev $source_if parent ffff: \ protocol all \ u32 match u8 0 0 \ action mirred egress mirror dev $dest_if;: # mirror egress traffic tc qdisc add dev $source_if handle 1: root prio;: tc filter add dev $source_if parent 1: \ protocol all \ u32 match u8 0 0 \ action mirred egress mirror dev $dest_if;: /etc/network/mirror-down.sh #!/bin/sh # ALK 2014-10-23 # De-provision mirroring config (See mirror-up.sh for provisioning) # Normally called by boot process or ifdown. i.e. # --/etc/network/interfaces-- # iface eth0 inet static # address X.X.X.X # netmask Y.Y.Y.Y # pre-down /etc/network/mirror-down.sh;: source_if=eth0 # de-provision ingress mirroring tc qdisc del dev $source_if ingress;: # de-provisoin egress mirroring tc qdisc del dev $source_if root;:
Networking Tools and tagged bash, ifconfig, linux, mirror, monitor, span, tc on .
Traffic Mirroring: Theory and Practice (with tc and Tunneling)
Traffic mirroring (also called port mirroring in some cases) is a technique to send a copy of the network traffic to a backend system. It is commonly used by network developers and administrators/operators to monitor network traffic or diagnose problems. Many systems rely on traffic mirroring function, such as intrusion detection system (IDS), passive probe, real user monitoring (RUM), traffic replay system, etc. [1]
Traditional traffic mirroring solutions are based on hardware switches, and are vendor-specific, which often found to be expensive, hard to use and difficult to evolve. With the raising of network virtualization techniques, software-based and vendor-agnostic solutions emerge, which are more flexible and easier to upgrade.
This post first discusses the technical requirements for traffic mirroring, then investigates a specific solution based on tc and tunneling. To make this a hands-on guide, we will set up a container-based virtual environment, but it also fits the physical (servers and networks) environments.
1 Theory
1.1 Technical requirements
Fig 1.1. Mirroring the ingress traffic of eth0@node1 to a remote node
Traffic mirroring involves making a copy of some specific traffic, and sending that copy to a destination place for further processing. The destination place could be one of a:# network device: e.g. a physical NIC/port directly connected via cable
- remote node: usually connected via network, as shown in the above.
Fig 1.1 performs mirroring merely on the ingress packets, but the egress could also be mirrored; besides, you could also filter the packets (e.g. select TCP SYN packets only) before copying them.
Another thing needs to concern: should we copy each interested packet exactly as it is, or just part of it? Well, this depends on your needs, such as,* If intended to analyze L7 message contents, e.g. HTTP requests, you need to copy the intact packet,
- If intended to analyze L4 flow events, e.g. TCP flow stats, you could just keep L2-L4 headers and drop the remaining bytes to save bandwidth and storage space,
- If intended to analyze L2 flow events, then maybe only L2 header needs to be preserved to further reduce bandwidth and storage.
1.2 Solutions
With the above requirements in mind, let’s break them down and try to propose a solution.
Capture, filter and copy packets
This is the first problem we need to solve. Actually, it depends on the networking infrastructures we are running on. Such as,# With OVS stack: port mirroring function may be used, see Traffic Mirroring with OVS as an example;
- With kernel stack: tools like tc may be suitable;
- If none of the existing tools works for you, you may end up with writing your own mirroring stuffs based on libraries such as libpcap.
Send copied packets to the destination
This is what we care about more in this post. If mirrored packets already at hand, how should we correctly send them to the remote node?
Challenges
, as both the dsc_mac and dst_ip of the packets are destined to the local node rather than remote node, which results to,# Switches will not forward them to the remote node; even worse, they may forward them back to local node which leads to a loop;
- Even if you struggled to make those packets arrived to the destination node (e.g. via direct cables), they may still get dropped before arriving to the final network device. For example, the final device responsible for processing those packets is a virtual network device behind a physical NIC; in this case, the packets will get dropped at the NIC due to destination IP (and MAC) mismatch with the current NIC.
Fig 1.2. Mirrored traffic won't arrive to the destination node if directly them put on the wire
Also, you couldn’t send them there by , as altered packets are not “mirrored packets” anymore - which will be useless to the receiving side (e.g. IDS).
Then, how could we send them to the remote node without changing them? Here comes tunneling.
Sending via tunneling
One feasible way to send the mirrored traffic to the remote node is: treat each copied packet as an opaque payload, encapsulate it into another packet and send to the remote node; at the receiving side, extract this mirrored packet by decapsulating the outer header.
There are many encapsulation formats to achieve this purpose, such as VxLAN, which puts the original packet into a new UDP packet, as shown below:
Fig 1.3. VxLAN header, with the "Original L2 Frame" field stores a mirrored packet. Image from @ethansmithron
1.3 A specific solution: tc + VxLAN
With all the above analysis, we are now ready to implement a sample traffic mirroring system.
And of course, we won’t make everything from scratch: Linux already provides us lots of tools that could be used to fulfill this task, including:# tc: capture and filter traffic
- VxLAN: encapsulate mirrored packets
2 Practice: tc + VxLAN
2.1 Set up playground
Let’s setup our container based playground.
2.1.1 Prerequisites
Make sure docker is installed before proceeding on. And this post uses docker’s default settings, which indicates,# There will be a Linux bridge docker0 created on the host,
- docker0 servers as the gateway of the default docker network 10.0.108.0/23 (configured in /etc/docker/daemon.json)
- When starting a container with docker run with no special networking parameters specified, it will allocate an IP address from 10.0.108.0/23, and attached the container to docker0 bridge with a veth pair. We will see this in the following.
Now, we’re ready to march on.
2.1.2 Create two containers
$ docker run -d --name ctn-1 alpine:3.11.6 sleep 60d $ docker run -d --name ctn-2 alpine:3.11.6 sleep 60d $ docker ps | grep ctn f0322e340e03 alpine:3.11.6 "sleep 60d" .. ctn-2 44f9a9804e77 alpine:3.11.6 "sleep 60d" .. ctn-1
Get container netns:
$ NETNS1=$(docker inspect --format "Vorlage:.State.Pid" ctn-1) $ NETNS2=$(docker inspect --format "Vorlage:.State.Pid" ctn-2) $ echo $NETNS1 $NETNS2 41243 361417
In the following, we will use nsenter -t <netns> -n <commands> to execute commands in container’s network namespace, this is equivalent to execute those commands inside the container.
$ nsenter -t $NETNS1 -n ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 ...
inet 127.0.0.1/8 scope host lo
258: eth0@if259: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
inet 10.0.108.2/23 brd 10.0.109.255 scope global eth0
$ nsenter -t $NETNS2 -n ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 ...
inet 127.0.0.1/8 scope host lo
273: eth0@if274: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
inet 10.0.108.3/23 brd 10.0.109.255 scope global eth0
Now the network topology:
Fig 2.1. Default network topology of docker containers
See Appendix if you are curious about how this topology comes to place.
2.1.3 Connect containers with another network
Now,# Add a new Linux bridge docker1
- Add a new NIC to each of the container, and configure an IP for it
- Connect containers to docker via the new NICs.
The topology will be:
Fig 2.2. Network topology after adding a bridge and two vNICs to containers
Add bridge
$ ip link add docker1 type bridge # Add a new Linux bridge `docker1` $ ip addr add 10.0.110.1/24 dev docker1 # Configure management IP address for this bridge $ ip link set docker1 up
Add NIC to ctn-1
Add a veth pair, with one end serving as vNIC for ctn-1, and the other end attaching to docker:
$ ip link add veth1 type veth peer name peer1 # Add veth pair $ ip link set peer1 netns $NETNS1 # Move one end to container's netns
$ nsenter -t $NETNS1 -n ip link set peer1 name eth1 # Rename it to eth1 $ nsenter -t $NETNS1 -n ip addr add 10.0.110.2/23 dev eth1 # Configure IP address $ nsenter -t $NETNS1 -n ip link set eth1 up
$ ip link set veth1 master docker1 # Attach the host side to bridge docker1 $ ip link set veth1 up
Add NIC to ctn-2
Same for ctn-2:
$ ip link add veth2 type veth peer name peer2 $ ip link set peer2 netns $NETNS2
$ nsenter -t $NETNS2 -n ip link set peer2 name eth1 $ nsenter -t $NETNS2 -n ip addr add 10.0.110.3/23 dev eth1 $ nsenter -t $NETNS2 -n ip link set eth1 up
$ ip link set veth1 master docker1 $ ip link set veth2 up
2.1.4 Test reachability
First, take a look at the routing table inside ctn-1:
$ nsenter -t $NETNS1 -n ip route default via 10.0.108.1 dev eth0 10.0.108.0/23 dev eth0 proto kernel scope link src 10.0.108.2 10.0.110.0/23 dev eth1 proto kernel scope link src 10.0.110.2
The last route entry says that network 10.0.110.0/23 (docker1) is reachable via eth1. Verify it:
$ nsenter -t $NETNS1 -n ping 10.0.110.3 64 bytes from 10.0.110.3: icmp_seq=1 ttl=64 time=0.053 ms ^C
$ nsenter -t $NETNS1 -n ping 10.0.108.3 64 bytes from 10.0.108.3: icmp_seq=1 ttl=64 time=0.053 ms ^C
Fig 2.3. Traffic paths when pinging each IP address of ctn-2 from ctn-1
As shown in the above picture, we can now reach ctn-2 via a distinct path (via eth1), which is independent from the default one (via eth0). We will send our mirrored packets with this path in the next.
2.2 Set up VxLAN tunnel
Now we are ready to set up a VxLAN tunnel on top of the eth1 NICs between ctn-1 and ctn-2 .
2.2.1 Setup VxLAN tunnel on ctn-1
$ nsenter -t $NETNS1 -n ip link add vxlan0 type vxlan id 100 local 10.0.110.2 remote 10.0.110.3 dev eth1 dstport 4789 $ nsenter -t $NETNS1 -n ip link set vxlan0 up $ nsenter -t $NETNS1 -n ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 ... 258: eth0@if259: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
inet 10.0.108.2/23 brd 10.0.109.255 scope global eth0
9: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc ...
link/ether 4e:b6:9d:1e:ac:bd brd ff:ff:ff:ff:ff:ff
277: eth1@if278: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
inet 10.0.110.2/23 scope global eth1
2.2.2 Setup VxLAN tunnel on ctn-2
$ nsenter -t $NETNS2 -n ip link add vxlan0 type vxlan id 100 local 10.0.110.3 remote 10.0.110.2 dev eth1 dstport 4789 $ nsenter -t $NETNS2 -n ip link set vxlan0 up
2.2.3 Topology
Now the topo:
Fig 2.4. Network topology after adding a VxLAN tunnel
Traffic path between vxlan0@ctn-1 and vxlan0@ctn-2:* : vxlan0@ctn-1 -> tunnel -> vxlan0@ctn-2
- : vxlan0@ctn-1 -> eth1@ctn-1 -> veth1 -> docker1 -> veth2 -> eth1@ctn-2 -> vxlan0@ctn-2
2.3 Filter & mirror traffic with tc
The Linux kernel ships with an excellent traffic control (tc) subsystem, which can filter and shape the bandwidth of network devices by queuing the packets on the devices first then transmitting them by specific algorithms. These algorithms are called qdiscs (queuing disciplines).
Qdiscs in the kernel [5]:
Userspace programs ^ | +---------------+-----------------------------------------+ | Y | | -------> IP Stack | | | | | | | Y | | | / ----------> Forwarding -> | | ^ / | | | | | | | ^ Y /-qdisc1-\ | | | Egress /--qdisc2--\ | --->->Ingress Classifier ---qdisc3---- | -> | Qdisc \__qdisc4__/ | | \-qdiscN_/ | +----------------------------------------------------------+ Fig 2.5. tc qdisc in the kernel, credit to Jamal Hadi Salim [5]
It needs to be noted that, traffic shaping only works on the egress side - as you can only control the transmitting behavior of yourself - ingress bandwidth is determined by other nodes, which are out of our control for most of the time. But the good news is, there is still a qdisc for ingress, which is used for filtering (and dropping if you like) the ingress packets, as shown in the above picture.
For more information on Linux’s traffic control subsystem, refer to [5,2] and this translation if you can read Chinese.
For simplicity, we’ll only mirror the ingress packets in the next.
2.3.1 Add the ingress qdisc
Add a qdisc in eth0@ctn-1’s ingress path:
$ nsenter -t $NETNS1 -n tc qdisc add dev eth0 handle ffff: ingress $ nsenter -t $NETNS1 -n tc -s qdisc ls dev eth0 qdisc noqueue 0: root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0
qdisc ingress ffff: parent ffff:fff1 ----------------
Sent 4284 bytes 68 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0
All ingress packets on eth0 will be sent to this qdisc.
2.3.2 Add a packet filter to qdisc
Add a filter to :
- match all packets, mirrored to vxlan0 device
$ nsenter -t $NETNS1 -n tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 action mirred egress mirror dev vxlan0
Check the filter:
- -s: show statistics
- -p: pretty format output for filter info
$ nsenter -t $NETNS1 -n tc -s -p filter ls dev eth0 parent ffff: filter protocol ip pref 49152 u32 filter protocol ip pref 49152 u32 fh 800: ht divisor 1 filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw (rule hit 0 success 0)
match IP protocol 1 (success 0 ) action order 1: mirred (Egress Mirror to device vxlan0) pipe index 1 ref 1 bind 1 installed 18 sec used 18 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0
Fig 2.6. Network topology after adding tc qdisc & filter
2.3.3 Generate traffic and test mirroring
Generate test traffic: ping eth0@ctn-1 from host,
$ ping 10.0.108.2 PING 10.0.108.2 (10.0.108.2) 56(84) bytes of data. 64 bytes from 10.0.108.2: icmp_seq=1 ttl=64 time=0.063 ms 64 bytes from 10.0.108.2: icmp_seq=2 ttl=64 time=0.053 ms ...
Capture the traffic at eth1@ctn-2:
$ nsenter -t $NETNS2 -n tcpdump -nn -i eth1 19:03:20.767998 IP 10.0.110.2.35710 > 10.0.110.3.4789: VXLAN, flags [I] (0x08), vni 100 IP 10.0.108.1 > 10.0.108.2: ICMP echo request, id 30513, seq 1, length 64
19:03:21.767482 IP 10.0.110.2.35710 > 10.0.110.3.4789: VXLAN, flags [I] (0x08), vni 100 IP 10.0.108.1 > 10.0.108.2: ICMP echo request, id 30513, seq 2, length 64
As the output shows, the ingress traffic has been mirrored to this NIC using VxLAN encapsulation. Traffic path:
Fig 2.7. Traffic path of the mirror test
2.4 Clean up
Remove filter and qdisc:
$ nsenter -t $NETNS1 -n tc filter del dev eth0 ingress $ nsenter -t $NETNS1 -n tc qdisc del dev eth0 ingress
Remove network devices:
$ nsenter -t $NETNS1 -n ip link del vxlan0 $ nsenter -t $NETNS2 -n ip link del vxlan0
$ ip link del veth1 $ ip link del veth2
Remove bridge:
$ ip link del docker1
Remove containers:
$ docker rm -f ctn-1 $ docker rm -f ctn-2
3 Remarks
3.1 MTU
Encapsulation will add additional headers to the original packets, so eth1 should have a large MTU than eth0. Otherwise, big packets will not be mirrored but dropped.
3.2 Filter specific packets
You can choose which packets to mirror, e.g. only ICMP or only UDP packets; refer to [2,5] for more examples.
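For instance, to mirror only ICMP (IP protocol 1) instead of all traffic, the u32 match could look like this (a sketch, using the same devices as above):

$ nsenter -t $NETNS1 -n tc filter add dev eth0 parent ffff: protocol ip \
    u32 match ip protocol 1 0xff \
    action mirred egress mirror dev vxlan0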
3.3 Other tunneling alternatives
As mentioned, VxLAN is only one of several encapsulation formats; you could also use others, e.g. GRE [2].
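A gretap tunnel (L2 frames over GRE) could play the same role as vxlan0 in this setup. The following is only a sketch: it assumes the same underlay addresses (10.0.110.2/10.0.110.3) used for the VXLAN tunnel above, and gretap0 is a name chosen here for illustration:

$ nsenter -t $NETNS1 -n ip link add gretap0 type gretap local 10.0.110.2 remote 10.0.110.3
$ nsenter -t $NETNS1 -n ip link set gretap0 up
# then point the mirred action at gretap0 instead of vxlan0
$ nsenter -t $NETNS1 -n tc filter add dev eth0 parent ffff: protocol all \
    u32 match u32 0 0 action mirred egress mirror dev gretap0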
3.4 Performance
Software encapsulation/decapsulation takes a nontrivial amount of CPU resources, but that is beyond the scope of this post.
References
1. Wikipedia: Port Mirroring
2. Traffic Mirroring with Linux Tc
3. An introduction to Linux virtual interfaces: Tunnels
4. Introduction to Linux interfaces for virtual networking
5. Chapter 9: Queueing Disciplines for Bandwidth Management, Linux Advanced Routing & Traffic Control HOWTO
6. Traffic Mirroring with OVS
Appendix: Topology discovery of docker bridge network
This appendix explores how containers connect to the docker0 bridge, i.e. how the topology in the following picture was derived:
Fig 2.1. Default network topology of docker containers
First, get the detailed information about network device docker0:
- -d/--details
$ ip -d link show docker0
9: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:bd:e4:7a:85 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.2:42:bd:e4:7a:85 ...
In the above output:
- bridge indicates this is a bridge type device
- the parameters after bridge (forward_delay, hello_time, etc.) are its configuration parameters
In the same way, get the detailed information about eth0 inside ctn-1:
$ nsenter-ctn ctn-1 -n ip -d link show dev eth0
258: eth0@if259: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
This tells us:
- eth0 is one end of a veth pair device, with interface index 258; note that this index is unique within the node (not just within the container)
- the other end of the veth pair has interface index 259
Find the device with index 259:
$ ip link | grep 259
259: veth159e062@if258: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
Then show the device details:
$ ip -d link show veth159e062
259: veth159e062@if258: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 ...
    veth
    bridge_slave state forwarding priority 32 ...

In the above output:
- bridge_slave means this device is attached to a bridge
- master docker0 further gives us the bridge name: docker0
At this point, we know that ctn-1 connects to docker0 with a path eth0 -> veth159e062 -> docker0.
Repeating the same procedure for ctn-2 yields the picture shown above.
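The manual steps above can also be scripted. The following sketch is not from the original post; it assumes the containers are named ctn-1/ctn-2 and that docker and nsenter are available on the host:

#!/bin/sh
# For each container, print the host-side line of its eth0 veth peer,
# which includes the 'master <bridge>' it is attached to.
for c in ctn-1 ctn-2; do
    pid=$(docker inspect -f '{{.State.Pid}}' "$c")
    # the number after '@if' in the container's eth0 line is the peer ifindex
    peer=$(nsenter -t "$pid" -n ip -o link show dev eth0 | sed -n 's/.*@if\([0-9]*\).*/\1/p')
    echo "$c:"
    ip link show | grep "^${peer}: "
done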
Port mirroring
BISDN Linux supports the configuration of mirror ports. Mirrored ports replicate incoming/outgoing traffic, according to their configuration flag. The supported port mirroring flags are the following:
- OFDPA_MIRROR_PORT_TYPE_INGRESS: Only ingress traffic is mirrored
- OFDPA_MIRROR_PORT_TYPE_EGRESS: Only egress traffic is mirrored
- OFDPA_MIRROR_PORT_TYPE_INGRESS_EGRESS: Both ingress and egress traffic are mirrored
Please note that when mirroring ports with different maximum link speeds (e.g. a 10G port mirrored to a 1G port), the highest common link speed (1G for the aforementioned example) will be used for both ports.
The following example shows how to mirror ingress traffic from port 2 to port 8 in a switch, as shown in this figure:
Adding mirror ports
Add port 8 as a mirror port:
grpc_cli call <IP>:50051 ofdpaMirrorPortCreate "port_num: 8"
Where <IP> is the IP of the whitebox switch (localhost when logged in locally to the switch).
Then, set port 2 as the mirror source and configure the port type to only mirror ingress traffic:
grpc_cli call <IP>:50051 ofdpaMirrorSourcePortAdd "mirror_dst_port_num: { port_num: 8 }, mirror_src_port_num: { port_num: 2 }, config: { ofdpa_mirror_port_type: OFDPA_MIRROR_PORT_TYPE_INGRESS}"
Verifying mirror port configuration
See the mirror port configuration by running the following command on the whitebox switch:
client_mirror_port_dump
Mirrored traffic cannot be captured on the switch's mirror ports themselves. Hence, to verify that traffic is being mirrored, we need to capture traffic on the server port that is connected to the mirror switch port. In the example from the figure, the following command should be executed on server2:
sudo tcpdump -i eno8
Deleting mirror ports
The port mirror configuration can be deleted with the following commands:
grpc_cli call <IP>:50051 ofdpaMirrorSourcePortDelete "mirror_dst_port_num: { port_num: 8 }, mirror_src_port_num: { port_num: 2 }"
grpc_cli call <IP>:50051 ofdpaMirrorPortDelete "port_num: 8"
Port mirroring of bonded interfaces
Port mirroring works with physical ports, not logical ports, so to mirror the full traffic of a bonded interface, all of the individual bond members need to be mirrored. They can either be mirrored 1:1 to additional ports, or all bond members can be mirrored to a single port, as in the sketch below.
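For example, assuming a bond whose members are ports 2 and 3 (these port numbers are illustrative, not taken from the documentation above), both members could be mirrored to port 8 using the same calls shown earlier:

grpc_cli call <IP>:50051 ofdpaMirrorPortCreate "port_num: 8"
grpc_cli call <IP>:50051 ofdpaMirrorSourcePortAdd "mirror_dst_port_num: { port_num: 8 }, mirror_src_port_num: { port_num: 2 }, config: { ofdpa_mirror_port_type: OFDPA_MIRROR_PORT_TYPE_INGRESS_EGRESS}"
grpc_cli call <IP>:50051 ofdpaMirrorSourcePortAdd "mirror_dst_port_num: { port_num: 8 }, mirror_src_port_num: { port_num: 3 }, config: { ofdpa_mirror_port_type: OFDPA_MIRROR_PORT_TYPE_INGRESS_EGRESS}"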
Chapter 14. Port mirroring
Network administrators can use port mirroring to replicate inbound and outbound network traffic being communicated from one network device to another. Administrators use port mirroring to monitor network traffic and collect network data to:
- Debug networking issues and tune the network flow
- Inspect and analyze the network traffic to troubleshoot networking problems
- Detect an intrusion
14.1. Mirroring a network interface using nmcli
You can configure port mirroring using NetworkManager. The following procedure mirrors the network traffic from enp1s0 to enp7s0 by adding Traffic Control (tc) rules and filters to the enp1s0 network interface.
Prerequisites
- A network interface to mirror the network traffic to.
Procedure
1. Add a network connection profile that you want to mirror the network traffic from:
# nmcli connection add type ethernet ifname enp1s0 con-name enp1s0 autoconnect no
2. Attach a prio qdisc to enp1s0 for the egress (outgoing) traffic with the 10: handle:
# nmcli connection modify enp1s0 +tc.qdisc "root prio handle 10:"
The prio qdisc attached without children allows attaching filters.
3. Add a qdisc for the ingress traffic, with the ffff: handle:
# nmcli connection modify enp1s0 +tc.qdisc "ingress handle ffff:"
4. Add the following filters to match packets on the ingress and egress qdiscs, and to mirror them to enp7s0:
# nmcli connection modify enp1s0 +tc.tfilter "parent ffff: matchall action mirred egress mirror dev enp7s0"
# nmcli connection modify enp1s0 +tc.tfilter "parent 10: matchall action mirred egress mirror dev enp7s0"
The matchall filter matches all packets, and the mirred action redirects them to the destination device.
5. Activate the connection:
# nmcli connection up enp1s0
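As an additional sanity check (not part of the official procedure), you can confirm that NetworkManager actually applied the qdiscs and filters by inspecting them with plain tc:

# tc qdisc show dev enp1s0
# tc filter show dev enp1s0 parent ffff: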
Verification steps
1. Install the tcpdump utility:
# yum install tcpdump
2. Display the traffic mirrored on the target device (enp7s0):
# tcpdump -i enp7s0
Additional resources
- How to capture network packets using tcpdump