This will be a tutorial on how to direct traffic from a Docker container (podman will probably work too with some modifications) through a virtual network card on the same host, but using a separate MAC/IP address from the host's default. We will be taking a look at Linux network bridges, iptables and Docker networks.
Note: The solution I came up with relies on something called a macvlan bridge. Due to how Wi-Fi authentication works this solution cannot be used over Wi-Fi; for that you will probably need to use VLANs.
Disclaimer: You should conduct your own research before using this as a means of securely isolating containers on a production network.
Problem background
Once upon a time I bought a used pfSense box and the amount of time I spent doing network administration went from close to zero to "VLAN all the things!". Around the same time I built a small server based on an old Intel Atom running Debian. The server slowly evolved into a small homelab running quite a lot of self-hosted stuff. Some of the services were for my consumption (LAN only), some were accessible over the web and some were bridged to other people's LANs using OpenVPN and a series of iptables rules.
A year or two later I built a big hypervisor box that ran my old server in a VM. I then separated the services destined for the LAN and web into a separate VM and kept all the VPNed stuff in its own virtual machine. I replaced the complexity that is iptables with a simple-but-not-really VPN setup in pfSense. I now had two virtual machines kinda separated by concern running my backups, note-taking apps etc.
Having two VMs running is fine and dandy but they do need to be updated and kept in check. This does not scale well when you have small children, so I decided to pull the services out of the VM and run them in containers on the hypervisor instead. Since the services were already in containers it was an easy move, but what about the firewall NAT rules? How do I assign one IP in the router to a set of containers and another IP to another, distinct set of containers? This could be achieved by adding a VLAN tag to the container network, but that would require a rewrite of all my firewall rules AND introduce yet another VLAN in my already complicated network.
The solution I cobbled together does not require any firewall changes but does require some additional container network configuration. First we need to create a network interface with a different IP running on the same ethernet port. Next we need to force the containers through the new bridge and lastly we need to make sure we can choose which bridge the containers connect through. Let’s get started!
Step 1: Multiple MAC/IP addresses on one ethernet port
Have you ever thought about how a switch knows where to route traffic? While it may seem like magic at first, it’s unfortunately not magic at all. The sender’s MAC address is just 6 bytes written into an ethernet packet. When you send out a packet through a switch, the brainbox in the switch notes your MAC address and the physical port you are connected to. Since you can have more than one MAC on one physical port, you can likewise also have multiple MACs on one network interface. If you write a different sender MAC in an ethernet packet and send it out, the switch notices this and returns packets destined for that MAC to your network card. You of course still only receive one packet at a time, so all your virtual MAC addresses share the same bandwidth.
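You can watch this learning happen on a Linux software bridge too: the bridge tool from iproute2 dumps the forwarding database, which plays the same role as the MAC table in a hardware switch. This is just a small aside and not needed for the rest of the tutorial; it assumes your host has at least one bridge interface:
# dump the forwarding database of the software bridges on this host;
# "permanent" entries belong to the bridge itself, the rest were learned
# from the source MACs of incoming frames, exactly like a hardware switch does
bridge fdb show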
To get an extra IP on your NIC you need to create a macvlan (there are other solutions, this one seemed simple). This virtual network interface gets its own MAC address, and later on we will set up some iptables rules for it. I am doing this on the Proxmox distribution which is based on Debian. These commands should work on most recent distros and I've successfully done it on Debian 10, Ubuntu 20.04 and OpenSUSE Tumbleweed 2022.01.
Creating the MACVLAN
Start by determining which interface you are connected to the network with. In my case it was vmbr0 (technically a bridge interface due to some Proxmox shenanigans); yours could be named eth0 or enp3s0 or something like that.
Then let's create a macvlan on vmbr0 using the ip command:
sudo ip link add link vmbr0 mac0 type macvlan mode bridge
If you run sudo ip a you should now see the bridge:
17425: mac0@vmbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
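If you just want to see it work before making anything permanent, you can bring the interface up and ask for a lease by hand. A throwaway test, assuming dhclient is installed (it usually is on Debian-based systems):
sudo ip link set mac0 up          # bring the macvlan up
sudo dhclient -v mac0             # ask the DHCP server for a lease on it
ip addr show mac0                 # the interface should now have its own IP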
Now let's configure the interface to pull an address. I went with a static DHCP lease in my router instead of a static address. For this I needed the MAC address to stay the same across reboots, so pick a locally administered unicast address (a first octet of 02 works; 01 marks a multicast address and the kernel will refuse to set it). You set this up in your distro-specific network interface configuration file. If you still have /etc/network/interfaces you can add it directly there:
auto mac0
iface mac0 inet dhcp
hwaddress ether 02:01:03:04:05:06
pre-up ip link add mac0 link vmbr0 type macvlan mode bridge
pre-up ip link set dev mac0 address 02:01:03:04:05:06
pre-up ip link set dev mac0 up
post-down ip link del mac0 link vmbr0 type macvlan mode bridge
If your distro uses wicked-style sysconfig files instead (e.g. openSUSE), create /etc/sysconfig/network/ifcfg-mac0 (for example with sudo nano) with the following contents:
BOOTPROTO='dhcp'
STARTMODE='auto'
MACADDR=02:01:03:04:05:06
MACVLAN_DEVICE='vmbr0'
MACVLAN_MODE='bridge'
LLADDR=02:01:03:04:05:06
Restart your network to pull the new bridge up. If it works you will see it get an IP address. If it doesn’t work consult your local network masochist.
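How you restart networking depends on the distro; a couple of likely candidates (pick the one that matches your setup):
# Debian/Proxmox with ifupdown (/etc/network/interfaces):
sudo ifup mac0                    # or: sudo systemctl restart networking
# openSUSE with wicked (/etc/sysconfig/network/ifcfg-mac0):
sudo wicked ifup mac0
# either way, check the result:
ip addr show mac0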
Testing the bridge
Now that you have two IPs you need to test that there's a connection on both. For this I used iperf3 (every distro has it in the repos). Let's assume your new IP is 192.168.0.42.
On the server, run iperf3 in server mode bound to this IP:
iperf3 --bind 192.168.0.42 -s
On another machine run iperf3 in client mode:
iperf3 -c 192.168.0.42
It should look something like the following:
Connecting to host 192.168.0.42, port 5201
[ 5] local 192.168.0.177 port 45520 connected to 192.168.0.42 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 17.0 MBytes 142 Mbits/sec 36 423 KBytes
[ 5] 1.00-2.00 sec 15.3 MBytes 128 Mbits/sec 0 488 KBytes
Congratulations, you now have two IPs on a single machine. Next up, let's take a short dive into some Docker networking specifics.
Step 2: How container networking works
The Linux kernel has a concept of namespaces for process IDs, networking, mounts et al. (with cgroups handling things like memory limits). Docker is a frontend that manages the creation and destruction of these namespaces. When you run a process in a container, be it Docker or LXC, the process runs on the host's kernel but within a restricted namespace that acts as a sandbox. The kernel manages a container as an ordinary native process, albeit one in a namespace.
All this means you can run ps aux and see your container processes running right there. From within a Docker container you can ping other containers on the same Docker network. But how do the containers connect to the internet? Good old fashioned NAT. Docker creates a network bridge and adds some firewall rules to route the packets through the host network. It also adds a DNS layer letting you connect to containers using their container name.
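You can verify the "just a process in a namespace" claim yourself. A quick experiment (the container name here is arbitrary):
# start a throwaway container that just sleeps for a while
docker run -d --rm --name ns-demo alpine sleep 300
# the process is visible on the host like any other
ps aux | grep "sleep 300"
# and the kernel exposes its namespaces under /proc
sudo ls /proc/$(docker inspect -f '{{.State.Pid}}' ns-demo)/ns
docker stop ns-demo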
Side note: Servers often have more than one network card and as one might suspect there are many ways of routing network traffic on a Linux system. I will only be touching upon a minuscule part of the toolkit.
Now how do we route container traffic through the new bridge? We have established that the default configuration of Docker creates some form of NAT through a Linux bridge. The Docker docs list a number of different network types and how to use them. None of the network types really fits the bill; the VLAN-based ones come close, but they would require config changes in my router. Let's dive a little deeper.
When you create a network with Docker it creates a bridge. Let’s look at what a bridge does in docker:
docker network create testnet
This gives us a network bridge called something seemingly random, in my case br-dc3a4efc5a24, and ip a shows us this:
50: br-dc3a4efc5a24: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:b0:ea:3a:89 brd ff:ff:ff:ff:ff:ff
inet 172.22.0.1/16 brd 172.22.255.255 scope global br-dc3a4efc5a24
valid_lft forever preferred_lft forever
It looks like gibberish if you don't have a PhD in faffing around with networking. A couple of things do make sense though: link/ether and inet. For our bridge we have a random MAC address 02:42:b0:ea:3a:89 and an IP range: 172.22.0.1/16.
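Docker can show you the same information from its side of the fence; network inspect prints the subnet, gateway and any connected containers:
docker network inspect testnet                                # full JSON dump
docker network inspect -f '{{json .IPAM.Config}}' testnet     # just the subnet and gateway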
Now traffic on this new bridge will not go anywhere unless there's an iptables rule for it. Let's have a look:
iptables -t nat -L
This outputs all the rules for NAT, and looking under the POSTROUTING chain we find this little line:
MASQUERADE all -- 172.22.0.0/16 anywhere
Unfortunately the above does not involve parties or masks; instead it indicates that traffic from 172.22.0.0/16 is masqueraded as traffic from the host. In other words, good old fashioned NAT. But this is not what we want, no, we want bad new fashioned NAT. We need to figure out what iptables rules docker network create creates. This is probably documented somewhere in the docs or their source code which I am too lazy to look up, so we'll just use iptables-save to get a dump of all the rules (line numbers added by me):
$ sudo iptables-save | grep br-dc3
1: -A FORWARD -o br-dc3a4efc5a24 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
2: -A FORWARD -o br-dc3a4efc5a24 -j DOCKER
3: -A FORWARD -i br-dc3a4efc5a24 ! -o br-dc3a4efc5a24 -j ACCEPT
4: -A FORWARD -i br-dc3a4efc5a24 -o br-dc3a4efc5a24 -j ACCEPT
5: -A DOCKER-ISOLATION-STAGE-1 -i br-dc3a4efc5a24 ! -o br-dc3a4efc5a24 -j DOCKER-ISOLATION-STAGE-2
6: -A DOCKER-ISOLATION-STAGE-2 -o br-dc3a4efc5a24 -j DROP
7: -A POSTROUTING -s 172.22.0.0/16 ! -o br-dc3a4efc5a24 -j MASQUERADE
8: -A DOCKER -i br-dc3a4efc5a24 -j RETURN
As you can see it does quite a lot. Lines 1-4 basically say "allow traffic to and from this bridge". Lines 5-6 isolate the network from other Docker networks (line 6 drops whatever the isolation rules catch). Line 8 hands traffic on the bridge back from the DOCKER chain. Line 7 is the one we were looking for:
-A POSTROUTING -s 172.22.0.0/16 ! -o br-dc3a4efc5a24 -j MASQUERADE
This should be read like so:
-A POSTROUTING: append the rule to the POSTROUTING chain
-s 172.22.0.0/16: match packets coming from the subnet 172.22.0.0/16
! -o br-dc3a4efc5a24: match packets that have any destination but br-dc3a4efc5a24 (the exclamation mark means not)
-j MASQUERADE: if the previous rules match, jump to the MASQUERADE target
As any network engineer will tell you, the MASQUERADE target simply rewrites the source to the address of whatever interface the packet leaves through, which is decided by the default routing table, and we don't want to mess with that. This means that we have to read some manpages.
Step 3: Bringing it together
In the manpage for iptables-extensions we find a solution to our problem: SNAT. A quote from the page: "It specifies that the source address of the packet should be modified [...]". Continuing we find this description:
--to-source ipaddr[-ipaddr][:port-port]
which can specify a single new source IP address, an inclusive range of IP
addresses, and optionally, a port range [...]
This target seems to be the answer to our digital prayers. We can now make traffic from the container subnet use a specific source IP when communicating with the rest of the world by invoking the command:
sudo iptables -t nat -A POSTROUTING \
-s 172.22.0.0/16 ! -o my-bridge \
-j SNAT --to-source <IP of interface>
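Keep in mind that rules added with iptables -A live in kernel memory only and disappear on reboot. One way to persist them on Debian-based systems is the iptables-persistent package (other distros have their own mechanisms):
sudo apt install iptables-persistent   # installs the netfilter-persistent service
sudo netfilter-persistent save         # writes the current rules to /etc/iptables/rules.v4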
Now we just need to tell docker to stop creating MASQUERADE rules. Searching the Docker docs for docker network create we find the following option:
com.docker.network.bridge.enable_ip_masquerade --ip-masq Enable IP masquerading
This means we can disable IP masquerading. On the same page you can find options to set the bridge name along with the subnet of the bridge. This will give us predictable names and IP addresses. Let's give it a try (if you still have the testnet network from earlier, remove it first with docker network rm testnet):
docker network create --subnet=172.22.0.0/16 \
-o "com.docker.network.bridge.name=my-bridge" \
-o "com.docker.network.bridge.enable_ip_masquerade=false" \
testnet
The above command creates a Docker network with the subnet 172.22.0.0/16 on a Linux bridge named my-bridge with masquerading disabled. Let's see if it works.
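Docker will no longer add a MASQUERADE rule for this network, so nothing leaves it until we add our SNAT rule. Assuming the macvlan from step 1 got the address 192.168.0.42, that looks like this:
# check that no MASQUERADE rule was created for the new bridge
sudo iptables-save -t nat | grep my-bridge
# make traffic from the container subnet leave as 192.168.0.42
sudo iptables -t nat -A POSTROUTING \
    -s 172.22.0.0/16 ! -o my-bridge \
    -j SNAT --to-source 192.168.0.42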
To properly test it we need to run a server of some sort on a machine that is not our server. Run iperf3 -s on said machine; the output should say something like Server listening on 5201. In my case that machine's IP is 192.168.0.177.
Now the first test. From the server run:
docker run -it --rm alpine /bin/sh -c "apk add iperf3 && iperf3 -c 192.168.0.177"
On your machine you should see a connection:
Accepted connection from <YOUR SERVER IP>
Now for the fun part. Try the same command but in the new docker network:
docker run -it --net testnet --rm alpine /bin/sh -c "apk add iperf3 && iperf3 -c 192.168.0.177"
Now the connection should output:
Accepted connection from 192.168.0.42
If that’s the case, then congratulations!
You can now route traffic from any container through a specific MAC/IP address on your machine by simply specifying --net testnet. If that's not worth a week of research I don't know what is.
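And to get back to the original problem of two distinct groups of containers with two different router-facing IPs: just repeat the recipe. A rough sketch, assuming a second macvlan already exists and got 192.168.0.43 (the names and addresses here are made up):
# a second Docker network on its own bridge, again without masquerading
docker network create --subnet=172.23.0.0/16 \
    -o "com.docker.network.bridge.name=vpn-bridge" \
    -o "com.docker.network.bridge.enable_ip_masquerade=false" \
    vpnnet
# SNAT its traffic out through the second macvlan's address
sudo iptables -t nat -A POSTROUTING \
    -s 172.23.0.0/16 ! -o vpn-bridge \
    -j SNAT --to-source 192.168.0.43
# containers started with --net vpnnet now leave the host as 192.168.0.43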
Thanks for reading if you made it this far.
/J