
Running Docker containers on a separate MAC address on your network

This will be a tutorial on how to direct traffic from a Docker container (Podman will probably work too with some modifications) through a virtual network card on the same host, but using a separate MAC/IP address from the host's default. We will be taking a look at Linux network bridges, iptables and Docker networks.

Note: The solution I come up with relies on something called a macvlan bridge. Due to how Wi-Fi authentication works, this solution cannot be used over Wi-Fi. For that you will probably need to use VLANs.

Disclaimer: You should conduct your own research before using this as a means of securely isolating containers on a production network.

Problem background

Once upon a time I bought a used pfSense box and the amount of time I spent doing network administration went from close to zero to “VLAN all the things!”. Around the same time I built a small server based on an old Intel Atom running Debian. The server slowly evolved into a small homelab running quite a lot of self-hosted stuff. Some of the services were for my consumption (LAN only), some were accessible over the web and some were bridged to other people's LANs using OpenVPN and a series of iptables rules.

A year or two later I built a big hypervisor box that ran my old server in a VM. I then separated the services destined for the LAN and web into a separate VM and kept all the VPNed stuff in its own virtual machine. I replaced the complexity that is iptables with a simple-but-not-really VPN setup in pfSense. I now had two virtual machines, kinda separated by concern, running my backups, note-taking apps etc.

Having two VMs running is fine and dandy but they do need to be updated and kept in check. This does not scale well when you have small children, so I decided to pull the services out of the VM and run them in containers on the hypervisor instead. Since the services were already in containers it was an easy move, but what about the firewall NAT rules? How do I assign one IP in the router to a set of containers and another IP to another, distinct set of containers? This could be achieved by adding a VLAN tag to the container network, but that would require a rewrite of all my firewall rules AND introduce yet another VLAN in my already complicated network.

The solution I cobbled together does not require any firewall changes but does require some additional container network configuration. First we need to create a network interface with a different IP running on the same ethernet port. Next we need to force the containers through the new bridge and lastly we need to make sure we can choose which bridge the containers connect through. Let’s get started!

Step 1: Multiple MAC/IP addresses on one ethernet port

Have you ever thought about how a switch knows where to send traffic? While it may seem like magic at first, it’s unfortunately not magic at all. The sender’s MAC address is just 6 bytes written into an Ethernet frame. When you send out a frame through a switch, the brainbox in the switch notes your MAC address and the physical port you are connected to. Since you can have more than one MAC on one physical port, you can likewise have multiple MACs on one network interface. If you write a different sender MAC into an Ethernet frame and send it out, the switch notices this and returns frames destined for that MAC to your network card. You of course still only receive one frame at a time, so all your virtual MAC addresses share the same bandwidth.
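On Linux you can peek at this learning table yourself. If the machine has a Linux bridge (like the vmbr0 interface that shows up later in this post), the forwarding database lists every MAC learned per port:

# requires the iproute2 "bridge" tool
bridge fdb show br vmbr0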

To get an extra IP on your NIC you need to create a macvlan (there are other solutions, but this one seemed simple). This fictitious network interface gets a different MAC address, and you have to set up some iptables rules for it. I am doing this on the Proxmox distribution, which is based on Debian. These commands should work on most recent distros; I’ve successfully done it on Debian 10, Ubuntu 20.04 and openSUSE Tumbleweed 2022.01.

Creating the MACVLAN

Start by determining which interface you are connected to the network with. In my case it was vmbr0 (technically a bridge interface due to some Proxmox shenanigans), yours could be named eth0 or enp3s0 or something like that.

Then let’s create a macvlan on vmbr0 using the ip command:

 sudo ip link add link vmbr0 mac0 type macvlan mode bridge

If you run sudo ip a you should now see the bridge:

17425: mac0@vmbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
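The new interface starts out DOWN with no address. Before making anything permanent you can bring it up and ask for a lease by hand; this assumes a DHCP client like dhclient is installed:

sudo ip link set mac0 up
sudo dhclient -v mac0    # or: dhcpcd mac0, depending on your distro
ip addr show mac0        # should now show a leased address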

Now let’s configure the interface to pull an address. I went with a static DHCP lease in my router instead of a static address. For this I needed to have the MAC address be the same across reboots. You set this up in your distro-specific network interface configuration file. If you still have /etc/network/interfaces you can add it directly there:

auto mac0
iface mac0 inet dhcp
        # 02:... is a locally administered unicast MAC; 01:... would be
        # a multicast address, which the kernel refuses to assign
        hwaddress ether 02:01:03:04:05:06
        pre-up ip link add mac0 link vmbr0 type macvlan mode bridge
        pre-up ip link set dev mac0 address 02:01:03:04:05:06
        pre-up ip link set dev mac0 up
        post-down ip link del mac0

If you are on openSUSE with wicked, the same thing goes in /etc/sysconfig/network/ifcfg-mac0 with the following contents:

BOOTPROTO='dhcp'
STARTMODE='auto'
MACADDR=02:01:03:04:05:06
MACVLAN_DEVICE='vmbr0'
MACVLAN_MODE='bridge'
LLADDR=02:01:03:04:05:06
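If your host uses systemd-networkd instead, the equivalent setup is a pair of unit files. This is a rough sketch; the file names under /etc/systemd/network/ are my own choice, and the MACVLAN= line must go in whatever .network file already manages vmbr0:

# /etc/systemd/network/25-mac0.netdev
[NetDev]
Name=mac0
Kind=macvlan
MACAddress=02:01:03:04:05:06

[MACVLAN]
Mode=bridge

# /etc/systemd/network/25-mac0.network
[Match]
Name=mac0

[Network]
DHCP=ipv4

# And in the .network file matching vmbr0, attach the macvlan:
# [Network]
# MACVLAN=mac0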

Restart your network to pull the new bridge up. If it works you will see it get an IP address. If it doesn’t work consult your local network masochist.

Testing the bridge

Now that you have two IPs you need to test that there’s a connection on both. For this I used iperf3 (every distro has it in the repos). Let’s assume your new IP is 192.168.0.42. On the server, run iperf3 in server mode bound to this IP:

iperf3 --bind 192.168.0.42 -s

On another machine run iperf3 in client mode:

iperf3 -c 192.168.0.42

It should look something like the following:

Connecting to host 192.168.0.42, port 5201
[  5] local 192.168.0.177 port 45520 connected to 192.168.0.42 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  17.0 MBytes   142 Mbits/sec   36    423 KBytes       
[  5]   1.00-2.00   sec  15.3 MBytes   128 Mbits/sec    0    488 KBytes       

Congratulations, you now have two IPs on a single machine. Next up, let’s take a short dive into some Docker networking specifics.

Step 2: How container networking works

The Linux kernel has a concept of namespaces for process IDs, networking, mounts and more. Docker is a frontend that manages the creation and destruction of these namespaces. When you run a process in a container, be it Docker or LXC, the process runs on the host’s kernel but within a restricted set of namespaces that acts as a sandbox. The kernel manages a container as an ordinary native process, albeit one confined to its namespaces.

All this means you can run ps aux and see your container processes running right there. From within a Docker container you can ping other containers on the same Docker network. But how do the containers connect to the internet? Good old fashioned NAT. Docker creates a network bridge and adds some firewall rules to route the packets through the host network. It also adds a DNS layer letting you connect to containers using their container name.
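You can see the sandbox for yourself. A small sketch, with ns-demo as a throwaway container name: we grab the container's PID from Docker and use nsenter to run a command inside its network namespace:

# Start a throwaway container
docker run -d --rm --name ns-demo alpine sleep 300

# Its PID is an ordinary host PID visible to ps
PID=$(docker inspect -f '{{.State.Pid}}' ns-demo)
ps -fp "$PID"

# The same process seen from inside its network namespace:
# only the container's interfaces show up
sudo nsenter -t "$PID" -n ip a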

Side note: Servers often have more than one network card, and as one might suspect there are many ways of routing network traffic on a Linux system. I will only be touching upon a minuscule part of the toolkit.

Now how do we route container traffic through the new bridge? We have established that the default configuration of Docker creates some form of NAT through a Linux bridge. The Docker docs list a number of different network types and how to use them. None of the network types really fit the bill; VLANs do come close, but they require config changes in my router. Let’s dive a little deeper.

When you create a network with Docker it creates a bridge. Let’s look at what a bridge does in docker:

docker network create testnet

This gives us a network bridge called something seemingly random, in my case br-dc3a4efc5a24, and ip a shows us this:

50: br-dc3a4efc5a24: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:b0:ea:3a:89 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.1/16 brd 172.22.255.255 scope global br-dc3a4efc5a24
       valid_lft forever preferred_lft forever

It looks like gibberish if you don’t have a PhD in faffing around with networking. A couple of things do make sense though: link/ether and inet. For our bridge we have a random MAC address, 02:42:b0:ea:3a:89, and an IP address, 172.22.0.1, on the 172.22.0.0/16 subnet.
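If you'd rather not read raw ip output, Docker can print the same information. The --format template below just filters the inspect output down to the address configuration:

docker network inspect testnet --format '{{json .IPAM.Config}}'
# prints something like: [{"Subnet":"172.22.0.0/16","Gateway":"172.22.0.1"}]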

Now traffic on this new bridge will not go anywhere unless there’s an iptables rule for it. Let’s have a look:

iptables -t nat -L

This outputs all the rules for NAT, and looking under the POSTROUTING chain we find this little line:

MASQUERADE  all  --  172.22.0.0/16        anywhere            

Unfortunately the above does not involve parties or masks; instead it indicates that traffic from 172.22.0.0/16 is masqueraded as traffic from the host. In other words, good old fashioned NAT. But this is not what we want, no, we want bad new fashioned NAT. We need to figure out which iptables rules docker network create creates. This is probably documented somewhere in the docs or the source code, which I am too lazy to look up, so we’ll just use iptables-save to get a dump of all the rules (line numbers added by me):

$ sudo iptables-save | grep br-dc3
1: -A FORWARD -o br-dc3a4efc5a24 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
2: -A FORWARD -o br-dc3a4efc5a24 -j DOCKER
3: -A FORWARD -i br-dc3a4efc5a24 ! -o br-dc3a4efc5a24 -j ACCEPT
4: -A FORWARD -i br-dc3a4efc5a24 -o br-dc3a4efc5a24 -j ACCEPT
5: -A DOCKER-ISOLATION-STAGE-1 -i br-dc3a4efc5a24 ! -o br-dc3a4efc5a24 -j DOCKER-ISOLATION-STAGE-2
6: -A DOCKER-ISOLATION-STAGE-2 -o br-dc3a4efc5a24 -j DROP
7: -A POSTROUTING -s 172.22.0.0/16 ! -o br-dc3a4efc5a24 -j MASQUERADE
8: -A DOCKER -i br-dc3a4efc5a24 -j RETURN

As you can see it does quite a lot. Lines 1-4 basically say “allow traffic to and from this bridge”. Lines 5-6 isolate the network, with line 6 dropping traffic coming from other Docker networks. Line 8 hands packets arriving on the bridge back to the chain that jumped to DOCKER. Line 7 is the one we were looking for:

-A POSTROUTING -s 172.22.0.0/16 ! -o br-dc3a4efc5a24 -j MASQUERADE

This should be read like so:

  • -A POSTROUTING: append the rule to the POSTROUTING chain
  • -s 172.22.0.0/16: match packets coming from the subnet 172.22.0.0/16
  • ! -o br-dc3a4efc5a24: match packets that go out through any interface other than br-dc3a4efc5a24 (the exclamation mark means not).
  • -j MASQUERADE: if the previous rules match, jump to the MASQUERADE target
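A practical aside: you can list a live chain with rule positions, which makes it easier to inspect (and later delete) specific rules:

sudo iptables -t nat -L POSTROUTING -n -v --line-numbers
# delete by position if you need a do-over:
# sudo iptables -t nat -D POSTROUTING <rule number>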

As any network engineer will tell you, the MASQUERADE target stamps packets with the address of whatever interface they leave through, and which interface that is gets decided by the default routing table, which we don’t want to mess with. This means that we have to read some manpages.

Step 3: Bringing it together

In the manpage for iptables-extensions we find a solution to our problem: SNAT. A quote from the page: “It specifies that the source address of the packet should be modified […]”. Continuing we find this description:

--to-source ipaddr[-ipaddr][:port-port]
    which can specify a single new source IP address, an inclusive range of IP
    addresses, and optionally, a port range [...]

This target seems to be the answer to our digital prayers. We can now make traffic from the container subnet leave with a specific source IP when communicating with the rest of the world by invoking the command:

sudo iptables -t nat -A POSTROUTING   \
     -s 172.22.0.0/16 ! -o my-bridge \
     -j SNAT --to-source <IP of interface>
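Note that rules added with iptables vanish on reboot. Here is the concrete rule for this post’s example (192.168.0.42 being the macvlan address from earlier), plus one way to persist it on Debian-ish systems using the iptables-persistent package:

sudo iptables -t nat -A POSTROUTING \
     -s 172.22.0.0/16 ! -o my-bridge \
     -j SNAT --to-source 192.168.0.42

# persist the current ruleset across reboots (Debian/Ubuntu)
sudo apt install iptables-persistent
sudo netfilter-persistent save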

Now we just need to tell docker to stop creating MASQUERADE rules. Searching the Docker docs for docker network create we find the following option:

com.docker.network.bridge.enable_ip_masquerade (CLI equivalent: --ip-masq) - Enable IP masquerading

Which means we can disable IP masquerading. On the same page you can find options to set the bridge name and the subnet of the bridge. This will give us predictable names and IP addresses. Let’s give it a try (if you created testnet earlier, remove it first with docker network rm testnet):

docker network create --subnet=172.22.0.0/16 \
       -o "com.docker.network.bridge.name=my-bridge" \
       -o "com.docker.network.bridge.enable_ip_masquerade=false" \
       testnet

The above command creates a Docker network with the subnet 172.22.0.0/16 on a Linux bridge named my-bridge with masquerading disabled. Let’s see if it works. To properly test it we need an iperf3 server on a machine that is not our Docker host. Run iperf3 -s on said machine; the output should say something like Server listening on 5201. In this example that machine’s IP is 192.168.0.177.
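Before involving any containers it’s worth a quick sanity check that Docker really skipped the MASQUERADE rule this time:

sudo iptables-save | grep my-bridge

You should see the usual forwarding and isolation rules, but no MASQUERADE line.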

First, a baseline test on the default Docker network. From the server run:

docker run -it --rm alpine /bin/sh -c "apk add iperf3 && iperf3 -c 192.168.0.177"

On the other machine you should see a connection:

Accepted connection from <YOUR SERVER IP>

Now for the fun part. Try the same command but in the new docker network:

docker run -it --net testnet --rm alpine /bin/sh -c "apk add iperf3 && iperf3 -c 192.168.0.177"

Now the connection should output:

Accepted connection from 192.168.0.42

If that’s the case, then congratulations! You can now route traffic from any container through a specific MAC/IP address on your machine by simply specifying --net testnet. If that’s not worth a week of research I don’t know what is.
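To round things off, here is how pinning a set of services to the new network might look in a docker-compose.yml. A minimal sketch: the service name and image are placeholders, and the network is marked external so Compose attaches to our hand-made testnet instead of creating its own:

services:
  notes:
    image: nginx:alpine      # placeholder for your actual service
    networks:
      - testnet

networks:
  testnet:
    external: true           # the pre-created network with the SNAT rule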

Thanks for reading if you made it this far.

/J
