The 15-Year-Old iptables Rule That Broke My DNS
This post is part of the Agentic Adventures series
- Jekyll to Hugo Migration
- Claude Fixes User Bug
- Agents Day Lisbon
- Letting Claude Upgrade My Raspberry Pi
- The 15-Year-Old iptables Rule That Broke My DNS (this post)
One of my servers has a weird problem after every reboot: it can ping IP addresses just fine, but it can’t resolve any DNS names.
$ ping 8.8.8.8 # works
$ ping google.com # ping: google.com: Temporary failure in name resolution
I’ve been working around this for a while now: after every reboot I’d SSH in and overwrite /etc/resolv.conf to point straight at 8.8.8.8 instead of the local 127.0.0.53 stub. That got DNS working again, but it was never a real fix. /etc/resolv.conf is regenerated on boot, so my edit vanished the next time the machine came up and I was back to fixing it by hand. This time I decided to debug it properly with Claude.
The debugging session
Claude’s first instinct was to split the problem in half: is DNS blocked on the network, or is the local resolver misconfigured?
$ dig google.com @8.8.8.8 +short
142.251.38.142
Bypassing the local resolver worked, so the network was fine. The problem had to be local. /etc/resolv.conf pointed at 127.0.0.53, the systemd-resolved stub, so we asked it directly:
$ dig google.com @127.0.0.53 +short
;; communications error to 127.0.0.53#53: timed out
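The two probes above can be wrapped into a small helper for the next time this happens. This is only a sketch: the function names probe and diagnose and the verdict strings are my own, and it relies on dig exiting non-zero when no server replies at all.

```shell
#!/bin/sh
# Sketch: split a DNS failure into "network" vs "local resolver".
# UPSTREAM and STUB are the two addresses used in this post.
UPSTREAM=8.8.8.8
STUB=127.0.0.53

# probe SERVER NAME: succeeds if SERVER sends back any reply for NAME.
# dig exits non-zero when no server replies; +time/+tries keep the wait short.
probe() {
    dig "$2" "@$1" +short +time=2 +tries=1 >/dev/null 2>&1
}

# diagnose NAME: print a one-line verdict based on the two probes.
diagnose() {
    if ! probe "$UPSTREAM" "$1"; then
        echo "upstream unreachable: suspect the network or a firewall"
    elif ! probe "$STUB" "$1"; then
        echo "upstream ok, stub silent: suspect the local resolver path"
    else
        echo "both resolvers answer"
    fi
}
```

Running diagnose google.com on the broken box would have landed in the second branch, which is the situation described here.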
systemd-resolved was running, had upstream servers configured, and reported Status: "Processing requests...". Everything looked healthy. Except for this line repeating in its logs:
systemd-resolved[985]: Got packet on unexpected (i.e. non-localhost) IP range, ignoring.
This was the smoking gun (as Claude likes to say!). systemd-resolved only accepts queries that arrive from a localhost source address. My queries were arriving at 127.0.0.53 with a non-localhost source address, so it silently dropped every single one.
What could possibly rewrite the source address of a packet that never leaves the machine? NAT.
$ sudo iptables -t nat -L POSTROUTING -n -v
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
231 44186 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0
A MASQUERADE rule with no interface restriction. It was source-NAT-ing everything, including loopback traffic to the stub resolver. The packet counter was climbing with every failed DNS lookup.
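To spot this kind of rule without reading the table by eye, something like the filter below could be run over iptables-save output. It is a rough heuristic, not a full parser: the function name find_unscoped_masquerade is made up, and it only checks for a missing -o match, so a rule that is limited some other way (for example by source address) would still be flagged.

```shell
#!/bin/sh
# find_unscoped_masquerade: read iptables-save text on stdin and print every
# nat POSTROUTING rule that masquerades without an outbound-interface match.
find_unscoped_masquerade() {
    awk '
        /^\*/ { table = $1 }    # a line like "*nat" starts a table section
        table == "*nat" && /^-A POSTROUTING/ && /-j MASQUERADE/ && !/ -o / {
            print NR ": " $0    # input line number, then the offending rule
        }
    '
}

# Example (needs root, so it is left commented out):
#   sudo iptables-save -t nat | find_unscoped_masquerade
```

Any line it prints is a rule that applies to every outgoing interface, loopback included.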
The archaeology
But why did this come back after every reboot? Nothing in /etc/ufw, /etc/iptables, cron, or any VPN unit mentioned MASQUERADE. A wider grep finally found it:
$ grep -rn "iptables.sav" /etc 2>/dev/null
/etc/iptables.sav:1:# Generated by iptables-save v1.4.4 on Sat Dec 4 11:52:34 2010
/etc/rc.local:13:iptables-restore < /etc/iptables.sav
/etc/rc.local has been restoring a ruleset saved in December 2010 on every boot. For more than fifteen years!
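In hindsight, searching for the restore call as well as the rule text would have found it sooner. A minimal sketch, with a made-up function name and a pattern that only covers the two strings relevant to this story:

```shell
#!/bin/sh
# find_restore_hooks PATH...: list lines under the given paths that either
# call iptables-restore or mention MASQUERADE, as file:line:content.
find_restore_hooks() {
    grep -rnE 'iptables-restore|MASQUERADE' "$@" 2>/dev/null
}

# Example (commented out; reading all of /etc may need root):
#   find_restore_hooks /etc
```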
I can’t be certain, but my best guess is that this dates back to how I networked VirtualBox at the time. I’ve run VMs on this box for as long as I can remember, and back in 2010 the standard way to give them outbound access was to put them on a host-only or internal network and masquerade their traffic out through the host’s real interface. A MASQUERADE rule on POSTROUTING is exactly the recipe you’d copy off a wiki page to make that work. It got saved with iptables-save, wired into rc.local, and has been running on every boot since.
Why it only broke now
The best part: this rule predates systemd-resolved entirely. Back in 2010, the glibc resolver didn’t care what source address its loopback queries had, so nothing was visibly broken. Ubuntu only switched the default resolver to systemd-resolved with the 127.0.0.53 stub listener in 18.04 LTS (it first shipped, not enabled by default, back around 16.10).
Somewhere in the chain of dist upgrades this box crossed that line and picked up a resolver that does care about the source address, and DNS broke on every boot from then on. Yes, this was broken for a while, but the machine reboots only a few times per year so it wasn’t bothering me too much 🙈
Delete it or fix it?
I do still run a couple of VirtualBox VMs on this box, but they’re on NAT mode now, where VirtualBox does its own address translation in userspace and forwards ports itself. The host’s iptables rule isn’t in their path, so as far as I can tell the rule is doing nothing useful today. It may have mattered for some older host-only setup, but that’s long gone.
I could have just deleted the rule, and it probably would have been fine. But on a fifteen-year-old box I no longer fully understand, I went with the smaller change: adding -o eth0 limits the rule to traffic leaving through the physical interface, so loopback queries to the stub are no longer rewritten, while anything that might still depend on masquerading out that interface keeps working exactly as it did.
So: one edit to line 6 of /etc/iptables.sav (adding -o eth0), one iptables-restore dress rehearsal to confirm the boot path now loads the fixed rule, and the bug that survived fifteen years of upgrades was finally dead.
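For reference, the same edit can be expressed as a filter over the saved ruleset. This is a sketch, not what I actually ran: the function name scope_masquerade and the /tmp path are invented, and it assumes the saved rule looks like "-A POSTROUTING -j MASQUERADE", which is what the iptables -L output above suggests.

```shell
#!/bin/sh
# scope_masquerade IFACE: copy iptables-save text from stdin to stdout,
# adding "-o IFACE" to POSTROUTING MASQUERADE rules that have no -o match.
# Rules that are already scoped, and all other lines, pass through unchanged.
scope_masquerade() {
    awk -v ifc="$1" '
        /^-A POSTROUTING/ && /-j MASQUERADE/ && !/ -o / {
            sub(/ -j MASQUERADE/, " -o " ifc " -j MASQUERADE")
        }
        { print }
    '
}

# Dress rehearsal (commented out; needs root and the real file):
#   scope_masquerade eth0 < /etc/iptables.sav > /tmp/iptables.fixed
#   sudo iptables-restore --test < /tmp/iptables.fixed   # parse only, no commit
```

Writing to a separate file first keeps the original untouched until the fixed ruleset has been parsed successfully.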
Fin
Claude is a surprisingly good system administrator. I never gave it access to the machine we were debugging. It supplied the commands, I ran them by hand and pasted back the output, and it still tracked the problem down to a rule from 2010. I was genuinely impressed.