systemd / DBus debugging starter pack

Introduction

While trying to solve a 90-second wait on a start job at every boot, I found that there’s little information out there on how to tackle a problem like this. Most web pages go “I did this, hurray it worked!”, but where do you start if none of the do-this-do-that advice helps?

So this is a random pile of things to try out. Most of the things shown below didn’t solve my own issue, but these are the tools I collected in my toolbox.

I did this on a Debian 8 (Jessie), and the problem was that boot was stuck for a minute and a half on “A start job is running for sys-subsystem-net-devices-eth0.device”. It’s not directly relevant, except that it influences what I tried out, and hence listed below.

I have written a shorter version of this post which focuses on the specific problem. This post is more about the techniques for figuring out what’s going on.

PID 1 at your service

The tricky part of systemd is that much of the activity is done directly by the systemd process, which has PID 1. Requests to start and stop services and other units are sent via DBus messages, i.e. over connections to UNIX sockets. To someone used to good old SysV Linux, this is voodoo at its worst, but there are simple ways to keep track of it, as shown below.

In particular, don’t strace the “systemctl start” process — it just sends the request over DBus. Rather, attach strace to PID 1, also explained below. That’s where the fork to the actual job process takes place, if at all.

And don’t get confused by /org/freedesktop/ appearing everywhere in the logs. It doesn’t necessarily have anything to do with a desktop (if one exists at all), and is just as relevant on a non-graphical system. DBus started out as a solution for desktop machines, and that’s the only reason “freedesktop” is everywhere.

First things first

Read the man page, “man systemd.device” in my case. If there’s another computer with different configuration, see what happens there. What does it look like when it works?

journalctl -x

As mentioned on this page, if something went wrong during boot, check out the log to see why. The -x flag adds valuable info for solving issues of this sort.

For example,

# journalctl -x

[ ... ]

May 20 11:41:20 diskless systemd[1]: Job sys-subsystem-net-devices-eth0.device/start timed out.
May 20 11:41:20 diskless systemd[1]: Timed out waiting for device sys-subsystem-net-devices-eth0.device.
-- Subject: Unit sys-subsystem-net-devices-eth0.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sys-subsystem-net-devices-eth0.device has failed.
--
-- The result is timeout.
May 20 11:41:20 diskless systemd[1]: Dependency failed for ifup for eth0.
-- Subject: Unit ifup@eth0.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ifup@eth0.service has failed.
--
-- The result is dependency.

Now, how to read it: It’s clear that sys-subsystem-net-devices-eth0.device is the unit that didn’t manage to kick off. But the more important clue is that ifup@eth0.service failed, because it depends on the former.

It’s important, because it explains why an attempt to launch sys-subsystem-net-devices-eth0.device was made in the first place. A lot of “there, I fixed it” pages on the web disable the latter service and get rid of the problem, not necessarily understanding how and why.

So here’s why: on some relatively early systemd versions, *.device units simply won’t launch. This is worked around by making sure that no other unit requests them. But then some software package isn’t aware of this, requests a .device unit, and there’s the deadlock. Or more precisely, a 90-second wait for the timeout.
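The unit that requests a .device unit can often be located by simply grepping the unit directories for its name. A minimal sketch (find_requesters is my own helper name; the directory layout assumed matches Debian 8, and the demo below runs against a throwaway directory instead of the real ones):

```shell
# Hypothetical helper: list unit files that mention a device unit, either
# literally or via the %i template form used by template units (ifup@.service).
find_requesters() {
    local literal="$1" templated="$2"; shift 2
    grep -rlE -e "$literal" -e "$templated" "$@" 2>/dev/null
}

# Demo against a throwaway directory standing in for /lib/systemd/system;
# on a real system, point it at /etc/systemd/system, /run/systemd/system
# and /lib/systemd/system instead.
demo=$(mktemp -d)
printf 'BindsTo=sys-subsystem-net-devices-%%i.device\n' > "$demo/ifup@.service"

find_requesters 'sys-subsystem-net-devices-eth0\.device' \
                'sys-subsystem-net-devices-%i\.device' \
                "$demo"
rm -rf "$demo"
```

On my machine, a search of this kind turns up ifup@.service, whose BindsTo= line is what drags the device unit in.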

Kicking it off manually

Obviously, the unit is inactive:

# systemctl status sys-subsystem-net-devices-eth0.device
● sys-subsystem-net-devices-eth0.device
   Loaded: loaded
   Active: inactive (dead)

May 20 12:47:07 diskless systemd[1]: Expecting device sys-subsystem-net-devices-eth0.device...
May 20 12:48:37 diskless systemd[1]: Job sys-subsystem-net-devices-eth0.device/start timed out.
May 20 12:48:37 diskless systemd[1]: Timed out waiting for device sys-subsystem-net-devices-eth0.device.

So try to start it manually (this little session took 90 seconds, right?)

# systemctl start sys-subsystem-net-devices-eth0.device
Job for sys-subsystem-net-devices-eth0.device timed out.

The important takeaway is that we can repeat the problem on a running system (as opposed to a booting one). This allows running some tools for looking at what happens.

On the other hand, I’m not all that sure .device units are supposed to be started or stopped with systemctl at all. Or more likely, that requesting a start or stop on device units means waiting for them to reach the desired state by themselves. This goes along with the observation I made with strace (below), showing that systemd does nothing meaningful until it times out. So most likely, it just looked up the state of the device unit, saw it wasn’t started, and then went to sleep, essentially waiting for a udev event to bring the unit to the desired state, and consequently return a success status to the start request.

In fact, when I tried “systemctl stop” on the eth0 device on another machine, on which the device unit was activated automatically, it got stuck exactly the same way as starting it did on Debian 8.

As far as I understand, these should become active and inactive by a systemd-udev event by virtue of udev labeling. They are there to trigger other units that depend on them, not to be controlled explicitly.

But here comes a major red herring: curiously enough, during the 90 seconds of waiting, “systemctl start” created a child process, “/bin/systemd-tty-ask-password-agent --watch”. One can easily be misled into thinking that it’s this child process that blocks the completion of the former command.

So first, let’s convince ourselves that it’s not the problem, because running

# systemctl --no-ask-password start sys-subsystem-net-devices-eth0.device

doesn’t create this second process, but is stuck nevertheless.

This systemd-tty-ask-password-agent process listens for system-wide requests for obtaining a password from the user (e.g. when opening a crypto disk), and does that job if necessary. systemctl launches it just in case, regardless of the unit requested for starting. This is the way to make sure passwords are collected, if so needed. This process is usually not visible, because systemctl commands typically don’t last very long. More about it here.

Actually, checking with strace, systemctl was blocked for all those 90 seconds on a ppoll(), waiting for some response from the /run/systemd/private UNIX socket. That’s the DBus connection to process number 1, systemd. In other words, systemctl requested the start of the unit over DBus, and then waited 90 seconds for the result, at which point it got the answer that the attempt timed out.

Listening to DBus

There are two utilities, dbus-monitor and “busctl monitor”, for dumping DBus messages (an eavesdrop add-on may be required to allow system-wide message monitoring, but this was not the case on my system).

So on the invocation of

# systemctl start sys-subsystem-net-devices-eth0.device

the output of

# dbus-monitor --system

was

signal sender=org.freedesktop.DBus -> dest=:1.8 serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameAcquired
   string ":1.8"
signal sender=:1.0 -> dest=(null destination) serial=127 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitNew
   string "sys-subsystem-net-devices-eth0.device"
   object path "/org/freedesktop/systemd1/unit/sys_2dsubsystem_2dnet_2ddevices_2deth0_2edevice"
signal sender=:1.0 -> dest=(null destination) serial=128 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=JobNew
   uint32 173
   object path "/org/freedesktop/systemd1/job/173"
   string "sys-subsystem-net-devices-eth0.device"
signal sender=:1.0 -> dest=(null destination) serial=129 path=/org/freedesktop/systemd1/job/173; interface=org.freedesktop.DBus.Properties; member=PropertiesChanged
   string "org.freedesktop.systemd1.Job"
   array [
      dict entry(
         string "State"
         variant             string "running"
      )
   ]
   array [
   ]

and when the timeout occurs with a

Job for sys-subsystem-net-devices-eth0.device timed out.

the following output is captured on the DBus:

signal sender=:1.0 -> dest=(null destination) serial=141 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=JobRemoved
   uint32 173
   object path "/org/freedesktop/systemd1/job/173"
   string "sys-subsystem-net-devices-eth0.device"
   string "timeout"
signal sender=:1.0 -> dest=(null destination) serial=142 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitRemoved
   string "sys-subsystem-net-devices-eth0.device"
   object path "/org/freedesktop/systemd1/unit/sys_2dsubsystem_2dnet_2ddevices_2deth0_2edevice"

Clearly, everything was done by the systemd main process, and almost nothing by the process launched from the console.

The “sender=:1.0” part means that the sender is process number 1 (systemd). Try

$ busctl

(that’s short for “busctl list”) to get a mapping between these addresses and processes.

See the number 173 in the object paths all over the DBus traffic? That’s the job number, as listed in

# systemctl list-jobs
JOB UNIT                                  TYPE  STATE
173 sys-subsystem-net-devices-eth0.device start running

1 jobs listed.

Note that these job numbers have absolutely nothing to do with the Linux PIDs.
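As an aside, the mangled unit name in the object path (sys_2dsubsystem_2d…) is just systemd’s escaping: every byte outside [A-Za-z0-9] becomes an underscore followed by two hex digits. A small sketch for decoding it back (unescape_unit is my own helper name; this relies on bash’s printf, which understands \xHH escapes):

```shell
# Decode a systemd DBus object-path component back into a unit name.
# systemd escapes every byte outside [A-Za-z0-9] as _ plus two hex digits
# (_2d is '-', _2e is '.'). Turn _xx into \xXX and let printf %b decode it.
unescape_unit() {
    printf '%b\n' "$(printf '%s' "$1" | sed 's/_\([0-9a-fA-F][0-9a-fA-F]\)/\\x\1/g')"
}

unescape_unit 'sys_2dsubsystem_2dnet_2ddevices_2deth0_2edevice'
# → sys-subsystem-net-devices-eth0.device
```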

Using strace

strace is often very useful for resolving OS problems. It’s important to realize, however, that the old-fashioned way of stracing the process created on the command line will probably not yield much information, because this process only sends a request over DBus.

Instead, strace the process that does the actual work: PID 1, the Mother Of All Processes, the almighty systemd itself. I have to admit I was at first intimidated by the idea of attaching strace to this process, but it turns out that it’s usually quite calm, and spits out relatively little unrelated mumbo-jumbo.

Bonus: It’s always the same command:

# strace -p 1 -s 128 -ff -o systemd-trace

This makes a file for each process systemd may fork into. If things went wrong because some process didn’t execute properly, this is how we catch it.
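A quick way to triage the per-process files is to grep them for execve() calls; any child file containing one is a job that systemd actually forked off. A sketch (execve_hits is my own helper name, and the trace line in the demo is fabricated):

```shell
# Hypothetical helper: list execve() calls in the collected trace files.
# A systemd-trace.<pid> file showing one is a job systemd forked off;
# no hits means systemd was waiting rather than running something that failed.
execve_hits() {
    grep -H 'execve(' "$@" 2>/dev/null
}

# On a real run:  execve_hits systemd-trace.*
# Demo with a fabricated trace line in a throwaway directory:
demo=$(mktemp -d)
echo 'execve("/usr/sbin/atd", ["/usr/sbin/atd"], []) = 0' > "$demo/systemd-trace.901"
execve_hits "$demo"/systemd-trace.*
rm -rf "$demo"
```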

For example, when running the said “systemctl start sys-subsystem-net-devices-eth0.device” command, this was the output:

accept4(12, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 13
getsockopt(13, SOL_SOCKET, SO_PEERCRED, {pid=901, uid=0, gid=0}, [12]) = 0
open("/dev/urandom", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 18
read(18, "\270\305\231\206+&\262MI\313[\337y}\314V", 16) = 16
close(18)                               = 0
fcntl(13, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(13, F_GETFD)                      = 0x1 (flags FD_CLOEXEC)
fstat(13, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
setsockopt(13, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
setsockopt(13, SOL_SOCKET, 0x22 /* SO_??? */, [0], 4) = 0
getsockopt(13, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
setsockopt(13, SOL_SOCKET, 0x21 /* SO_??? */, [8388608], 4) = 0
getsockopt(13, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
setsockopt(13, SOL_SOCKET, 0x20 /* SO_??? */, [8388608], 4) = 0
getsockopt(13, SOL_SOCKET, SO_PEERCRED, {pid=901, uid=0, gid=0}, [12]) = 0
fstat(13, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
getsockopt(13, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
getsockname(13, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, [23]) = 0
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL 30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=901, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45
epoll_ctl(4, EPOLL_CTL_ADD, 13, {0, {u32=2784593808, u64=94857137325968}}) = 0
open("/dev/urandom", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 18
read(18, "\10,\360z\t\363+\355D\2556NLkhL", 16) = 16
close(18)                               = 0
open("/dev/urandom", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 18
read(18, "$F\3302\215\326\320\251\261\240\217\232\224\1\346\205", 16) = 16
close(18)                               = 0
open("/dev/urandom", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 18
read(18, "|\313E\273R\264 \375\v\245\235\206h\247\30-", 16) = 16
close(18)                               = 0
open("/dev/urandom", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 18
read(18, "{\200\255\356\26\341b4V_P\225aHkO", 16) = 16
close(18)                               = 0
epoll_ctl(4, EPOLL_CTL_MOD, 13, {EPOLLIN|EPOLLOUT, {u32=2784593808, u64=94857137325968}}) = 0
timerfd_settime(29, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={2519, 883043000}}, NULL) = 0
epoll_wait(4, {{EPOLLOUT, {u32=2784593808, u64=94857137325968}}}, 33, 0) = 1
clock_gettime(CLOCK_BOOTTIME, {2508, 359675827}) = 0
timerfd_settime(29, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={2508, 883043000}}, NULL) = 0
epoll_wait(4, {{EPOLLOUT, {u32=2784593808, u64=94857137325968}}}, 33, 0) = 1
clock_gettime(CLOCK_BOOTTIME, {2508, 359719924}) = 0
sendmsg(13, {msg_name(0)=NULL, msg_iov(3)=[{"OK b8c599862b26424d89cb5bdf797dcc56\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, {NULL, 0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52
epoll_ctl(4, EPOLL_CTL_MOD, 13, {EPOLLIN, {u32=2784593808, u64=94857137325968}}) = 0
epoll_wait(4, {{EPOLLIN, {u32=2784593808, u64=94857137325968}}}, 33, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {2508, 359793419}) = 0
epoll_wait(4, {{EPOLLIN, {u32=2784593808, u64=94857137325968}}}, 33, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {2508, 359846496}) = 0
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"l\1\0\0018\0\0\0\1\0\0\0\240\0\0\0\1\1o\0\31\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=901, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"/org/freedesktop/systemd1\0\0\0\0\0\0\0\3\1s\0\t\0\0\0StartUnit\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\6\1s\0\30\0\0\0org.freedesktop."..., 208}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=901, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 208
getuid()                                = 0
sendmsg(13, {msg_name(0)=NULL, msg_iov(2)=[{"l\2\1\1&\0\0\0\1\0\0\0\17\0\0\0\5\1u\0\1\0\0\0\10\1g\0\1o\0\0", 32}, {"!\0\0\0/org/freedesktop/systemd1/job/242\0", 38}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 70
sendmsg(13, {msg_name(0)=NULL, msg_iov(2)=[{"l\4\1\1H\0\0\0\2\0\0\0\206\0\0\0\1\1o\0!\0\0\0/org/freedesktop/systemd1/job/242\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org.freedesktop.DBus.Properties\0\3\1s\0\21\0\0\0PropertiesChange"..., 152}, {"\34\0\0\0org.freedesktop.systemd1.Job\0\0\0\0\34\0\0\0\5\0\0\0State\0\1s\0\0\0\0\7\0\0\0running\0\0\0\0\0", 72}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 224
sendmsg(35, {msg_name(0)=NULL, msg_iov(2)=[{"l\4\1\1H\0\0\0\302\0\0\0\206\0\0\0\1\1o\0!\0\0\0/org/freedesktop/systemd1/job/242\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org.freedesktop.DBus.Properties\0\3\1s\0\21\0\0\0PropertiesChange"..., 152}, {"\34\0\0\0org.freedesktop.systemd1.Job\0\0\0\0\34\0\0\0\5\0\0\0State\0\1s\0\0\0\0\7\0\0\0running\0\0\0\0\0", 72}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 224
epoll_wait(4, {{EPOLLIN, {u32=2784593808, u64=94857137325968}}}, 33, 0) = 1
clock_gettime(CLOCK_BOOTTIME, {2508, 360201428}) = 0
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"l\1\0\1*\0\0\0\2\0\0\0\227\0\0\0\1\1o\0\31\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=901, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"/org/freedesktop/systemd1\0\0\0\0\0\0\0\3\1s\0\7\0\0\0GetUnit\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\6\1s\0\30\0\0\0org.freedesktop.systemd1"..., 186}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=901, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 186
sendmsg(13, {msg_name(0)=NULL, msg_iov(2)=[{"l\2\1\1S\0\0\0\3\0\0\0\17\0\0\0\5\1u\0\2\0\0\0\10\1g\0\1o\0\0", 32}, {"N\0\0\0/org/freedesktop/systemd1/unit/sys_2dsubsystem_2dnet_2ddevices_2deth0_2edevice\0", 83}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 115
epoll_wait(4, {}, 33, 0)                = 0
clock_gettime(CLOCK_BOOTTIME, {2508, 360292582}) = 0
epoll_wait(4, {{EPOLLIN, {u32=2784593808, u64=94857137325968}}}, 33, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {2508, 360319860}) = 0
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"l\1\0\0019\0\0\0\3\0\0\0\300\0\0\0\1\1o\0N\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=901, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"/org/freedesktop/systemd1/unit/sys_2dsubsystem_2dnet_2ddevices_2deth0_2edevice\0\0\3\1s\0\3\0\0\0Get\0\0\0\0\0\2\1s\0\37\0\0\0org.freedesktop.DBus.Pro"..., 241}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=901, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 241
lstat("/etc", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/etc/systemd", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/etc/systemd/system", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/etc/systemd/system/sys-subsystem-net-devices-eth0.device.d", 0x7ffec2dd0540) = -1 ENOENT (No such file or directory)
lstat("/run", {st_mode=S_IFDIR|0755, st_size=620, ...}) = 0
lstat("/run/systemd", {st_mode=S_IFDIR|0755, st_size=400, ...}) = 0
lstat("/run/systemd/system", {st_mode=S_IFDIR|0755, st_size=120, ...}) = 0
lstat("/run/systemd/system/sys-subsystem-net-devices-eth0.device.d", 0x7ffec2dd0540) = -1 ENOENT (No such file or directory)
lstat("/run", {st_mode=S_IFDIR|0755, st_size=620, ...}) = 0
lstat("/run/systemd", {st_mode=S_IFDIR|0755, st_size=400, ...}) = 0
lstat("/run/systemd/generator", {st_mode=S_IFDIR|0755, st_size=360, ...}) = 0
lstat("/run/systemd/generator/sys-subsystem-net-devices-eth0.device.d", 0x7ffec2dd0540) = -1 ENOENT (No such file or directory)
lstat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/local", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
lstat("/usr/local/lib", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
lstat("/usr/local/lib/systemd", 0x7ffec2dd0540) = -1 ENOENT (No such file or directory)
lstat("/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/lib/systemd", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/lib/systemd/system", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
lstat("/lib/systemd/system/sys-subsystem-net-devices-eth0.device.d", 0x7ffec2dd0540) = -1 ENOENT (No such file or directory)
lstat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/lib/systemd", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/lib/systemd/system", 0x7ffec2dd0540) = -1 ENOENT (No such file or directory)
lstat("/run", {st_mode=S_IFDIR|0755, st_size=620, ...}) = 0
lstat("/run/systemd", {st_mode=S_IFDIR|0755, st_size=400, ...}) = 0
lstat("/run/systemd/generator.late", {st_mode=S_IFDIR|0755, st_size=440, ...}) = 0
lstat("/run/systemd/generator.late/sys-subsystem-net-devices-eth0.device.d", 0x7ffec2dd0540) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/systemd/system/sys-subsystem-net-devices-eth0.device.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/run/systemd/system/sys-subsystem-net-devices-eth0.device.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/run/systemd/generator/sys-subsystem-net-devices-eth0.device.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/systemd/system/sys-subsystem-net-devices-eth0.device.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/systemd/system/sys-subsystem-net-devices-eth0.device.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/systemd/system/sys-subsystem-net-devices-eth0.device.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/run/systemd/generator.late/sys-subsystem-net-devices-eth0.device.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
sendmsg(13, {msg_name(0)=NULL, msg_iov(2)=[{"l\2\1\1\10\0\0\0\4\0\0\0\17\0\0\0\5\1u\0\3\0\0\0\10\1g\0\1v\0\0", 32}, {"\1b\0\0\0\0\0\0", 8}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 40
epoll_wait(4, {{EPOLLIN, {u32=3, u64=3}}}, 33, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {2508, 883466886}) = 0
read(29, "\1\0\0\0\0\0\0\0", 8)         = 8
timerfd_settime(29, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={2509, 383043000}}, NULL) = 0
epoll_wait(4, {{EPOLLIN, {u32=3, u64=3}}}, 33, -1) = 1

So what we have here is the acceptance of the connection, and the sending and receiving of messages like those captured with dbus-monitor above, but nothing meaningful was executed: no productive system call was made, and systemd didn’t fork. But we can also see which files (actually, directories) systemd was looking for and didn’t find: it really wanted a sys-subsystem-net-devices-eth0.device.d directory in one of the usual unit search paths. Not that it matters so much, though.
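The directories probed in the trace above are systemd’s usual drop-in search path for a unit, roughly in descending priority. A sketch that spells out the list for any unit name (dropin_dirs is my own helper name; the exact set and order may differ between systemd versions, this matches the trace above):

```shell
# Print the drop-in directories systemd probes for a given unit, in the
# order observed in the strace dump (Debian 8 / systemd 215 era).
dropin_dirs() {
    local unit="$1"
    for d in /etc/systemd/system /run/systemd/system /run/systemd/generator \
             /usr/local/lib/systemd/system /lib/systemd/system \
             /usr/lib/systemd/system /run/systemd/generator.late; do
        echo "$d/$unit.d"
    done
}

dropin_dirs sys-subsystem-net-devices-eth0.device
```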

By contrast and for example, if “systemctl start atd” is launched and atd is not already running, systemd (as process 1) forks into another process and calls execve(“/usr/sbin/atd”) on the forked process (after a whole lot of cgroup stuff, closing files etc.). If the same systemctl command is called with the atd service already running, there is no such fork (not surprisingly, systemd does nothing when attempting to start an already started service).

For the record, the failed lookup of directories isn’t the problem: I had the luxury of trying exactly the same thing on a machine that doesn’t get stuck on starting sys-subsystem-net-devices-eth0.device, and the strace looked the same. Except that there the systemd job terminated immediately and successfully, rather than getting stuck.

For my own part, this was the moment I realized that this unit shouldn’t be started at all on the system it gets stuck on.

Checking udev

If a boot problem is related to a device, maybe something went wrong with the device’s bringup, which in turn prevented the relevant .device unit from becoming active, and then some other unit waits for it…?

So what is running when eth0 is detected?

# udevadm test /sys/class/net/eth0
calling: test
version 215
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.

load module index
Network interface NamePolicy= disabled on kernel commandline, ignoring.
timestamp of '/etc/systemd/network' changed
timestamp of '/lib/systemd/network' changed
Parsed configuration file /lib/systemd/network/99-default.link
Created link configuration context.
timestamp of '/etc/udev/rules.d' changed
read rules file: /lib/udev/rules.d/42-usb-hid-pm.rules
read rules file: /lib/udev/rules.d/50-bluetooth-hci-auto-poweron.rules
read rules file: /lib/udev/rules.d/50-firmware.rules
read rules file: /lib/udev/rules.d/50-udev-default.rules
read rules file: /lib/udev/rules.d/55-dm.rules

[ ... ]

read rules file: /etc/udev/rules.d/90-local-imagedisk.rules
read rules file: /lib/udev/rules.d/95-cd-devices.rules
read rules file: /lib/udev/rules.d/95-udev-late.rules
read rules file: /lib/udev/rules.d/97-hid2hci.rules
read rules file: /lib/udev/rules.d/99-systemd.rules
rules contain 393216 bytes tokens (32768 * 12 bytes), 23074 bytes strings
21081 strings (168928 bytes), 18407 de-duplicated (148529 bytes), 2675 trie nodes used
NAME 'eth0' /etc/udev/rules.d/70-persistent-net.rules:2
IMPORT builtin 'net_id' /lib/udev/rules.d/75-net-description.rules:6
IMPORT builtin 'hwdb' /lib/udev/rules.d/75-net-description.rules:12
IMPORT builtin 'path_id' /lib/udev/rules.d/80-net-setup-link.rules:5
IMPORT builtin 'net_setup_link' /lib/udev/rules.d/80-net-setup-link.rules:11
Config file /lib/systemd/network/99-default.link applies to device eth0
RUN 'net.agent' /lib/udev/rules.d/80-networking.rules:1
RUN '/lib/systemd/systemd-sysctl --prefix=/proc/sys/net/ipv4/conf/$name --prefix=/proc/sys/net/ipv4/neigh/$name --prefix=/proc/sys/net/ipv6/conf/$name --prefix=/proc/sys/net/ipv6/neigh/$name' /lib/udev/rules.d/99-systemd.rules:61
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/net/eth0
ID_BUS=pci
ID_MODEL_FROM_DATABASE=RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (Motherboard)
ID_MODEL_ID=0x8168
ID_NET_DRIVER=r8169
ID_NET_NAME_MAC=enx408d5c4d1b15
ID_NET_NAME_PATH=enp1s0
ID_PATH=pci-0000:01:00.0
ID_PATH_TAG=pci-0000_01_00_0
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Realtek Semiconductor Co., Ltd.
ID_VENDOR_ID=0x10ec
IFINDEX=2
INTERFACE=eth0
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth0
TAGS=:systemd:
USEC_INITIALIZED=20767
run: 'net.agent'
run: '/lib/systemd/systemd-sysctl --prefix=/proc/sys/net/ipv4/conf/eth0 --prefix=/proc/sys/net/ipv4/neigh/eth0 --prefix=/proc/sys/net/ipv6/conf/eth0 --prefix=/proc/sys/net/ipv6/neigh/eth0'
unload module index
Unloaded link configuration context.

Note that the “systemd” tag is in place, and so is the SYSTEMD_ALIAS assignment. So there’s probably no reason udev-wise why there was no .device unit activated.
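A quick way to check this on any device is to filter the udevadm dump for the systemd-related properties. A sketch (systemd_props is my own helper name; the demo runs on a captured excerpt of the dump above — on a live system, pipe “udevadm test /sys/class/net/eth0 2>&1” into it, with 2>&1 just to be safe about where udevadm writes):

```shell
# Hypothetical helper: keep only the systemd-relevant properties from a
# udevadm dump, to check quickly whether the device is tagged for systemd.
systemd_props() {
    grep -E '^(TAGS|SYSTEMD_ALIAS|SYSTEMD_WANTS)='
}

# Demo on a captured excerpt (same values as the dump above):
systemd_props <<'EOF'
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth0
TAGS=:systemd:
USEC_INITIALIZED=20767
EOF
```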

hwdb.bin

Update the file /etc/udev/hwdb.bin:

# udevadm hwdb --update

Note that it doesn’t touch /lib/udev/hwdb.bin, and it’s unclear to me how the two interact, if at all (a wild guess: /etc/udev/hwdb.bin overrides the one in /lib/udev, if it exists).

On newer systems it appears to be “systemd-hwdb update”.

Solved: systemd boot waits 90 seconds on net-devices-eth0

Introduction

After installing wireshark (and tons of packages it depends on) on a rather fresh and bare-bones Debian 8 (Jessie), I got the “A start job is running for sys-subsystem-net-devices-eth0.device” message for a minute and a half on every boot.

It was exceptionally difficult to find the reason, because so many packages were installed along with wireshark.

This is the short version of how this was solved. For the entire battery of stuff I tried out, I’ve written a separate post.

Bad omens

# systemctl status sys-subsystem-net-devices-eth0.device
● sys-subsystem-net-devices-eth0.device
   Loaded: loaded
   Active: inactive (dead)

May 20 12:47:07 diskless systemd[1]: Expecting device sys-subsystem-net-devices-eth0.device...
May 20 12:48:37 diskless systemd[1]: Job sys-subsystem-net-devices-eth0.device/start timed out.
May 20 12:48:37 diskless systemd[1]: Timed out waiting for device sys-subsystem-net-devices-eth0.device.

OK. Not surprising it’s not active. So start manually…?

# systemctl start sys-subsystem-net-devices-eth0.device
Job for sys-subsystem-net-devices-eth0.device timed out.

The second line appeared after a minute and a half, of course.

So I went to another, more recent machine (Mint 19) and went

$ systemctl status sys-subsystem-net-devices-eth0.device
● sys-subsystem-net-devices-eth0.device - Killer E2500 Gigabit Ethernet Controll
   Loaded: loaded
   Active: active (plugged) since Wed 2019-02-20 14:48:54 IST; 2 months 30 days
   Device: /sys/devices/pci0000:00/0000:00:1c.2/0000:04:00.0/net/eth0

And then comparing the outputs of just

$ systemctl

it became evident that *.device units are listed on the Mint 19 machine, but not on Debian 8.

Which led me to the conclusion that sys-subsystem-net-devices-eth0.device isn’t meant to exist on Debian 8. The problem isn’t that it fails to start when commanded to do so, but that it’s not supposed to be started that way at all. The problem is that some other unit requests it.

As far as I understand, these .device units should become active and inactive by a systemd-udev event by virtue of udev labeling. They are there to trigger other units that depend on them, not to be controlled explicitly. For some reason they aren’t activated on the Debian 8 machine, despite udev rules being roughly the same as on the Mint 19 machine.

In the absence of proper docs (?), I’m left to guess that requesting a start or stop on device units means waiting for them to reach the desired state by themselves. This goes along with an observation I’ve made with strace, showing that systemd does nothing meaningful until it times out. So most likely, it just looked up the state of the device unit, saw it wasn’t started, and then went to sleep, essentially waiting for a udev event to bring the unit to the desired state, and consequently return a success status to the start request.

In fact, when I tried “systemctl stop” on the eth0 device on Mint 19 (i.e. the machine on which it was already loaded) it got stuck exactly the same way as for starting it on Debian 8. So that command probably meant “wait until eth0 goes away”.

Closing in

The trick is now to find which unit causes the attempt to kick off sys-subsystem-net-devices-eth0.device.

# journalctl -x

[ ... ]

May 20 11:41:20 diskless systemd[1]: Job sys-subsystem-net-devices-eth0.device/start timed out.
May 20 11:41:20 diskless systemd[1]: Timed out waiting for device sys-subsystem-net-devices-eth0.device.
-- Subject: Unit sys-subsystem-net-devices-eth0.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sys-subsystem-net-devices-eth0.device has failed.
--
-- The result is timeout.
May 20 11:41:20 diskless systemd[1]: Dependency failed for ifup for eth0.
-- Subject: Unit ifup@eth0.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ifup@eth0.service has failed.
--
-- The result is dependency.

In English: It’s clear that sys-subsystem-net-devices-eth0.device is the unit that didn’t manage to kick off. But the more important clue is that ifup@eth0.service failed, because it depends on the former. The easier solution lies in the latter.

But frankly, I don’t really understand what happened here. If eth0 was detected by systemd, why wasn’t the relevant device unit activated? And if it wasn’t detected, why was ifup@eth0.service kicked off at all? The relevant unit file is a wildcard service, not naming any specific device.

Solution

The textbook solution is to find why .device files aren’t generated at all on my Debian 8 system, fix that, and then there won’t be any delay. The correct solution in some cases is to manipulate the udev rules, adding a “TAG+="systemd"” rule to the device, so the device unit is started automatically by systemd (man systemd.device). In my case this tag was already there, so it’s probably some issue with the service that’s supposed to respond to the udev event. So that’s a dead end.

So go the clumsy way: Remove the unit file that requests the device unit (or maybe I should have masked it by adding a file in /etc?). In this case, it’s /lib/systemd/system/ifup@.service, which said:

[Unit]
Description=ifup for %I
After=local-fs.target network-pre.target networking.service systemd-sysctl.service
Before=network.target
BindsTo=sys-subsystem-net-devices-%i.device
After=sys-subsystem-net-devices-%i.device
ConditionPathIsDirectory=/run/network
DefaultDependencies=no

[Service]
ExecStart=/sbin/ifup --allow=hotplug %I
ExecStop=/sbin/ifdown %I
RemainAfterExit=true

and then make sure this had no adverse side effects (none found so far). Actually, removing this file can’t make things worse than they were when booting took 90 seconds, because this service was never launched anyhow: the device unit it depends on never started.
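
Regarding the masking option wondered about above: a mask is nothing but a symlink to /dev/null under /etc/systemd/system, which shadows the unit file in /lib. A minimal sketch of what it boils down to, demonstrated in a scratch directory so nothing real is touched (on the live system, create the symlink at the real path, or use systemctl mask, and run systemctl daemon-reload afterwards):

```shell
#!/bin/sh
# Demonstrate what masking a unit boils down to, in a scratch directory.
root=$(mktemp -d)
mkdir -p "$root/lib/systemd/system" "$root/etc/systemd/system"
touch "$root/lib/systemd/system/ifup@.service"            # the distro's unit file
ln -s /dev/null "$root/etc/systemd/system/ifup@.service"  # the mask
# systemd reads /etc before /lib, and a /dev/null symlink means "ignore me":
readlink "$root/etc/systemd/system/ifup@.service"         # prints /dev/null
```

The advantage over deleting the file is that the next package upgrade won’t silently bring the unit back.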

When mplayer plays a black window (or: Cinnamon leaking GPU memory)

The incident

All of a sudden, playing videos with Mplayer opened a black window. Sometimes going fullscreen helped, sometimes it didn’t, sometimes with the video playing but without OSD. ffplay worked, but was somewhat limping.

Setting: Linux Mint 19 on an x86_64, with a couple of fanless GeForce GT 1030 graphics cards and Cinnamon 3.8.9.

Mplayer’s output in this situation:

Playing IHS_1235.MOV.
libavformat version 57.83.100 (external)
libavformat file format detected.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f858e2362a0]Protocol name not provided, cannot determine if input is local or a network protocol, buffers and access patterns cannot be configured optimally without knowing the protocol
[lavf] stream 0: video (h264), -vid 0
[lavf] stream 1: audio (pcm_s16le), -aid 0, -alang eng
VIDEO:  [H264]  1920x1080  24bpp  59.940 fps  36067.5 kbps (4402.8 kbyte/s)
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
libavcodec version 57.107.100 (external)
Selected video codec: [ffh264] vfm: ffmpeg (FFmpeg H.264)
==========================================================================
Opening audio decoder: [pcm] Uncompressed PCM audio decoder
AUDIO: 48000 Hz, 2 ch, s16le, 1536.0 kbit/100.00% (ratio: 192000->192000)
Selected audio codec: [pcm] afm: pcm (Uncompressed PCM)
==========================================================================
AO: [pulse] 48000Hz 2ch s16le (2 bytes per sample)
Starting playback...
Movie-Aspect is undefined - no prescaling applied.
VO: [vdpau] 1920x1080 => 1920x1080 Planar YV12
[vdpau] Error when calling vdp_output_surface_create: The system does not have enough resources to complete the requested operation at this time.
[vdpau] Error when calling vdp_output_surface_create: The system does not have enough resources to complete the requested operation at this time.
[vdpau] Error when calling vdp_output_surface_create: The system does not have enough resources to complete the requested operation at this time.
[vdpau] Error when calling vdp_output_surface_create: The system does not have enough resources to complete the requested operation at this time.
[vdpau] Error when calling vdp_presentation_queue_block_until_surface_idle: An invalid handle value was provided.
[vdpau] Error when calling vdp_video_mixer_render: An invalid handle value was provided.
[vdpau] Error when calling vdp_presentation_queue_display: An invalid handle value was provided.
A:   0.2 V:   0.0 A-V:  0.216 ct:  0.000   0/  0 ??% ??% ??,?% 0 0
[vdpau] Error when calling vdp_presentation_queue_block_until_surface_idle: An invalid handle value was provided.
[vdpau] Error when calling vdp_video_mixer_render: An invalid handle value was provided.
[vdpau] Error when calling vdp_presentation_queue_block_until_surface_idle: An invalid handle value was provided.
[vdpau] Error when calling vdp_video_mixer_render: An invalid handle value was provided.
[vdpau] Error when calling vdp_presentation_queue_display: An invalid handle value was provided.
[vdpau] Error when calling vdp_presentation_queue_display: An invalid handle value was provided.

And a lot of error messages, with “invalid handle value was provided” all over the place.

What does the graphics card have to say?

Opening Nvidia’s graphical control panel (Nvidia X Server Settings), it turns out that “User Dedicated Memory” stands at 1864 MB out of 1998 MB (93%). No wonder things don’t work.

OK, so who’s eating up all the RAM? I have a wild guess, but there’s nothing like getting it black on white:

$ nvidia-smi
Sun Apr 14 14:39:40 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 1030     Off  | 00000000:17:00.0 Off |                  N/A |
|  0%   41C    P8    N/A /  30W |      1MiB /  2001MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 1030     Off  | 00000000:65:00.0  On |                  N/A |
|  0%   51C    P8    N/A /  30W |   1914MiB /  1998MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1803      G   /usr/lib/xorg/Xorg                           433MiB |
|    1      2373      G   cinnamon                                    1310MiB |
|    1     54180      G   ...uest-channel-token=14764917277860092693   165MiB |
|    1     68188      G   /usr/bin/nvidia-settings                       0MiB |
+-----------------------------------------------------------------------------+

(The memory consumptions are at the far right on each line. Scroll to see them)

At that very moment, Cinnamon had slurped quite some CPU RAM as well: 5.7 GB of virtual memory allocated and 1.3 GB resident (real RAM). So it was leaking memory everywhere. That’s after running for two months.

The other hog, by the way, is Google Chrome (165 MiB), also after running continuously for two months.

Solution

The solution is surprisingly simple and harmless: Restart Cinnamon. Yes, you can do this even if there are a lot of windows open, spread out in different workspaces. They will remain in place, don’t worry. Only the windows’ ordering within each workspace may change, but that’s a really minor thing. To do this (as I mentioned in another post):

Press ALT-F2, type “r” and Enter. Look away for a few seconds, because what happens next looks like a sudden reboot, but it isn’t. All comes back.

Except a lot of memory has been freed. Resident CPU RAM went down from 1.3 GB to 256 MB, but even more important:

$ nvidia-smi
Sun Apr 14 14:49:19 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 1030     Off  | 00000000:17:00.0 Off |                  N/A |
|  0%   41C    P8    N/A /  30W |      1MiB /  2001MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 1030     Off  | 00000000:65:00.0  On |                  N/A |
|  0%   52C    P0    N/A /  30W |    701MiB /  1998MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1803      G   /usr/lib/xorg/Xorg                           498MiB |
|    1      2373      G   cinnamon                                      17MiB |
|    1     54180      G   ...uest-channel-token=14764917277860092693   177MiB |
+-----------------------------------------------------------------------------+

That’s a crash diet until the next time. Once a month, I guess.

Solved: Kintex-7 / KC705: MiG DDR3 fails to calibrate

As I implemented a MiG controller for the KC705's on-board SODIMM, the controller failed to calibrate at first, despite the fact that I had copied the instantiation and port connections from the example design. As for the pin placement, this was taken care of by the core itself, by virtue of memctl/memctl/user_design/constraints/memctl.xdc within the MiG’s dedicated directory (which I called memctl).

And yet init_calib_complete remained low, indicating calibration had failed.

Actually, I had followed Xilinx’ XTP196 slides, except that I didn’t make an example design — I had my own.

Having ruled out holding the MiG controller in reset and a faulty pinout, I found that a constraint needs to be added to the application XDC file, namely

set_property DCI_CASCADE {32 34} [get_iobanks 33]

What it says (see UG912) is that I/O banks 32 and 34 should calibrate their on-chip terminations (DCI, Digitally Controlled Impedance) based upon the reference resistors connected to the dedicated pins on bank 33. Without this constraint, the on-chip termination on banks 32 and 34 doesn’t work, and the signal integrity on the relevant I/Os goes down the toilet. No wonder it didn’t calibrate.

The hint for this is on page 37 of XTP196, “Modifications to Example Design”, which tells us to overwrite the example design created by Vivado with a ZIP file Xilinx supplies. On the following page it lists the changes made, among others “Added DCI Cascade constraints to XDC”.

Setting up your own authoritative DNS server jots

What’s this?

These are somewhat random jots I made while setting up an authoritative BIND server, so that a simple VPS machine can function standalone. Well, almost standalone, as it takes some help from a slave DNS to supply the second DNS entry. But even if that slave goes away suddenly, the show will go on. So practically speaking, it’s a one machine show.

Todo when replacing slave server (note to self)

Note that it’s all about changing IP addresses, as the slave server is referred to via my own "ns2" subdomain.

  • Update the glue record for the domain on which the name server is a subdomain.
  • Update the A record for the name server in the relevant bind zone files (and bump serial number, right…?).
  • Update allow-transfer and possibly also-notify in named.conf.local for all zones.
  • Update the DNS monitoring script.
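
The monitoring script in the last bullet can be as simple as comparing the SOA serial reported by the master and by the slave. A sketch (server names are placeholders; the parsing assumes dig’s +short SOA answer format, where the serial is the third field):

```shell
#!/bin/sh
# Extract the serial from a `dig +short SOA` answer line, whose format is:
#   mname rname serial refresh retry expire minimum
soa_serial() {
    echo "$1" | awk '{print $3}'
}
# On a live system, something like:
#   m=$(soa_serial "$(dig +short SOA example.com @ns1.example.com)")
#   s=$(soa_serial "$(dig +short SOA example.com @ns2.example.com)")
#   [ "$m" = "$s" ] || echo "slave out of sync ($m vs $s)"
# Canned answer line, for illustration:
soa_serial "ns1.example.com. hostmaster.example.com. 2019032101 10800 3600 604800 3600"   # prints 2019032101
```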

General notes

  • DNS records are maintained on one server only (the master server), and the secondary server(s) follow suit. This is a quick & painless solution, and the update is virtually immediate when done right. This is discussed below. If you’re updating two servers, you’re doing it horribly wrong.
  • Use named-checkconf and named-checkzone to verify the configuration file and zone files, respectively
  • The service’s name in Debian 8 is "bind9" (for systemctl purposes etc.). It’s a non-LSB (systemd) service, executed from /lib/systemd/system/bind9.service.
  • /etc/default/bind9 is ignored.
  • Note that the "dig" utility’s output is in fact in zone file format, so it can be copied and pasted (I actually shortened the zone to the subdomain only, for simplicity).
  • “dig example.com axfr” attempts to make a domain transfer for the said domain. Or more like:
    $ dig example.com axfr @example.com
  • All HOWTOs talk about reverse zones. As if someone normal had the delegation to answer those queries. Just ignore these parts.
  • The "command channel" that the server listens to on localhost:953 can be used with rndc, but this utility is intended for controlling the server at a high level (adding a zone file etc.), not for individual zone records. That’s what nsupdate is for, but if you start playing with a dynamic record of a zone, keep your hands off the zone files. Citing nsupdate’s man page: “Zones that are under dynamic control via nsupdate or a DHCP server should not be edited by hand. Manual edits could conflict with dynamic updates and cause data to be lost.”
  • The allow-transfer parameter defines which IPs are allowed to issue a transfer request (copying all zone data). The default is anyone. Consider restricting this to just the known slave servers.
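
A hedged sketch of how the transfer and notification settings look in named.conf.local (zone name and slave address are made up):

```
// Authoritative zone with a single known slave:
zone "example.com" {
        type master;
        file "/etc/bind/db.example.com";
        allow-transfer { 192.0.2.53; };  // only the slave may AXFR
        also-notify { 192.0.2.53; };     // and it's notified on changes
};
```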

Recursive, non-recursive and AXFR

The confusing part about a name server is that it conveys information in three different ways (that I can think of), for different purposes:

  • Recursive queries: It’s the cut-the-bullsh*t request for the IP of a domain name, issued by e.g. a web browser. The server functions as the DNS of a (usually limited) net segment (defined by the allow-recursion option), and will ask other servers as necessary to reach the bottom-line result of an IP address.
  • Non-recursive queries: The server supplies only records that are written in its own zone files (and maybe also cached records? Not sure about that). This is the mode for an authoritative server, supplying the records for some specific domains it’s responsible for.
  • AXFR: This is the “give me all you got” request from a slave of an authoritative server. This allows setting up the records on one machine, and have several other servers follow suit. Discussed below.

The DNS protocol allocates a bit in the query which tells whether the request is recursive or not, and also a bit in the response, saying whether it was answered recursively or not.

As “dig” makes recursive requests by default, authoritative servers (which are typically configured not to support recursive requests) will usually answer with a non-recursive response. Which is usually exactly what we wanted (or why else would we ask an authoritative server with “dig” in the first place?).

So for authoritative servers, the warning actually indicates proper behavior:

$ dig google.com @l.gtld-servers.net.

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> google.com @l.gtld-servers.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57926
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available

[ ... the name servers for google.com given here ... ]

Recursive queries were allowed to all by default in BIND until version 9.4.1, but later versions have it turned off by default, my own included. It makes sense, as most people who use BIND are likely to set up a small authoritative server, and not a public DNS for everyone to enjoy.
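
For what it’s worth, the warning goes away if recursion isn’t requested in the first place, with dig’s +norecurse flag:

```
$ dig +norecurse google.com @l.gtld-servers.net.
```

The answer is the same; only the rd bit in the query (and hence the warning) is gone.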

Glue records or not

It’s very sleek to name your name servers with the domain they cover. It’s like ns1.example.com answering for example.com. As written in any guide to setting up a DNS, this creates a cyclic dependency: In order to find ns1.example.com, you first need to ask the delegated DNS, ns1.example.com, what IP it has. This is solved with glue records (the IPs are given explicitly in the “Additional Section” of the NS query).

In my specific case, I went for the cyclic dependency option for both name servers, mainly because the domain name of the chosen slave server didn’t resolve consistently to a single IP address. So it seemed safer to give the IP address myself with my own kind-of-bogus nameserver domain plus glue records which I set up on my registrar’s web interface. So it’s not just sleek, but it’s a good way to keep things stable. No reason to be afraid of this — it’s actually better.

As most people use their domains on hosted services, the name servers are given by the service provider, and hence their domain names have nothing to do with the hosted domain. Typically, the domain’s registrar offers a web interface for setting up the name servers, but only by their domain names.

Any serious registrar allows setting up glue records for cyclic dependencies explicitly. As a matter of fact, it won’t let you set a name server pointing at the same domain without any glue record given first. It can however be a bit confusing on how to do it on the web interface. For example, it might be under “Advanced Features” in the web management tool, called “Add Host Names”. It can also be called “personal name servers”.

Aside from Top Level Domain servers, glue records are rarely necessary, and are given in the “Additional” section as a neat shortcut. Actually, are they really “glue” when not absolutely necessary? Either way, as shortcuts, it seems like there are no rules for when they are present and when they aren’t. It’s as if every server has its own rules.

Some name servers obtain glue addresses for other name servers by issuing lookups (or relying on their cache), and then present them in the “Additional” section. This is considered bad practice. A recent bind 9 won’t do this as an authoritative server, and the “fetch-glue yes/no” option is not available anymore.

This mess includes Top Level Domain servers as well, even though one could expect that they wouldn’t issue glue records unless they’re necessary. This is not to be confused with the responses to the exact same queries from common DNSes, which are usually more generous. Again, no fixed rules for this. For example,

$ dig NS walla.com @e.gtld-servers.net.

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS walla.com @e.gtld-servers.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30176
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;walla.com.			IN	NS

;; AUTHORITY SECTION:
walla.com.		172800	IN	NS	ns-550.awsdns-04.net.
walla.com.		172800	IN	NS	ns-277.awsdns-34.com.
walla.com.		172800	IN	NS	ns-1226.awsdns-25.org.
walla.com.		172800	IN	NS	ns-1976.awsdns-55.co.uk.

;; ADDITIONAL SECTION:
ns-277.awsdns-34.com.	172800	IN	A	205.251.193.21

So why is there a glue record for ns-277.awsdns-34.com? It stands by itself, and has its own glue records. And why not the others? Because they’re not .com? Go figure.

At some early point I had concerns that the glue record might rely not on the IP I gave, but on a DNS query made by the registrar’s web app, or on the Top Level Domain name servers displaying a cached result. If that were the case, the IP address could be lost for some reason, and the cyclic dependency would be impossible to resolve. After some playing around, I got pretty much convinced that no one is anywhere near letting that happen.

Zone file notes

  • Always bump the serial number when making changes or the slave won’t catch them. The number must increase.
  • Any domain not ending with a ‘.’ will be considered relative to the origin domain
  • ‘@’ means the origin domain. It’s like a no-string for a domain.
  • If no name is given at the beginning of a line, the previous line’s name is used (“repeat name”).
  • TXT records are limited to 2 kB, but the strings given in the zone file (within quotation marks) are limited to 255 bytes. To overcome this, it’s allowed to divide the string into pieces, each with its own quotation marks, and a space between them. The actual text is the concatenation of the strings, after removing the quotation marks. Think of it as a multi-line string in the C language.
  • The last number in the SOA record, often called TTL, is the negative caching TTL: How long a server is allowed to cache a “no record exists” answer.
  • For each SPF record (of type TXT), add an identical record with type SPF. Or bind9 whines with
    zone example.com/IN: 'example.com' found SPF/TXT record but no SPF/SPF record found, add matching type SPF record

    even though SPF records died in 2014. But it won’t hurt (unless the slave doesn’t like it…?)

  • Unfortunately, there is no built-in variable / macro expansion in bind. Several strings, and in particular the IP address, are repeated over and over again.

To check a zone file:

$ named-checkzone example.com db.example.com

To generate a canonical file (to see if it was understood correctly):

$ named-compilezone -o out.txt example.com db.example.com
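
Since forgetting to bump the serial is a classic, here’s a sketch of generating the next one under the common (but by no means mandatory) YYYYMMDDnn convention; bump_serial is a made-up helper name:

```shell
#!/bin/sh
# Produce a serial that is both date-based and strictly larger than the
# previous one (passed as the single argument).
bump_serial() {
    old=$1
    new="$(date +%Y%m%d)00"    # today's date, revision 00
    if [ "$new" -le "$old" ]; then
        new=$((old + 1))       # already bumped today: just increment
    fi
    echo "$new"
}
bump_serial 2019032101
```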

Propagating the domain records to slaves

Explained in this chapter of DNS and BIND. The short story is that some other server (the “slave”) issues AXFR queries to the master regarding a domain, and in response it gets all the records for that domain. The slave then responds to DNS queries for that domain based upon the information obtained. This takes place when the master DNS sends a NOTIFY message to the slave, and/or at refresh intervals. And there’s a thing with the serial number, which must be higher on the master than on the slave, or the slave considers its local data up to date. Actually, some slaves will issue an AXFR regardless.

These NOTIFY requests are there to tell the slaves an update is required: Assuming that the “notify” setting of the master DNS is “yes”, when its DNS records are updated, it sends a NOTIFY message to all of the zone’s name servers by default (plus those explicitly given with “also-notify”). Those which are defined as slaves of the notifying server check the serial number in the SOA record. If it is higher than what they have, they issue a transfer request to the master (with an AXFR command, or the incremental variant, IXFR), and update their data from the info that arrives.

The “Refresh” entry of the SOA record relates to the slave’s periodical polling of the master. This time period is important only with old versions of BIND (before BIND 8), as they didn’t support the NOTIFY command. With the newer versions, it sets the periodic polling, which should have no significance except for a little load on both sides.
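
To put the timers mentioned above in one place, a sketch of an SOA record with its fields annotated (the values are examples):

```
@   IN  SOA ns1.example.com. hostmaster.example.com. (
            2019032101 ; serial: must increase for slaves to transfer
            10800      ; refresh: slave polling period (mostly moot since NOTIFY)
            3600       ; retry: repolling period after a failed refresh
            604800     ; expire: slave drops the zone if the master is unreachable this long
            3600 )     ; negative caching TTL
```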

The notification messages and their responses are logged, and should be verified when changes are made.

The slaves may, and probably will, send NOTIFY messages to the authoritative DNSes when they’re done updating, but odds are that these will be ignored, as the common setting is that all slaves take info from a single master.

A NOTAUTH response means that the request was sent to a non-authoritative server (so it doesn’t have the info). But it can also be an excuse for refusing a transfer (the correct answer in that case is REFUSED).

Typical session when restarting bind after making changes (and changing the serial number!):

Mar 23 17:02:54 named[11923]: zone billauer.co.il/IN: sending notifies (serial 2019032101)
Mar 23 17:02:54 named[11923]: zone example.com/IN: sending notifies (serial 2019032101)
Mar 23 17:02:54 named[11923]: zone example2.org/IN: sending notifies (serial 2019032101)
Mar 23 17:02:57 named[11923]: client 116.203.6.3#24089 (billauer.co.il): transfer of 'billauer.co.il/IN': AXFR started
Mar 23 17:02:57 named[11923]: client 116.203.6.3#24089 (billauer.co.il): transfer of 'billauer.co.il/IN': AXFR ended
Mar 23 17:02:58 named[11923]: client 116.203.6.3#24092 (example2.org): transfer of 'example2.org/IN': AXFR started
Mar 23 17:02:58 named[11923]: client 116.203.6.3#24092 (example2.org): transfer of 'example2.org/IN': AXFR ended
Mar 23 17:02:58 named[11923]: client 116.203.6.3#24094 (example.com): transfer of 'example.com/IN': AXFR started
Mar 23 17:02:58 named[11923]: client 116.203.6.3#24094 (example.com): transfer of 'example.com/IN': AXFR ended

Finding a slave server

Having this all-in-one server, there’s only one thing it can’t do by itself: Being a backup server. So you need to look for one. As DNS is a rather low-bandwidth service, the hosting price should be low to zero. I looked for one that provided this for free, mainly to save the hassle of annual payments. I’ve got enough of those.

It’s a backup after all, so if someone pulls the plug on the backup server all of a sudden, things will probably go on as usual for a while. So it’s in principle enough to trust the service provider not to hijack your domain or something. And try picking one that will last, just to save the bother of setting the slave DNS up again.

As of March 2019, I found two alternatives for free slave services. There are probably many more.

BuddyNS is a company that was founded for supplying DNS services. It has lots of servers and a neat web interface. I got second thoughts when I realized that it’s a startup which hasn’t lifted off so well (again, March 2019), which makes it a potentially volatile choice.

So I went for Afraid FreeDNS (despite its not-so-encouraging name). They have quite a few options, but the free plan allows for a slave DNS mirror, and is called the “backup DNS” service. The web interface is simple, not so impressive, but very functional and to the point: A single page (login required to actually see something there), with a simple dashboard saying which domains are being served, when they were last updated and when the last attempt took place. Plus a long log of events, including AXFRs that were either successful or failed (and if the latter, why), and also AXFR requests that arrived at the slave from other servers and were rejected.

For any domain that needs slave coverage, the domain and its master DNS are fed into the web interface (a small “Add” link). The master DNS must allow AXFRs from one IP address, 69.65.50.192. It takes a few minutes for the slave DNSes to update.

It’s also possible to allow AXFRs from other slaves, by setting up Slave AXFR-ALLOW ACL records. By default, AXFRs are rejected (as they should be).

There’s only one DNS server for this slave service, ns2.afraid.org, with IP address 69.65.50.223 according to its authoritative server. It may sometimes resolve as 69.65.50.192 (note that this is the server that makes the AXFR requests). This double IP is harmless and by design, according to the DNS admin, who responded to my question on this matter.

I haven’t fully figured this out, but the reason seems to be that the TLD server for .org gives the 69.65.50.192 address as a glue record (which isn’t really necessary) when asked about any .org domain that has ns2.afraid.org as a name server. So the DNS server asking about this caches the answer, and propagates it further. So the ISP-level DNS may sometimes answer 69.65.50.223 and sometimes 69.65.50.192, depending on its mood and cache.

The solution is simple, and it’s actually what I would suggest anyhow: Make it look like your own. Use a domain name of your own, with cyclic dependency, glue records and all that, and refer to the secondary name server by IP (using this domain, of course). This also makes it much easier to move to another server if necessary.

The art of setting up a sendmail server on Debian 8

But why?

Fact number one: Running your own mail server is the most likely way to mess up, and that can mean an intrusion into the server, or just turning it into a public toilet for spam.

Nevertheless, if mail delivery is important to you, there’s probably no way around it. And I’m not talking about the ability to mass-mail. Even having plain, manually written messages delivered to that semi-security-paranoid company, even if they carry a ZIP attachment, can be a challenge. And no matter what ISP or other paid-for mail relay you have, there will always be someone else pushing junk through the same channel, making the relay’s reputation questionable.

And I’m also under the impression that paid-for mail relays won’t send you a bounce message if the destination server refuses to talk with them. Once I got my own running, I suddenly got a few of these. I now realize how some emails I sent in the past just vanished.

Not to mention that the emails reach their destination much faster with a private workhorse.

The key issue is to take control of your reputation. As simple as that. Use all possible means (detailed below) to assure the recipient that it was you who sent it, and let the lack of blacklisting do the rest.

But, ehm, after all this preaching, the real reason I set up my own mail server was that I had no choice: My web host, which also took care of outgoing mail from my website, drove me crazy with upgrades out of nowhere. So I went for VPS hosting, and that requires your own mailing server. For better or worse.

Port 25 might be blocked by ISP

This isn’t directly related, but important enough: My ISP, Netvision, blocks connection to port 25 from my computer, probably to avoid blacklisting of their IP addresses due to spamming from them.

This means that testing port 25 from my local computer is worthless and misleading. I suppose other ISPs do the same.

Use external tools for testing port 25.

Selecting server software

Debian 8 arrived with Postfix by default. Exim is popular. I’m used to qmail and sendmail. Difficult to choose. Security is important. If the server gets compromised, my domain turns into a spamhouse at best. I also need some advanced features (DKIM in particular).

I went for sendmail 8.14.4. It has a bad word of mouth, but its security advisory record over the last ten years is better than Postfix’s, and surely better than Exim’s. That’s a surprise, but you can’t argue with facts.

I could have gone for qmail, but it seems like it needs patching to support DKIM, and then who knows if I haven’t just made a hole.

Goals

The server should

  • Open ports 25 and 587 for anyone to connect.
  • Relay any email received on ports 25 and 587 from localhost only, without authentication
  • Accept emails to local recipients on ports 25 and 587 when connecting from a foreign host. Port 25 is essential for inbound mail, but is sometimes blocked by firewalls, so open the other port as well.
  • Add a DKIM signature to emails going to foreign hosts only
  • Refuse to VRFY and EXPN
  • Accept all emails (from external mailers as well) even if they don’t have rDNS entries etc (let the spam filter handle them)
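
Some of these goals map directly to sendmail.mc entries. A hedged sketch (macro names as in stock sendmail’s m4 configuration; double-check against cf/README before trusting it):

```
dnl Listen on both ports, for anyone to connect:
DAEMON_OPTIONS(`Family=inet, Name=MTA-v4, Port=smtp')dnl
DAEMON_OPTIONS(`Family=inet, Name=MSP-v4, Port=submission')dnl
dnl Refuse VRFY and EXPN (and be tight-lipped in general):
define(`confPRIVACY_FLAGS', `goaway')dnl
```

“goaway” is sendmail’s shorthand for a bundle of restrictions, including novrfy and noexpn.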

Checklist

Note that a lot of these items are detailed further down this post. I also suggest taking a look at my post on SPF, DKIM and DMARC, and possibly this, this and this as well.

  • Verify that neither your IP nor your domain name has a bad reputation with a blacklist check.
  • Make an rDNS record for the mail server’s IP. It should preferably begin with a “mail”, “smtp” or “mx” subdomain. Make sure there’s only one rDNS record, using a reverse DNS lookup (e.g. dig -x). Sounds silly, but it happened to me.
  • Set up the mail server properly and safely. Run a security check.
  • Set up the firewall to kill any IPv6 traffic (in particular reject, not drop, outgoing packets)
  • Create SPF, DKIM and DMARC DNS records for the server. For SPF, both with and without the “mx” subdomain.
  • If the server has another name internally (other than the mx subdomain), make sure it has an A DNS record as well as an SPF one.
  • Verify that the outgoing mail goes out right with this DKIM validator, which allows sending mail to it, and then see exactly how it arrived + results on the validation. Invaluable.
  • Run a verbose manual mail submission and verify everything makes sense. In particular, make sure the HELO/EHLO domain matches the rDNS. However don’t expect the EHLO on the internal submission (from the program we’re running to the local server) to be the externally known one.
  • Validate the DMARC DNS record for your domain by sending a test email to autoreply@dmarctest.org (or any other one listed here). My anecdotal experience is that Gmail refused to accept mail (as in “Service unavailable” SMTP rejection) from a domain until I added a DMARC record.
  • Check any programs (web applications in particular) that send email, and verify that the envelope sender (MAIL FROM) makes sense (preferably the same as the From header). Best to send mail to some Gmail account, and see what it found the smtp.mailfrom to be. If it’s not a legal domain there, Gmail refuses to accept the mail.
And then make friends with those who have a say on spam detection:
  • Query your IP’s status at SPFBL and possibly delist it from the blacklist. It requires a working MTA on the server, with postmaster being a user on the domain. Spamassassin relies on this service.
  • Register the domain at Gmail’s Postmaster Tools to solve delivery problems to Gmail if such occur. I also have a feeling that this might reduce Gmail’s spam rating of the domain (it’s like someone takes responsibility for it).
  • Join Microsoft’s Smart Network Data Services (SNDS) and Junk Mail Reporting Program. Be sure not to be listed on Microsoft’s IP blacklist (or delist your IP address anyhow, just to be safe). For starter info, go here. Their troubleshooting page for postmasters says new IPs are likely to have issues. They suggest getting a certification from Return Path and/or creating a Microsoft account, to get a “wealth of information”.
    Note that this is important for delivery of mail to any institution that has made the mistake of relying on Microsoft’s mail infrastructure. A proper delisting takes you from 

    Mar 11 20:18:23 sm-mta[5817]: x2BKIL2H005815: to=<xxxxxxx@mit.edu>, delay=00:00:02, xdelay=00:00:02, mailer=esmtp, pri=121914, relay=mit-edu.mail.protection.outlook.com. [104.47.42.36], dsn=5.7.606, stat=User unknown

    (but the bounce message indicated that it’s not an unknown user, but a blacklisted IP number) to

    Mar 11 21:15:12 sm-mta[6170]: x2BLF8rT006168: to=<xxxxxxx@mit.edu>, delay=00:00:03, xdelay=00:00:03, mailer=esmtp, pri=121915, relay=mit-edu.mail.protection.outlook.com. [104.47.42.36], dsn=2.0.0, stat=Sent (<5C86CFDC.6000206@example.com> [InternalId=11420318042095, Hostname=DM5PR01MB2345.prod.exchangelabs.com] 11012 bytes in 0.191, 56.057 KB/sec Queued mail for delivery)

Setting up sendmail

Important general note: Sendmail is made to work sensibly out of the box. It’s clever enough to relay any mail received from localhost to external servers, and not to do that with mails from external connections, unless you explicitly tell it to become a spam relay. The default configuration files installed with apt are fine and probably secure.

Sendmail’s internals, on the other hand, with all the macros and stuff, are completely horrible.

So the trick is to make minimal changes. For a fairly regular mail configuration, there really isn’t much that needs to be done (on Debian 8, that is).

So first, install it:

# apt install sendmail

Not just sendmail-bin; it won’t work. Don’t install rmail, which is for UUCP. Ancient and disabled anyhow.

Now for the changes in the configuration file. By default on Debian 8, sendmail listens on ports 25 and 587 on IPv4′s localhost only, and relays mails to external servers as necessary. In order to open ports 25 and 587 for incoming mail (to local addresses only) from any host, change the lines in /etc/mail/sendmail.mc saying

DAEMON_OPTIONS(`Family=inet,  Name=MTA-v4, Port=smtp, Addr=127.0.0.1')dnl
DAEMON_OPTIONS(`Family=inet,  Name=MSP-v4, Port=submission, M=Ea, Addr=127.0.0.1')dnl

to

DAEMON_OPTIONS(`Family=inet,  Name=IPv4-port-25, Port=smtp, M=E')dnl
DAEMON_OPTIONS(`Family=inet,  Name=IPv4-port-587, Port=submission, M=E')dnl

Let’s explain the changes:

  • Most important, the “Addr=” part was dropped, meaning connections from any host are allowed. Sendmail isn’t stupid: If the connection is from localhost, the destination can be anything (including relaying to any host on the web), but if it comes from anywhere else, it’s for local addresses only. So we don’t turn into a spam machine. In other words, this is what a session with an external client looks like:
    >>> MAIL FROM:<sender@nowhere.com>
    <<< 250 2.1.0 <sender@nowhere.com>... Sender ok
    >>> RCPT TO:<anybody@not-here.com>
    <<< 550 5.7.1 <anybody@not-here.com>... Relaying denied
  • It’s “M=E” for both. Note that the “a” part was dropped, so access is without authentication; it’s intended for anyone to drop mails to local users. On the other hand, “E” prevents ETRN on both, as they are both exposed.
  • The change in “Name”: it’s just a name, with the sole purpose of appearing in the logs in the “daemon=” part. So it better say something meaningful to humans, like which port the connection took place on.

This is a good time to mention that in sendmailish, it’s as if there were two separate MTA daemons, one for each port. This is the terminology used in the log.

Right at the top of the file, after the DOMAIN() assignment, I added a

define(`confDOMAIN_NAME', `mx.example.com')dnl

This sets sendmail’s host name, as presented while talking to clients, in particular on HELO/EHLO (there is no need to set the confHELO_NAME / HeloName option). Even if it happens to give the correct name without it, I would set it like this. It’s crucial that it identifies itself with the name it’s expected to give, or SPF checks can fail.

And of course, set it to the rDNS of your IP address, not mx.example.com.

Setting up “virtual users”

Having email addresses that don’t match any actual user names on the machine requires defining “virtual users”. But first, it’s essential to tell sendmail to accept emails to domains other than its own. To do this, add one line for each domain. If there are subdomains, add one line for each as well (by default, sendmail wants this explicitly). So I added the following line to /etc/mail/local-host-names:

billauer.co.il

This makes sendmail consider these domains local. An important side effect of this is that now root@billauer.co.il is a legal alias for the local root account. This is an address often guessed by spammers; handled below.

Then enable virtual users. I put this after the other FEATURE statements in /etc/mail/sendmail.mc:

FEATURE(`virtusertable')dnl

And then run “make” under /etc/mail to update sendmail.cf. And restart sendmail.

Finally, prepare a file with a list of mail addresses, and to which real user they should be routed. The first column is the mail address, the second is the target. For simplicity, keep the second column with real local users; it’s also possible to use other first-column entries as the target, but why bother.

This goes to the file named /etc/mail/virtusertable. This is what it could look like:

someone@billauer.co.il		root
not-me@billauer.co.il		root

And then call “make” under /etc/mail, which updates /etc/mail/virtusertable.db. There is no need to restart sendmail to make the changes in virtusertable.db take effect.

Mail addresses as well as domains are case-insensitive, of course. But there are no shortcuts with subdomains: Everything after the “@” must match.
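As a toy model of this matching logic (my own illustration in Python, not sendmail’s actual code):

```python
# Toy model of a virtusertable lookup: case-insensitive, exact match on
# the full address, including everything after the "@". Not sendmail's
# actual implementation; addresses are made up for illustration.
table = {
    "someone@billauer.co.il": "root",
    "not-me@billauer.co.il": "root",
}

def lookup(address):
    # Mail addresses and domains are case-insensitive
    return table.get(address.lower())

print(lookup("Someone@Billauer.Co.Il"))      # matches despite the case
print(lookup("someone@sub.billauer.co.il"))  # no shortcut on subdomains
```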

Now, to prevent spammers from sending mails to root@billauer.co.il, just add this line to /etc/mail/virtusertable:

root@billauer.co.il                     error:nouser User unknown

This causes sendmail to reject the mail address flat out at the connection:

Apr 23 08:29:05 sm-mta[12752]: x3N8T4Sn012752: <root@billauer.co.il>... User unknown

But what happens if an internal mail to root is sent, from some cron job, for example? Is it rejected as well? That wasn’t the purpose. Well, on my machine this isn’t a problem, because these mails are sent to root@theserver.billauer.co.il (as defined in /etc/hosts?), so they’re not caught by the virtual user rule above. I don’t know what the result would be without this subdomain thing.

Rejecting IPv6

Why? Because IPv6 is where everything gets messy. Sendmail is already configured not to listen on IPv6, but then, when it’s about to relay to another server, things get ugly. In particular with Gmail, which supplies an IPv6 AAAA DNS entry for its MX servers.

The problem is that sendmail first attempts IPv6, no matter what (see the Nov 30 2018 remark after some discussion on this page). It seems to be a Microsoft-style attempt to push IPv6 by forcing everyone to use it. I would have compiled sendmail myself to get rid of this “feature”, but there’s an easier way, shown below. For the record, my own attempt to add a

CLIENT_OPTIONS(`Family=inet')dnl

in the sendmail.mc file, turning into

O ClientPortOptions=Family=inet

in sendmail.cf, didn’t make any difference. It should have turned IPv6 off, but didn’t: Sendmail tries IPv6 first, fails (among other reasons, because my firewall kills all incoming IPv6 packets), and after a minute goes for IPv4. So why wait?

My solution was to set the firewall to reject the outgoing IPv6 packet, so any TCP connection gets an immediate RST. This doesn’t prevent sendmail from trying IPv6, but makes it clear it’s a no-go. So it doesn’t waste time on it.

These are my firewall rules for that. It’s the OUTPUT rules that I added specifically for sendmail:

# ip6tables -A INPUT -i lo -j ACCEPT
# ip6tables -A INPUT -j DROP
# ip6tables -A OUTPUT -o lo -j ACCEPT
# ip6tables -A OUTPUT -j REJECT --reject-with icmp6-addr-unreachable
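To see why REJECT beats DROP, here’s a small local demonstration with Python’s standard library: connecting to a localhost port that nothing listens on makes the kernel answer with an RST, just like a REJECT rule would, so the failure is immediate instead of a long timeout.

```python
# A REJECT answers with an immediate error; a DROP leaves the client
# hanging until its timeout expires. Connecting to a localhost port
# nothing listens on stands in for the REJECT case: the kernel answers
# with RST and the connection fails at once.
import socket
import time

t0 = time.time()
try:
    socket.create_connection(("127.0.0.1", 1), timeout=30)
except OSError as exc:
    elapsed = time.time() - t0
    print("failed after %.3f seconds: %s" % (elapsed, exc))
```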

Reviewing sendmail’s setup

For the real masochists out there, open /etc/mail/sendmail.cf.

  • Lines starting with # are comments, of course.
  • Lines starting with “O” are options.
  • Searching the file for “=/” reveals all file-related settings (because it’s an assignment followed by the beginning of an absolute path)

Setting up DKIM

I have a separate post on DKIM and friends. Better take a look if this is Chinese to you.

opendkim is made to work sensibly. It is inserted as a mail filter (“Milter”) for sendmail, making it sign outbound messages, and check inbound messages. As with sendmail, there are a few things to set up, and it’s good to go.

I followed this guide (more or less), and man opendkim.conf, which is good. First, install:

# apt install opendkim opendkim-tools

Then create the keys:

# mkdir -p /etc/opendkim/keys/billauer.co.il
# opendkim-genkey -D /etc/opendkim/keys/billauer.co.il/ -d billauer.co.il -s dkim2019
# chown -R opendkim:opendkim /etc/opendkim/keys/

Now for the configuration. The only changes I needed to make from the default files were these: Edit /etc/default/opendkim, adding the following line at the end, so a TCP port is opened:

SOCKET="inet:8891@localhost" # listen on localhost port 8891

and since I need to sign for multiple domains, I added these two lines to /etc/opendkim.conf:

KeyTable		refile:/etc/opendkim/KeyTable
SigningTable		refile:/etc/opendkim/SigningTable

and created the two following files, with /etc/opendkim/KeyTable reading

dkim2019._domainkey.billauer.co.il billauer.co.il:dkim2019:/etc/opendkim/keys/billauer.co.il/dkim2019.private
dkim2019._domainkey.example.com example.com:dkim2019:/etc/opendkim/keys/example.com/dkim2019.private

and /etc/opendkim/SigningTable:

*@billauer.co.il dkim2019._domainkey.billauer.co.il
*@example.com dkim2019._domainkey.example.com

For a server whose outbound messages come only from localhost, there’s no need to set either InternalHosts or ExternalIgnoreList, as this is the default. These appear in a lot of tutorials.

Finally, make the DKIM a mail filter (“Milter”) on sendmail by adding this line at the end of sendmail.mc (and run “make” + restart sendmail):

INPUT_MAIL_FILTER(`opendkim', `S=inet:8891@127.0.0.1, F=T')

Note the “F=T” part. It makes sendmail refuse to accept mails if the DKIM server isn’t responding properly, answering with a

451 4.3.2 Please try again later

The default is to pass the mail through without the milter if it doesn’t work, which would mean sending unsigned mails without noticing. The downside of this is that no mail will get through either if this happens, but at least the delivery won’t fail completely (assuming the issue is resolved within a day or so).

Don’t forget to set up the TXT DNS records with the *.txt files generated by opendkim-genkey. These files are written in zone-file format for the bind daemon. The actual text is the concatenation of the two strings in quotation marks, after removing these quotation marks. Think of it as a multi-line string in C.
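This concatenation is easy to script. A sketch with Python’s re module, run on a shortened, fake key (a real p= string is much longer):

```python
# Join the quoted chunks of a DKIM TXT record in the bind zone format
# that opendkim-genkey produces. The key below is a shortened fake,
# for illustration only.
import re

zone_entry = '''dkim2019._domainkey IN TXT ( "v=DKIM1; h=sha256; k=rsa; "
    "p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A" )  ; ----- DKIM key dkim2019'''

# The actual TXT value is the concatenation of the quoted strings
txt_value = "".join(re.findall(r'"([^"]*)"', zone_entry))
print(txt_value)
```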

All done? Use this DKIM validator to see exactly how well it went.

Remove outbound messages from the mailing queue

# mailq
MSP Queue status...
/var/spool/mqueue-client is empty
		Total requests: 0
MTA Queue status...
		/var/spool/mqueue (2 requests)
-----Q-ID----- --Size-- -----Q-Time----- ------------Sender/Recipient-----------
x22FdWV1010569     1864 Sat Mar  2 10:39 MAILER-DAEMON
                 (Deferred: Connection timed out with server.com.)
					 <ze@server.com>
x22FMGAi009668       17 Sat Mar  2 10:23 <this@there.com>
                 (Deferred: Connection timed out with example.com.)
					 <eli@example.com>
		Total requests: 2
# cd /var/spool/mqueue
# rm *x22FdWV1010569
# rm *x22FMGAi009668
# systemctl restart sendmail
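What the wildcard removal above actually does: sendmail keeps each queued message as a pair of files, qf&lt;ID&gt; (control/headers) and df&lt;ID&gt; (body), so “rm *&lt;ID&gt;” removes both. A toy demonstration in a temporary directory standing in for /var/spool/mqueue:

```python
# Simulate removing a queued message's file pair by queue ID, the way
# "rm *<qid>" does under /var/spool/mqueue. Uses a temp directory so
# it's safe to run anywhere.
import glob
import os
import tempfile

qdir = tempfile.mkdtemp()
qid = "x22FdWV1010569"
for prefix in ("qf", "df"):
    open(os.path.join(qdir, prefix + qid), "w").close()

# The wildcard matches both the qf and the df file
for path in glob.glob(os.path.join(qdir, "*" + qid)):
    os.remove(path)

print(os.listdir(qdir))  # now empty
```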

Gmail won’t talk with anyone

Gmail’s server doesn’t respond to a SYN on port 25 or 587, and won’t talk to you unless you have an rDNS. Only after having the rDNS set on the server:

# nc gmail-smtp-in.l.google.com. 25
220 mx.google.com ESMTP y6si2100605wmi.83 - gsmtp

And that’s just the beginning. Without having DMARC set up, it wouldn’t relay my mails. More on DMARC here.

Sources of information

Digging to the root with DNS queries

Introduction

This is an explicit walkthrough on how a domain name is resolved. Doing the recursion manually, that is.

And then some remarks on the mess with DNS glue records.
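The recursion performed by hand below boils down to descending from the root, one label at a time. A toy model in Python, with a made-up in-memory delegation table instead of actual UDP queries (all names except google.com are illustrative):

```python
# Toy model of iterative DNS resolution: each zone maps to a nameserver
# learned from its parent, and we walk from the root down. Real
# resolvers do this with actual DNS queries over the network.
delegations = {
    ".": "a.root-servers.net.",        # known in advance (root hints)
    "com.": "a.gtld-servers.net.",     # learned from the root servers
    "google.com.": "ns1.google.com.",  # learned from the .com servers
}

def resolve_chain(name):
    labels = name.rstrip(".").split(".")
    chain = []
    # Visit ".", then "com.", then "google.com."
    for i in range(len(labels) + 1):
        zone = ".".join(labels[len(labels) - i:]) + "." if i else "."
        if zone in delegations:
            chain.append((zone, delegations[zone]))
    return chain

print(resolve_chain("google.com"))
```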

Getting the root servers

$ dig NS .

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS .
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59540
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 14

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;.				IN	NS

;; ANSWER SECTION:
.			49053	IN	NS	b.root-servers.net.
.			49053	IN	NS	f.root-servers.net.
.			49053	IN	NS	a.root-servers.net.
.			49053	IN	NS	h.root-servers.net.
.			49053	IN	NS	k.root-servers.net.
.			49053	IN	NS	l.root-servers.net.
.			49053	IN	NS	e.root-servers.net.
.			49053	IN	NS	g.root-servers.net.
.			49053	IN	NS	j.root-servers.net.
.			49053	IN	NS	d.root-servers.net.
.			49053	IN	NS	c.root-servers.net.
.			49053	IN	NS	i.root-servers.net.
.			49053	IN	NS	m.root-servers.net.

;; ADDITIONAL SECTION:
a.root-servers.net.	567453	IN	A	198.41.0.4
b.root-servers.net.	547997	IN	A	199.9.14.201
c.root-servers.net.	314914	IN	A	192.33.4.12
d.root-servers.net.	478361	IN	A	199.7.91.13
e.root-servers.net.	326962	IN	A	192.203.230.10
f.root-servers.net.	514616	IN	A	192.5.5.241
g.root-servers.net.	575480	IN	A	192.112.36.4
h.root-servers.net.	592754	IN	A	198.97.190.53
i.root-servers.net.	596171	IN	A	192.36.148.17
j.root-servers.net.	591102	IN	A	192.58.128.30
k.root-servers.net.	580970	IN	A	193.0.14.129
l.root-servers.net.	523957	IN	A	199.7.83.42
m.root-servers.net.	603222	IN	A	202.12.27.33

;; Query time: 19 msec

This was a very fast query, because the info is in any DNS server’s zone files. This is the piece of info it must know to begin with.

Getting the name servers for .com

So, who are the top level domain servers? I’ll ask the authoritative server directly (this is unnecessary if you just want the answer, so “dig NS com” would have been enough):

$ dig NS com @e.root-servers.net.

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS com @e.root-servers.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11329
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;com.				IN	NS

;; AUTHORITY SECTION:
com.			172800	IN	NS	l.gtld-servers.net.
com.			172800	IN	NS	b.gtld-servers.net.
com.			172800	IN	NS	c.gtld-servers.net.
com.			172800	IN	NS	d.gtld-servers.net.
com.			172800	IN	NS	e.gtld-servers.net.
com.			172800	IN	NS	f.gtld-servers.net.
com.			172800	IN	NS	g.gtld-servers.net.
com.			172800	IN	NS	a.gtld-servers.net.
com.			172800	IN	NS	h.gtld-servers.net.
com.			172800	IN	NS	i.gtld-servers.net.
com.			172800	IN	NS	j.gtld-servers.net.
com.			172800	IN	NS	k.gtld-servers.net.
com.			172800	IN	NS	m.gtld-servers.net.

;; ADDITIONAL SECTION:
l.gtld-servers.net.	172800	IN	A	192.41.162.30
l.gtld-servers.net.	172800	IN	AAAA	2001:500:d937::30
b.gtld-servers.net.	172800	IN	A	192.33.14.30
b.gtld-servers.net.	172800	IN	AAAA	2001:503:231d::2:30
c.gtld-servers.net.	172800	IN	A	192.26.92.30
c.gtld-servers.net.	172800	IN	AAAA	2001:503:83eb::30
d.gtld-servers.net.	172800	IN	A	192.31.80.30
d.gtld-servers.net.	172800	IN	AAAA	2001:500:856e::30
e.gtld-servers.net.	172800	IN	A	192.12.94.30
e.gtld-servers.net.	172800	IN	AAAA	2001:502:1ca1::30
f.gtld-servers.net.	172800	IN	A	192.35.51.30
f.gtld-servers.net.	172800	IN	AAAA	2001:503:d414::30
g.gtld-servers.net.	172800	IN	A	192.42.93.30
g.gtld-servers.net.	172800	IN	AAAA	2001:503:eea3::30
a.gtld-servers.net.	172800	IN	A	192.5.6.30
a.gtld-servers.net.	172800	IN	AAAA	2001:503:a83e::2:30
h.gtld-servers.net.	172800	IN	A	192.54.112.30
h.gtld-servers.net.	172800	IN	AAAA	2001:502:8cc::30
i.gtld-servers.net.	172800	IN	A	192.43.172.30
i.gtld-servers.net.	172800	IN	AAAA	2001:503:39c1::30
j.gtld-servers.net.	172800	IN	A	192.48.79.30
j.gtld-servers.net.	172800	IN	AAAA	2001:502:7094::30
k.gtld-servers.net.	172800	IN	A	192.52.178.30
k.gtld-servers.net.	172800	IN	AAAA	2001:503:d2d::30
m.gtld-servers.net.	172800	IN	A	192.55.83.30
m.gtld-servers.net.	172800	IN	AAAA	2001:501:b1f9::30

;; Query time: 73 msec
;; SERVER: 192.203.230.10#53(192.203.230.10)

The next step: Get the domain’s name server

I just picked one of the name servers from the queries above. Once again, “dig NS google.com” will most likely give the same result.

$ dig NS google.com @j.gtld-servers.net.

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS google.com @j.gtld-servers.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37174
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	NS

;; AUTHORITY SECTION:
google.com.		172800	IN	NS	ns2.google.com.
google.com.		172800	IN	NS	ns1.google.com.
google.com.		172800	IN	NS	ns3.google.com.
google.com.		172800	IN	NS	ns4.google.com.

;; ADDITIONAL SECTION:
ns2.google.com.		172800	IN	AAAA	2001:4860:4802:34::a
ns2.google.com.		172800	IN	A	216.239.34.10
ns1.google.com.		172800	IN	AAAA	2001:4860:4802:32::a
ns1.google.com.		172800	IN	A	216.239.32.10
ns3.google.com.		172800	IN	AAAA	2001:4860:4802:36::a
ns3.google.com.		172800	IN	A	216.239.36.10
ns4.google.com.		172800	IN	AAAA	2001:4860:4802:38::a
ns4.google.com.		172800	IN	A	216.239.38.10

;; Query time: 74 msec
;; SERVER: 192.48.79.30#53(192.48.79.30)

So what’s the point in asking .com’s servers directly? For one, if I just changed the servers for my domain, and I want to see that change take effect immediately. Besides, I want the advertised TTL values and not those my ISP’s DNS happens to count down:

$ dig NS google.com

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15069
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 9

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	NS

;; ANSWER SECTION:
google.com.		62895	IN	NS	ns2.google.com.
google.com.		62895	IN	NS	ns1.google.com.
google.com.		62895	IN	NS	ns4.google.com.
google.com.		62895	IN	NS	ns3.google.com.

;; ADDITIONAL SECTION:
ns1.google.com.		170367	IN	A	216.239.32.10
ns2.google.com.		240223	IN	A	216.239.34.10
ns3.google.com.		238882	IN	A	216.239.36.10
ns4.google.com.		248264	IN	A	216.239.38.10
ns1.google.com.		170367	IN	AAAA	2001:4860:4802:32::a
ns2.google.com.		167252	IN	AAAA	2001:4860:4802:34::a
ns3.google.com.		167252	IN	AAAA	2001:4860:4802:36::a
ns4.google.com.		159090	IN	AAAA	2001:4860:4802:38::a

;; Query time: 21 msec
;; SERVER: 10.2.0.1#53(10.2.0.1)

Final step: Get the address (or something)

This is a bit stupid, but let’s finish up:

$ dig A google.com @ns3.google.com.

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> A google.com @ns3.google.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48652
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		300	IN	A	216.58.206.14

;; Query time: 92 msec
;; SERVER: 216.239.36.10#53(216.239.36.10)

The local DNS had another answer in its cache. That’s OK. It also bombarded me with some other records, something an authoritative server is much less keen to do on an A query.

$ dig A google.com

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> A google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35893
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		235	IN	A	172.217.23.174

;; AUTHORITY SECTION:
google.com.		152204	IN	NS	ns3.google.com.
google.com.		152204	IN	NS	ns4.google.com.
google.com.		152204	IN	NS	ns2.google.com.
google.com.		152204	IN	NS	ns1.google.com.

;; ADDITIONAL SECTION:
ns1.google.com.		344065	IN	A	216.239.32.10
ns2.google.com.		310954	IN	A	216.239.34.10
ns3.google.com.		324838	IN	A	216.239.36.10
ns4.google.com.		247342	IN	A	216.239.38.10
ns1.google.com.		254137	IN	AAAA	2001:4860:4802:32::a
ns2.google.com.		345035	IN	AAAA	2001:4860:4802:34::a
ns3.google.com.		345503	IN	AAAA	2001:4860:4802:36::a
ns4.google.com.		172241	IN	AAAA	2001:4860:4802:38::a

;; Query time: 19 msec
;; SERVER: 10.2.0.1#53(10.2.0.1)

Glue records: Place for improvisations

Note that the name servers for google.com are subdomains of google.com. This is fine, because there are glue records in the “Additional Section” that give the IP addresses explicitly. Without these, it would have been impossible to resolve any of google.com’s addresses (it would have gotten stuck on obtaining the address of e.g. ns1.google.com).

That isn’t so trivial. For example, let’s look at netvision.net.il’s name server record, as reported by the authoritative server for net.il:

$ dig NS netvision.net.il @ns2.ns.il.

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS netvision.net.il @ns2.ns.il.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43809
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;netvision.net.il.		IN	NS

;; AUTHORITY SECTION:
netvision.net.il.	86400	IN	NS	dns.netvision.net.il.
netvision.net.il.	86400	IN	NS	eupop.netvision.net.il.
netvision.net.il.	86400	IN	NS	nypop.elron.net.

;; ADDITIONAL SECTION:
dns.netvision.net.il.	86400	IN	A	194.90.1.5
eupop.netvision.net.il.	86400	IN	A	212.143.194.5

;; Query time: 73 msec
;; SERVER: 162.88.57.1#53(162.88.57.1)

Note that there are glue records only for the NS records that belong to netvision.net.il. The nypop.elron.net. server (extra backup?) doesn’t have a glue record. It could have, as a DNS is allowed to answer for another domain in the special case of a glue record (see RFC 1033).
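In other words, a nameserver needs a glue record exactly when its own name falls inside the zone it serves (“in bailiwick”), because otherwise resolving it would require the very zone being delegated. A small checker (my own toy code, not part of any DNS software):

```python
# A nameserver needs glue when its name is the zone itself or a
# subdomain of it; otherwise its address can be resolved independently.
def needs_glue(zone, nameserver):
    zone = zone.lower().rstrip(".")
    nameserver = nameserver.lower().rstrip(".")
    return nameserver == zone or nameserver.endswith("." + zone)

print(needs_glue("netvision.net.il", "dns.netvision.net.il"))  # needs glue
print(needs_glue("netvision.net.il", "nypop.elron.net"))       # doesn't
```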

OK, so how do you resolve nypop.elron.net? You ask the nameserver for elron.net, of course! Let’s ask the authoritative server for .net:

$ dig NS @a.gtld-servers.net. elron.net.

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS @a.gtld-servers.net. elron.net.
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27469
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;elron.net.			IN	NS

;; AUTHORITY SECTION:
elron.net.		172800	IN	NS	dns.netvision.net.il.
elron.net.		172800	IN	NS	nypop.netvision.net.il.

;; Query time: 82 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)

Oops. That went back to netvision.net.il. So it doesn’t get us out of the loop. But here comes the funny part: Ask Netvision’s own DNS the same question:

$ dig NS elron.net. @dns.netvision.net.il

; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS elron.net. @dns.netvision.net.il
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10023
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: c38902309e53e8ef454a61b25c98a0863c28b2c747ce090a (good)
;; QUESTION SECTION:
;elron.net.			IN	NS

;; ANSWER SECTION:
elron.net.		600	IN	NS	nypop.elron.net.
elron.net.		600	IN	NS	dns.netvision.net.il.

;; ADDITIONAL SECTION:
dns.netvision.net.il.	86400	IN	A	194.90.1.5
nypop.elron.net.	600	IN	A	199.203.1.20

;; Query time: 23 msec
;; SERVER: 194.90.1.5#53(194.90.1.5)

Cute, isn’t it? Not only is it a different answer, but the glue records are there. So if you make these queries from within Netvision’s infrastructure, you’re blind to the lack of glue records.

And these are the records of one of Israel’s largest ISPs.

nslookup

OK, this is soooo unrelated, and still, I thought I should mention nslookup as an alternative to dig. The former creates output that is more human-readable; the latter, more like zone file records. Pick your poison.

$ nslookup -type=NS google.com
Server:		10.2.0.1
Address:	10.2.0.1#53

Non-authoritative answer:
google.com	nameserver = ns1.google.com.
google.com	nameserver = ns3.google.com.
google.com	nameserver = ns4.google.com.
google.com	nameserver = ns2.google.com.

Authoritative answers can be found from:
ns1.google.com	internet address = 216.239.32.10
ns2.google.com	internet address = 216.239.34.10
ns3.google.com	internet address = 216.239.36.10
ns4.google.com	internet address = 216.239.38.10
ns1.google.com	has AAAA address 2001:4860:4802:32::a
ns2.google.com	has AAAA address 2001:4860:4802:34::a
ns3.google.com	has AAAA address 2001:4860:4802:36::a
ns4.google.com	has AAAA address 2001:4860:4802:38::a

And then, let’s ask an authoritative server the same question:

$ nslookup -type=NS google.com. j.gtld-servers.net.
Server:		j.gtld-servers.net.
Address:	192.48.79.30#53

Non-authoritative answer:
*** Can't find google.com.: No answer

Authoritative answers can be found from:
google.com	nameserver = ns2.google.com.
google.com	nameserver = ns1.google.com.
google.com	nameserver = ns3.google.com.
google.com	nameserver = ns4.google.com.
ns2.google.com	has AAAA address 2001:4860:4802:34::a
ns2.google.com	internet address = 216.239.34.10
ns1.google.com	has AAAA address 2001:4860:4802:32::a
ns1.google.com	internet address = 216.239.32.10
ns3.google.com	has AAAA address 2001:4860:4802:36::a
ns3.google.com	internet address = 216.239.36.10
ns4.google.com	has AAAA address 2001:4860:4802:38::a
ns4.google.com	internet address = 216.239.38.10

So it’s somewhat more readable, but I miss those TTL values.

Gmail: How to turn off spam filter for incoming mails

Gmail is definitely the leader in the field of email services, and their spam filter is actually very good. From my own experience with setting up a mail server, I can tell that it’s not all that easy to make Gmail’s incoming mail servers even talk with you. So the larger part of spammers don’t even get the chance to suggest their piece of spam to Gmail.

The problem is that every now and then (quite rare, but still), an important email is classified as spam. Since I’m fetching the emails with fetchmail, and apply my own spam filter, I prefer getting them all. Messages that are classified as spam will not be available in the POP3 session that fetchmail makes.

How to do it: Simply add a mail filter. Click the upper-right gear, select “Settings” and choose the tab for filters. Then create a new mail filter with a condition that all email meet (I picked smaller than 100 MB) and check “Never send it to Spam”. That’s it.

The SPF, DKIM and DMARC trio: Making your email appear decent

Intro

Whether you just want your non-Gmail personal email to get through, or you have a website that produces transactional emails (those sent by your site or web app), there’s a long fight with spam filters ahead.

The war against unsolicited emails will probably go on as long as email is used, and it’s an ongoing battle where one leak is sealed and another is found. Those responsible for mail delivery constantly tweak their spam detectors’ parameters to minimize complaints. There are no general rules for what is detected as spam and what isn’t. What passes Gmail’s tests may very well fail on Security Industries’ mail server and vice versa. Each has its own experience.

But you want your message to reach them all. Always.

This is a guide to the main ideas and concepts behind the trio of mechanisms mentioned in the title. The purpose is to focus on the delicate and sometimes crucial details that are often missed in howtos everywhere. And also try to understand the rationale behind each mechanism, even though it might not be relevant when spam detector X is tuned to achieve the best results, given a current flow of spam mails with a certain pattern.

Howtos usually tell you to employ a DKIM signing software on the mail server, and make SPF and DKIM DNS records for “your domain”. Which one? Not necessarily trivial, as discussed below. And then possibly add a DMARC record as well. Will it really help? Also discussed.

Here’s the thing: Employing these elements will most likely do something good, even if you get it wrong. Setting up things without understanding what you’re doing can solve an immediate problem. This post focuses on understanding the machinery, so the best possible setting can be achieved.

Get your tie knot right.

Rationale: Domains cost money

There are different ideas behind each of the trio’s mechanisms, but there’s one solid idea behind them all: The reputation of a domain name.

If you’re a spammer, you can’t send thousands of emails that are linked to a domain name without wrecking its reputation rather quickly. So let’s make sure each domain name’s owner stands behind the mails sent on its behalf, and maintains its reputation. This requires a way to tell whether this owner really sent each mail, and not just a spammer abusing it. SPF and DKIM supply these mechanisms.

The cost of domain names makes it not worthwhile to purchase domains just for the sake of a few thousand mails, until their reputation is dead meat. Well, sort of: there are .bid domains at $1.75 today, but .com and .org are still rather expensive.

DMARC takes this one step further, and allows a domain name owner to prevent the delivery of emails that weren’t sent on its behalf. It also puts the focus on the sender given in the “From:” header, instead of other domains that SPF and DKIM might relate to. This makes the junk-domain concept even less viable.

Despite all said above, I still get spam messages (of the random recipient type) with this trio perfectly set up. But they’re relatively rare.

The trio in short

These three techniques are fundamentally different in what they do. In brief for now, in more detail further below:

  • SPF: Defines the set of server IP addresses that are authorized to use a domain name to identify itself (HELO/EHLO) and/or the mail’s sender (MAIL FROM) in the SMTP exchange. Note that this doesn’t directly relate to the “From:” mail header, even though it does in many practical cases.
  • DKIM: A method to publish a public key in a DNS record for the digital signature of some parts of an email message, so this signature can be verified by any recipient. The domain name of this DNS record, which is given explicitly in the signature, doesn’t need to have any relation to the mail’s author, sender or any relaying server involved (even though it usually has). It’s just a placeholder for the accumulating reputation of mails that are signed with it.
  • DMARC: A mechanism to prevent the domain name from being abused by spammers. It basically tells the recipient that an email with a certain Author Domain (as it appears in “From:”) should pass an SPF and/or DKIM test, and what to do if not.

In essence, SPF authenticates the use of some mail relay servers, DKIM authenticates the message carrying its signature, and DMARC says what to do if the authentication(s) fail.

The DNS records

All three techniques rely on DNS lookups for a TXT entry, which has the domain name included (let’s say we have example.com.):

  • SPF records are found as a TXT record for the domain itself (that is, example.com.).
  • DKIM records are the TXT records for the “selector._domainkey” subdomain, where “selector” is given in the mail message’s DKIM header. So it’s like default._domainkey.example.com (for selector=default).
  • The DMARC record is the TXT entry for the “_dmarc” subdomain (i.e. _dmarc.example.com).
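As a sketch, the three lookup names can be built mechanically from the domain and, for DKIM, the selector. The function names here are made up for illustration:

```python
# Sketch: building the TXT lookup names for SPF, DKIM and DMARC.
# Function names are hypothetical, for illustration only.

def spf_name(domain):
    # SPF: the TXT record of the domain itself.
    return domain

def dkim_name(domain, selector):
    # DKIM: <selector>._domainkey.<domain>; the selector comes from
    # the s= tag in the message's DKIM-Signature header.
    return selector + "._domainkey." + domain

def dmarc_name(domain):
    # DMARC: the "_dmarc" subdomain of the Author Domain.
    return "_dmarc." + domain

print(spf_name("example.com"))              # example.com
print(dkim_name("example.com", "default"))  # default._domainkey.example.com
print(dmarc_name("example.com"))            # _dmarc.example.com
```

Feed these names to any TXT resolver and you get the records discussed below.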

So it’s crucial which domain it is that the spam filter software considers to be “the domain”. Spoiler: DKIM and DMARC have this sorted out nicely. It’s SPF that is tricky.

Note that given an email message, the recipient can easily check whether it has SPF and DMARC records, but (without DMARC) it can’t know if there’s a relevant DKIM record available, because of the selector part. Consequently, adding a DKIM record and signing only part of the emails won’t backlash on those that aren’t signed.

Which domain is “the domain”?

Quite often, guides in these topics just say “the domain”, making it sound as if there’s only one domain involved. In fact, there are several to be aware of.

Let’s say that myself@example.com sends a mail by connecting to its ISP’s mail server mx.isp.com, which in turn relays it to the destination mail server. We then have four different domains involved.

  1. The domain of the author, appearing in the From header, shown to the human recipient as the sender. example.com in this case.
  2. The “envelope sender”, appearing in the MAIL FROM part of the SMTP conversation of the relay transmission. This could be example.com (the simple approach), but also something like bounce.isp.com. This is because the envelope sender is the bounce address, and some mail relays make up some kind of bogus bounce address so they can track the bouncing mails.
  3. The domain used in the HELO/EHLO part of the SMTP conversation of the relay transmission. Probably something like mail23.isp.com, as the ISP has many servers for relaying out.
  4. The rDNS domain entry of the IP address of the sender on the relay transmission. If this entry doesn’t exist, or isn’t exactly as the HELO/EHLO domain, hang the postmaster. Some mail servers won’t even talk with you unless they match.

I use the term “relay transmission” for the connection between two mail servers: Going from the server that accepted your message for transmission when you pressed “Send” to the server that holds the mail account of the mail’s recipient (i.e. destination of the MX record of the recipient’s full domain).

But oops. Mails are often relayed more than once before reaching their final station. Except for the first item in the list above, the domains are different on each such transmission. Which one counts? When does it count?

Luckily, this dilemma is pretty much limited to SPF. And with DMARC, it’s nonexistent.

SPF

At times, people just add an SPF record for their mail address’ domain with their relay servers’ IP range, and think they’ve covered themselves SPF-wise. Sometimes they did, and sometimes they didn’t. No escape from the gory details.

If you’re not familiar with the HELO/EHLO and MAIL FROM: SMTP tokens, I warmly suggest taking a quick look at another post of mine. It’s nearly impossible to understand SPF without them.

The SPF mechanism is quite simple: The server that receives the email looks up the TXT DNS record(s) for the domain name given in the envelope sender, that is in “MAIL FROM:”. If an SPF record exists, it checks if the IP address of the sender is in the allowed set, and if so, the SPF test is passed.

The domain name that is checked is the “domain portion” of the “MAIL FROM” identity (see RFC7208 section 4.1), or in other words, everything after the “@” character of the MAIL FROM. Or so it’s commonly understood: The RFC doesn’t define this term.

The receiver is likely to perform the same check on the HELO/EHLO identification of the sender. In fact, RFC7208 section 2.3 recommends performing it even before the MAIL FROM check. The SPF test will pass if either the HELO/EHLO or the MAIL FROM check passes (the RFC doesn’t say this explicitly, but it’s clear from the argument for beginning with the more definite HELO/EHLO check).
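As a rough sketch of the address check itself, here is a toy evaluator that handles only the ip4 mechanism. Real SPF evaluation also handles a, mx, include, redirect and the qualifiers, per RFC7208:

```python
import ipaddress

def check_spf_ip4(spf_record, sender_ip):
    # Toy SPF check: only "ip4:" mechanisms are handled. "-all",
    # "~all", "a", "mx", "include" etc. are ignored in this sketch.
    for term in spf_record.split():
        if term.startswith(("ip4:", "+ip4:")):
            network = term.split(":", 1)[1]
            if ipaddress.ip_address(sender_ip) in ipaddress.ip_network(network):
                return "pass"
    return "no match"

record = "v=spf1 ip4:192.0.2.0/24 -all"
print(check_spf_ip4(record, "192.0.2.55"))    # pass
print(check_spf_ip4(record, "198.51.100.7"))  # no match
```

The same check is run twice in practice: once with the HELO/EHLO domain’s record, once with the MAIL FROM domain’s record.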

This is important: Any mail server can ensure all mails that go through it pass the (non-DMARC) SPF test, just by having a DNS record on its full HELO/EHLO domain name. It’s silly not to have one. So if you’re setting up a mail server called mx.theisp.com, be sure to add SPF records for mx.theisp.com, allowing the IP of that server. This SPF test won’t count for DMARC purposes, but the “Received-SPF: pass” line among the mail headers surely doesn’t hurt.

Except for when DMARC is applied in one of its enforcing modes, there is no clear rule on what to do if this test fails or passes with one of the SMTP tokens or both. This is raw material for the spam detection software.

It’s however important to note that it’s perfectly normal that the envelope address is made up completely by the mail relay, because it functions as a bounce address. So an email sent from myself@example.com may have the same envelope address, but it’s also perfectly normal that the MAIL FROM: would be bounce-3242535@bounce.isp.com. This allows the ISP to detect massive bouncing of emails, and possibly do something about it. In this case, the relaying server’s domain can be used to pass the (non-DMARC) SPF test instead.

Well, with the reputation per domain rationale, it actually does make sense. But with DMARC, this won’t cut it. The SPF record must belong to the “From:” sender. See below.

Now, the formal rules are nice, but if you just wrote a spam filter, would you check for the SPF record of the “From:” sender’s domain, even though it’s not really relevant according to the RFC? Of course you would. If the domain owner of the Author Address has given permission for a server to relay emails on its behalf, it’s a much stronger indication. So it’s probably a good idea to add such a record, even if it makes no sense directly. And it makes you better prepared for DMARC.

As a matter of fact, it’s recommended to add SPF records for any domain and subdomain that may somehow appear in the mail, to the extent possible, of course. A DNS record is cheap, and you never know if a spam detector expects it to be there, whether it should or not.

Bottom line: We don’t really know how many points spam filter X gives an SPF record of this type or another. It depends on the history of previous spam. So try to cover all options, even those that aren’t required per RFC.

Information on setting up an SPF record is all around the web. I suggest starting with Wikipedia’s great entry, and if you want to be accurate about it, RFC7208.

DKIM

This is easiest explained through a real signature, taken from the header of a real mail message:

Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com;
 s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to;
 bh=vx0LAOXz3Mr8zS/Jy2ayKOep6NlflK3t+BpJyi78v9A=;
 b= [ ... ]

To verify this signature, a lookup for the DNS TXT entry for 20161025._domainkey.gmail.com is made. Note that except for the _domainkey part, the domain is composed of the s= (selector) and the d= (domain) assignments in the signature. The answer should contain an RSA public key for verifying that the hash of some selected headers (selected by h=) is indeed signed by the blob in the b= assignment. That’s it. If the signature is OK, the DKIM test is passed.
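The header itself is just a list of tag=value pairs, so deriving the lookup name from a given signature is a few lines of string handling. A minimal sketch (real parsing follows RFC 6376, which has more rules about folding and encodings):

```python
def dkim_tags(signature):
    # Minimal tag=value parser for a DKIM-Signature header value.
    # Good enough for a sketch; real parsing follows RFC 6376.
    tags = {}
    for part in signature.replace("\r\n", "").split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip()] = value.strip()
    return tags

sig = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; "
       "h=mime-version:references:in-reply-to:from:date:message-id:subject:to")
t = dkim_tags(sig)
print(t["s"] + "._domainkey." + t["d"])  # 20161025._domainkey.gmail.com
```

The verifier then fetches the TXT record at that name and checks the b= blob against the public key it contains.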

Note that DKIM covers the message body as well: the bh= tag is a hash of the body (possibly truncated if an l= length tag is used), and it’s included in the signed data along with the selected headers. So signing a message with DKIM vouches for both.

Also note that no other domains, that are related to the email, make any difference for passing the DKIM test itself. Not the sender’s, not the mail relays’, nothing. Passing the DKIM test just means that the signing domain (gmail.com) has signed this message (actually, some of its headers) and therefore puts its own reputation on it. It doesn’t say anything on who sent the message.

The common practice is however that the signing domain is the From: header domain. Probably because DMARC can’t be applied otherwise, and maybe also because the goal is to impress spam filters. Passing the DKIM test is nice formally, but if the spam filter thinks it’s fishy, it can backlash.

Another reason, from RFC4871, section 6.3: “If the message is signed on behalf of any address other than that in the From: header field, the mail system SHOULD take pains to ensure that the actual signing identity is clear to the reader.” Yeah right. I’ve seen Gmail verifying a DKIM signature of a domain which had nothing to do with anything in that message, surely not the sender. It just went “dkim=pass”.

As for Spamassassin, it doesn’t care much about DKIM so far. Probably for good reasons, as I get a lot of spam messages with the DKIM signature perfectly done. So as of now, passing the DKIM test barely changes the score. More precisely, the existence of a DKIM signature increases the spam score (more spammy) by 0.1, but if the signature is correct, the score is reduced by 0.1. So we’re back to zero. If the signature belongs to the author (matching From: domain) the score is reduced (i.e. towards non-spam) by 0.1. All in all, a DKIM signature wins a score of 0.1 on Spamassassin. It may not seem worth the effort, but Spamassassin is not the only filter in the world. And it may change over time.

Finally a question: The MUA (e.g. Thunderbird) is allowed to put a DKIM signature, which would actually make sense: It allows a human end user to sign the emails directly, with no need for anything special on the relaying infrastructure. And there’s no problem with multiple RSA key pairs for multiple users of the domain, since the “s=” selector allows a virtually unlimited number of DKIM DNS records. Why there isn’t a plugin for at least Thunderbird is unclear to me. Maybe the answer lies in Spamassassin’s indifferent response to it.

DMARC

Suppose that you own company Example Ltd. with domain example.com, and you’ve decided that all mails from (as in header From:) that domain will be DKIM signed. Now some spam mail arrives from someone else, without a DKIM signature and fails the SPF test. But the recipient has no way to tell that it should pass such tests.

DMARC is the mechanism that tells the recipient what to expect, and what to do if the expectation isn’t met. This allows the owner of the domain to ensure only mails arriving from its own machines are accepted. Spam pretending to come from its domain is dropped.

This is what Gmail did to force emails from all its users (i.e. having a gmail.com address) to be relayed through their servers only. The TXT for _dmarc.gmail.com goes:

"v=DMARC1; p=none; sp=quarantine; rua=mailto:mailauth-reports@google.com"

In other words: p=none means no enforcement for gmail.com itself, while sp=quarantine tells the recipient to hold messages from gmail.com subdomains that aren’t proven to come from Gmail’s servers. Most servers just junk those.

And now to how the test is done. Spoiler II: DMARC isn’t interested in a domain test if it isn’t tightly linked with the “From:” header’s domain. Or as they call it: Aligned with the RFC5322.From field. This is a huge difference.

Let’s take it directly from RFC7489, section 4.2:

A message satisfies the DMARC checks if at least one of the supported authentication mechanisms:

  1. produces a “pass” result, and
  2. produces that result based on an identifier that is in alignment, as defined in Section 3.

The “supported authentication mechanisms” for DMARC version 1 are SPF and DKIM, as listed in section 4.1 of the same RFC.

The first thing we learn is that it’s enough to pass one of SPF or DKIM. No need to have both for passing DMARC.

Second, the term “is in alignment” above. It’s defined in the RFC itself, and essentially means that the domain for which the SPF or DKIM passed is the same as the one in the From: header, possibly give or take subdomains. The only reason they didn’t just say that the domains must be equal is because of the possibility of “relaxed mode”, allowing an email from myself@mysubdomain.example.com to be approved by passing tests with the example.com domain. This is what “being in alignment” means in relaxed mode. In “strict mode” alignment occurs only when they’re perfectly equal.
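A simplified sketch of that alignment check follows. Real relaxed alignment compares organizational domains via the Public Suffix List; here a plain subdomain-suffix test stands in for that:

```python
def aligned(auth_domain, from_domain, mode="relaxed"):
    # Simplified DMARC identifier alignment check (RFC 7489 section 3).
    auth_domain = auth_domain.lower().rstrip(".")
    from_domain = from_domain.lower().rstrip(".")
    if auth_domain == from_domain:
        return True              # aligned in both modes
    if mode == "strict":
        return False             # strict mode: exact match only
    # Relaxed mode, simplified: accept if one is a subdomain of the
    # other. Real implementations compare organizational domains
    # using the Public Suffix List instead.
    return (from_domain.endswith("." + auth_domain)
            or auth_domain.endswith("." + from_domain))

print(aligned("example.com", "mysubdomain.example.com"))            # True
print(aligned("example.com", "mysubdomain.example.com", "strict"))  # False
```

So a DKIM signature with d=example.com aligns with a From: address at mysubdomain.example.com in relaxed mode, but not in strict mode.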

If the email passes the DMARC test, there isn’t much to fuss about. If it fails, the decision what to do depends on the policy, as given in the relevant domain. Which, according to RFC7489 section 3 is: “Author Domain: The domain name of the apparent author, as extracted from the RFC5322.From field”. And then in section 4.3, item 7: “The DMARC module attempts to retrieve a policy from the DNS for that domain” (referring to the Author Domain).

So it’s a DNS query for the TXT record of the From: domain, with the “_dmarc” subdomain prepended. As in the example above for gmail.com.

Finally, a tricky point. If a mail server, for which the SPF test is made, didn’t use the Author Domain in its MAIL FROM nor in the HELO/EHLO, the SPF test is worthless for DMARC purposes. It’s however quite tempting to check the Author Domain for its SPF record nevertheless. I mean, if the Author Domain allows the IP address of the mail relay server, isn’t it good enough to pass a DMARC test? Doing this goes against the SPF’s RFC, and isn’t mentioned in any way in DMARC’s RFC. But it makes a lot of sense. I won’t be surprised if it’s common practice already.

Will DMARC make my email delivery better?

TL;DR: Surprisingly enough, yes.

The irony about DMARC is that it bites on the spam messages, and does very little on the legit ones. After all, if an email passed both the SPF and DKIM tests on the Author Domain, what is there left to say?

And if the same email passed only one of the tests, why would a DMARC record add reassurance?

Of course, if you want to prevent fake mails pretending to be you, definitely apply DMARC.

But once again, no one knows how spam filter X behaves. Maybe someone found out that domains with DMARC records carry less spam, and tuned the filter in their favor. And maybe the rejection of spam mails thanks to the DMARC record helped with the domain’s spam statistics. Even though I would expect any machine that maintains statistics to count the emails that pass SPF / DKIM tests separately.

And here comes the big surprise. Gmail refused to accept messages from my server until I added a DMARC record. Once I did it, I was all welcome. It makes no sense, but somehow, Google seems to like the very existence of a DMARC record. Maybe a coincidence, most likely not. So do yourself a favor, and add a TXT record to _dmarc.yourdomain.com:

v=DMARC1; p=none; sp=none; ruf=mailto:mailreports@yourdomain.com

This record tells the recipient to do nothing with a mail message that fails the DMARC test, so it’s harmless. But it asks the recipient to send a failure report to the email address given. Which can be useful in itself.
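Like the DKIM header, this record is just tag=value pairs, so sanity-checking what you published takes a few lines. A sketch:

```python
def parse_dmarc(record):
    # Simplified tag=value parser for a DMARC TXT record (RFC 7489).
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip()] = value.strip()
    return tags

policy = parse_dmarc("v=DMARC1; p=none; sp=none; "
                     "ruf=mailto:mailreports@yourdomain.com")
print(policy["p"])    # none
print(policy["ruf"])  # mailto:mailreports@yourdomain.com
```

A receiver that reads p=none takes no enforcement action, and may mail failure reports to the ruf= address.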

Conclusion

There might be official rules for entering a club, but at the end of the day, you can’t know what the doorkeeper looks at. So try to get everything as tidy as possible, and hope you won’t be mistaken for the bad guys.

And don’t wait for the first time you won’t be let in. It might be too late to fix it then.

SMTP tidbits for the to-be postmaster

This is a quick overview of the parts of an SMTP session that are relevant to SPF and mail server setup.

Just a sample SMTP session

For a starter, this is what an ESMTP session between two mail servers talking on port 25 can look like (shamelessly copied from this post, which also shows how I obtained it).

"eli@picky.server.com" <eli@picky.server.com>... Connecting to [127.0.0.1] via relay...
220 theserver.org ESMTP Sendmail 8.14.4/8.14.4; Sat, 18 Jun 2016 11:05:26 +0300
>>> EHLO theserver.org
250-theserver.org Hello localhost.localdomain [127.0.0.1], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
250-DSN
250-ETRN
250-DELIVERBY
250 HELP
>>> MAIL From:<eli@theserver.org> SIZE=864
250 2.1.0 <eli@theserver.org>... Sender ok
>>> RCPT To:<eli@picky.server.com>
>>> DATA
250 2.1.5 <eli@picky.server.com>... Recipient ok
354 Enter mail, end with "." on a line by itself
>>> .
250 2.0.0 u5I85QQq030607 Message accepted for delivery
"eli@picky.server.com" <eli@picky.server.com>... Sent (u5I85QQq030607 Message accepted for delivery)
Closing connection to [127.0.0.1]
>>> QUIT
221 2.0.0 theserver.org closing connection

HELO / EHLO

This is the first thing the client says after the server’s greeting. More precisely, it says something like

HELO mail.example.com

This self-introduction is important: The server knows your IP, and probably makes a quick rDNS check on it, to see if you’re making this domain up. So the domain given in HELO must be the same as in the rDNS record. Exactly.

It doesn’t matter if this domain has nothing to do with the domain of the actual From-sender. Or any other domain, for that matter. Relaying emails is normal. Not having the rDNS set up properly shouldn’t be.

Rumor has it that most mail servers will accept the message even if there’s no match, or even if there’s no rDNS record at all. And I’ve seen plenty of these myself. I’ve also had my server rejected because of this. Being lazy here loses points.

EHLO is like HELO, but indicates the start of an ESMTP session. For the purpose of the domain, it’s the same thing.

MAIL FROM:

After the HELO introduction (and possibly some other stuff), the client goes something like:

MAIL FROM:<myself@example.com>

The email address given is often referred to as the envelope sender, envelope-from or smtp.mailfrom.

In its simplest form (and as originally intended), this is the sender of the mail, copied from the “From:” header, as presented to the end user. But even more important, this is the address for bouncing the mail if it’s undeliverable. So one common trick, mostly used by mass relays, is to assign a long and tangled MAIL FROM: bounce address from which the relaying server can identify the message better.

The envelope sender appears as the “Return-Path:” header in mail messages as they reach mailboxes. Along the Received: headers, “envelope-from” tags often appear, indicating the envelope sender of the relevant leg.

One way or another, if you’re into SPF, then the SPF record must match the envelope sender, and not necessarily the From: sender. Even though it’s a good idea to cover both. Mail relays are a bit messy on what they check.
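For the record, the domain that the SPF lookup is made for is simply everything after the “@” of the envelope sender. A sketch, using the standard library to strip the angle brackets:

```python
from email.utils import parseaddr

def envelope_domain(mail_from):
    # The "domain portion" of the MAIL FROM identity: everything
    # after the "@". parseaddr() strips the angle brackets.
    address = parseaddr(mail_from)[1]
    return address.rpartition("@")[2]

print(envelope_domain("<bounce-3242535@bounce.isp.com>"))  # bounce.isp.com
print(envelope_domain("<myself@example.com>"))             # example.com
```

With the tangled bounce address above, it’s bounce.isp.com that the SPF check is made against, not example.com.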

VRFY and EXPN

VRFY allows the client to check whether an email address is valid or not on the server. If it is valid, the server responds with a full address of the user.

This allows the client to scan through a range of addresses, and find one that is a valid recipient. Excellent for spammers, which is why this function is commonly unavailable today. For example:

VRFY eli@billauer.co.il
252 Administrative prohibition

on another machine:

VRFY eli@billauer.co.il
252 2.5.2 Cannot VRFY user; try RCPT to attempt delivery (or try finger)

EXPN is more or less the same, just with mailing lists: The client gives the name of the list, and gets the list of users. The common practice is not to allow this command, not even by servers that allow VRFY despite its issues with spam.

If you’re setting up a mail server, disable both commands. They’re often enabled by default.