Solved: systemd boot waits 90 seconds on net-devices-eth0

This post was written by eli on May 22, 2019
Posted Under: Linux,systemd,udev

Introduction

After installing wireshark (and tons of packages it depends on) on a rather fresh and bare-boned Debian 8 (Jessie), I got the “A start job is running for sys-subsystem-net-devices-eth0.device” message for a minute and half on every boot.

It was exceptionally difficult to find the reason, because so many packages were installed along with wireshark.

This is the short version of how this was solved. For the entire battery of stuff I tried out, I’ve written a separate post.

Bad omens

# systemctl status sys-subsystem-net-devices-eth0.device
● sys-subsystem-net-devices-eth0.device
   Loaded: loaded
   Active: inactive (dead)

May 20 12:47:07 diskless systemd[1]: Expecting device sys-subsystem-net-devices-eth0.device...
May 20 12:48:37 diskless systemd[1]: Job sys-subsystem-net-devices-eth0.device/start timed out.
May 20 12:48:37 diskless systemd[1]: Timed out waiting for device sys-subsystem-net-devices-eth0.device.

OK. Not surprising it’s not active. So start manually…?

# systemctl start sys-subsystem-net-devices-eth0.device
Job for sys-subsystem-net-devices-eth0.device timed out.

The second line appeared after a minute and a half, of course.

So I went to another, more recent machine (Mint 19) and went

$ systemctl status sys-subsystem-net-devices-eth0.device
● sys-subsystem-net-devices-eth0.device - Killer E2500 Gigabit Ethernet Controll
   Loaded: loaded
   Active: active (plugged) since Wed 2019-02-20 14:48:54 IST; 2 months 30 days
   Device: /sys/devices/pci0000:00/0000:00:1c.2/0000:04:00.0/net/eth0

And then comparing the outputs of just

$ systemctl

it became evident that *.device units are listed on the Mint 19 machine, but not on Debian 8.

Which led me to the conclusion that sys-subsystem-net-devices-eth0.device isn’t meant to be on Debian 8. That the problem isn’t that it’s not starting when commanded to do so, but that it’s not supposed to be started that way. The problem is that some other unit requests it.

As far as I understand, these .device units should become active and inactive by a systemd-udev event by virtue of udev labeling. They are there to trigger other units that depend on them, not to be controlled explicitly. For some reason they aren’t activated on the Debian 8 machine, despite udev rules being roughly the same as on the Mint 19 machine.

In the lack of proper docs (?), I’m left to guess that requesting a start or stop on device units means waiting for them to reach the desired state by themselves. This goes along with an observation I’ve made with strace, showing that systemd does nothing meaningful until it times out. So most likely, it just looked up the state of the device unit, saw it wasn’t started, and then went to sleep, essentially waiting for a udev event to bring the unit to the desired state, and consequently return a success status to the start request.

In fact, when I tried “systemctl stop” on the eth0 device on Mint 19 (i.e. the machine on which it was already loaded) it got stuck exactly the same way as for starting it on Debian 8. So that command probably meant “wait until eth0 goes away”.

Closing in

The trick is now to find which unit causes the attempt to kick off sys-subsystem-net-devices-eth0.device.

# journald -x

[ ... ]

May 20 11:41:20 diskless systemd[1]: Job sys-subsystem-net-devices-eth0.device/start timed out.
May 20 11:41:20 diskless systemd[1]: Timed out waiting for device sys-subsystem-net-devices-eth0.device.
-- Subject: Unit sys-subsystem-net-devices-eth0.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sys-subsystem-net-devices-eth0.device has failed.
--
-- The result is timeout.
May 20 11:41:20 diskless systemd[1]: Dependency failed for ifup for eth0.
-- Subject: Unit ifup@eth0.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ifup@eth0.service has failed.
--
-- The result is dependency.

In English: It’s clear that sys-subsystem-net-devices-eth0.device is the unit that didn’t manage to kick off. But the more important clue is that ifup@eth0.service failed, because it depends on the former. The easier solution lies in the latter.

But frankly, I don’t really understand what happened here. If eth0 was detected by systemd, why wasn’t the relevant device unit activated? Or if it wasn’t, why was ifup@eth0.service kicked off? The relevant unit file is a wildcard service, not naming any specific device name.

Solution

The textbook solution is to find why .device files aren’t generated at all on my Debian 8 system, fix that, and then there won’t be any delay. The correct solution in some cases is to manipulate the udev rules, adding a “TAG+=”systemd”" rule to the device, so the device unit is started automatically by systemd (man systemd.device). In my case this tag was already there, so it’s probably some issue with the service that’s supposed to respond to the udev event. So that’s a dead end.

So go the clumsy way: Remove the unit file that requests the device unit (or maybe I should have masked it by adding a file in /etc?). In this case, it’s /lib/systemd/system/ifup@.service, which said:

[Unit]
Description=ifup for %I
After=local-fs.target network-pre.target networking.service systemd-sysctl.service
Before=network.target
BindsTo=sys-subsystem-net-devices-%i.device
After=sys-subsystem-net-devices-%i.device
ConditionPathIsDirectory=/run/network
DefaultDependencies=no

[Service]
ExecStart=/sbin/ifup --allow=hotplug %I
ExecStop=/sbin/ifdown %I
RemainAfterExit=true

and then make sure this had no adverse side effects (none found so far). Actually, removing this file can’t be worse than it was when it took 90 seconds to boot, because this service wasn’t launched anyhow, as its precondition never started.

Add a Comment

required, use real name
required, will not be published
optional, your blog address