stmmaceth: NetworkManager fails to bring up a wired Ethernet NIC

This post was written by eli on January 4, 2014
Posted Under: ARM,Intel FPGA (Altera),Linux,Linux kernel

The problem

In short: running Linux 3.8.0 on Altera’s Cyclone V SoC, NetworkManager doesn’t bring up the Ethernet port. It also makes false accusations such as

Jan  1 00:00:17 localhost NetworkManager[1206]: <info> (eth0): driver 'stmmaceth' does not support carrier detection.

and later on also says

Jan  1 00:00:17 localhost NetworkManager[1206]: <warn> (eth0): couldn't get carrier state: (-1) unknown
Jan  1 00:00:17 localhost NetworkManager[1206]: <info> (eth0): carrier now OFF (device state 20, deferring action for 4 seconds)

And asking more directly,

# nm-tool eth0
NetworkManager Tool

State: disconnected

- Device: eth0 -----------------------------------------------------------------
  Type:              Wired
  Driver:            stmmaceth
  State:             unavailable
  Default:           no
  HW Address:        96:A7:6F:4E:DD:6D

  Capabilities:

  Wired Properties
    Carrier:         off

All of this is, of course, incorrect, even though it’s not clear who is to blame. The driver detects the carrier all right:

# cat /sys/class/net/eth0/carrier
1

and as we shall see below, the ioctl() interface is also supported. Only it doesn’t work as NetworkManager expects it to.

Well, I bluffed a bit when proving that carrier detection works. This is explained further down.

So what went wrong?

Nothing like digging in the source code. In NetworkManager’s nm-device-ethernet.c, the function supports_ethtool_carrier_detect() goes

static gboolean
supports_ethtool_carrier_detect (NMDeviceEthernet *self)
{
	int fd;
	struct ifreq ifr;
	gboolean supports_ethtool = FALSE;
	struct ethtool_cmd edata;

	g_return_val_if_fail (self != NULL, FALSE);

	fd = socket (PF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		nm_log_err (LOGD_HW, "couldn't open control socket.");
		return FALSE;
	}

	memset (&ifr, 0, sizeof (struct ifreq));
	strncpy (ifr.ifr_name, nm_device_get_iface (NM_DEVICE (self)), IFNAMSIZ);

	edata.cmd = ETHTOOL_GLINK;
	ifr.ifr_data = (char *) &edata;

	errno = 0;
	if (ioctl (fd, SIOCETHTOOL, &ifr) < 0) {
		nm_log_dbg (LOGD_HW | LOGD_ETHER, "SIOCETHTOOL failed: %d", errno);
		goto out;
	}

	supports_ethtool = TRUE;

out:
	close (fd);
	nm_log_dbg (LOGD_HW | LOGD_ETHER, "ethtool %s supported",
	            supports_ethtool ? "is" : "not");
	return supports_ethtool;
}

Obviously, this is the function that determines whether the port supports carrier detection. There is also a similar function for MII, supports_mii_carrier_detect(). A simple strace reveals what went wrong. With this driver, the log says

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 17
ioctl(17, SIOCETHTOOL, 0x7e93bcdc)      = -1 EBUSY (Device or resource busy)
close(17)                               = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 17
ioctl(17, SIOCGMIIPHY, 0x7e93bcfc)      = -1 EINVAL (Invalid argument)
close(17)                               = 0
open("/proc/sys/net/ipv6/conf/eth0/accept_ra", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/proc/sys/net/ipv6/conf/eth0/use_tempaddr", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
gettimeofday({4101, 753554}, NULL)      = 0
send(6, "<30>Jan  1 01:08:21 NetworkManager[1701]: <info> (eth0): driver 'stmmaceth' does not support carrier detection.", 111, MSG_NOSIGNAL) = 111

So we can see that the attempt made in supports_ethtool_carrier_detect() failed with EBUSY, and the one made by supports_mii_carrier_detect() failed as well, with EINVAL. In other words, the ethtool ioctl() interface (which is loosely related to the ethtool utility) was recognized, but the driver said the device was busy (a silly return code, as we shall see later), and the MII ioctl() interface was rejected altogether.

Since NetworkManager doesn’t support carrier detection based on Sysfs, the final conclusion is that there is no carrier detection.

But why did the driver answer EBUSY in the first place?

Some kernel digging

The relevant Linux kernel is 3.8.0.

ioctl() calls to network devices are handled by the dev_ioctl() function in net/core/dev.c (not in drivers/, and it was later on moved to dev_ioctl.c) as follows:

	case SIOCETHTOOL:
		dev_load(net, ifr.ifr_name);
		rtnl_lock();
		ret = dev_ethtool(net, &ifr);
		rtnl_unlock();
		if (!ret) {
			if (colon)
				*colon = ':';
			if (copy_to_user(arg, &ifr,
					 sizeof(struct ifreq)))
				ret = -EFAULT;
		}
		return ret;

Note that the ioctl() call is based upon the name of the interface as a string (e.g. “eth0”). The call to dev_load() hence loads a kernel module if the respective driver isn’t loaded yet. The dev_ethtool() function is in net/core/ethtool.c. It first runs a few sanity and permission checks, and may return ENODEV, EFAULT or EPERM, depending on the mishap.

Most notably, it runs

	if (dev->ethtool_ops->begin) {
		rc = dev->ethtool_ops->begin(dev);
		if (rc  < 0)
			return rc;
	}

which in the case of stmmac is

static int stmmac_check_if_running(struct net_device *dev)
{
	if (!netif_running(dev))
		return -EBUSY;
	return 0;
}

netif_running(dev) is defined in include/linux/netdevice.h as follows:

static inline bool netif_running(const struct net_device *dev)
{
	return test_bit(__LINK_STATE_START, &dev->state);
}

This function returns true when the device is “up”, exactly in the sense of “ifconfig up”.

Say what?

NetworkManager made the SIOCETHTOOL ioctl() call before bringing up the eth0 interface, in order to check whether it supports carrier detection. But since the interface wasn’t up (why should it be? NetworkManager didn’t bring it up), netif_running() returned false, and the driver’s sanity check (?) failed the ioctl() call with EBUSY. So NetworkManager marked the interface as not supporting carrier detection, and brought it up anyway. This made the driver report that it had detected a carrier, but since NetworkManager didn’t expect that to happen, it started fooling around, and eventually didn’t bring up the interface properly (no DHCP, in particular).

So it all boils down to netif_running(dev) returning false while the interface is down, which is the reason the whole thing fails with EBUSY.

Now let’s return to the Sysfs detection of the carrier. With the eth0 interface down, it goes like this

# cat /sys/class/net/eth0/carrier
cat: /sys/class/net/eth0/carrier: Invalid argument
# ifconfig eth0 up
# cat /sys/class/net/eth0/carrier
0
# cat /sys/class/net/eth0/carrier
1

The two successive carrier reads give different results because it takes a second or so before the carrier is detected. Nothing changed in the hardware in between (no cable was plugged in, for example).

So NetworkManager was partly right: The driver doesn’t support carrier detection as long as the interface isn’t brought up.
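For anyone who wants to poll the sysfs file from a program rather than with cat, here is a rough sketch in C. read_carrier_file() and wait_for_carrier() are hypothetical helpers of mine, and the 100 ms polling step is an arbitrary choice. The point is that there are three outcomes to handle: a read error (EINVAL while the interface is down), 0 and 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <time.h>

/*
 * Read a sysfs "carrier" attribute, e.g. /sys/class/net/eth0/carrier.
 * Returns 1 (carrier), 0 (no carrier) or a negative errno value.
 * With the interface down, the read itself fails with EINVAL,
 * which is what cat reported above.
 */
int read_carrier_file(const char *path)
{
	FILE *f;
	int c, saved;

	f = fopen(path, "r");
	if (!f)
		return -errno;

	errno = 0;
	c = fgetc(f);
	saved = errno;
	fclose(f);

	if (c == '0')
		return 0;
	if (c == '1')
		return 1;

	/* EOF or unexpected content: report the read error, if any */
	return saved ? -saved : -EIO;
}

/*
 * Poll the file until the carrier shows up, an error occurs or
 * timeout_ms has elapsed. Returns the last read_carrier_file() result.
 */
int wait_for_carrier(const char *path, int timeout_ms)
{
	const struct timespec step = { 0, 100 * 1000 * 1000 }; /* 100 ms */
	int waited, rc;

	for (waited = 0; ; waited += 100) {
		rc = read_carrier_file(path);
		if (rc != 0)
			return rc; /* carrier found, or a real error */
		if (waited >= timeout_ms)
			return 0;
		nanosleep(&step, NULL);
	}
}
```

Given the second or so that the carrier took to appear above, a timeout of a few seconds on /sys/class/net/eth0/carrier seems like a reasonable starting point.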

Solution

The solution is surprisingly simple. Just make sure

ifconfig eth0 up

is executed before NetworkManager is launched. That’s it. Suddenly nm-tool sees a completely different interface:

# nm-tool eth0

NetworkManager Tool

State: connected (global)

- Device: eth0  [Wired connection 1] -------------------------------------------
  Type:              Wired
  Driver:            stmmaceth
  State:             connected
  Default:           yes
  HW Address:        9E:37:A8:56:CF:EC

  Capabilities:
    Carrier Detect:  yes
    Speed:           100 Mb/s

  Wired Properties
    Carrier:         on

  IPv4 Settings:
    Address:         10.1.1.242
    Prefix:          24 (255.255.255.0)
    Gateway:         10.1.1.3

    DNS:             10.2.0.1
    DNS:             10.2.0.2

Who should we blame here? Probably NetworkManager. Since it’s bringing up the interface anyhow, why not ask the driver whether it supports carrier detection after the interface is up? I suppose that the driver has its reasons for not cooperating while it’s down.

Epilogue

Since I started with dissecting the kernel’s code, here’s what happens with the call to dev_ethtool() mentioned above, when it passes the “sanity check”. There’s a huge case statement, with the relevant part saying

	case ETHTOOL_GLINK:
		rc = ethtool_get_link(dev, useraddr);
		break;

The rc value is propagated up when this call finishes (after some possible other operations, which are probably not relevant).

And then we have, in the same file,

static int ethtool_get_link(struct net_device *dev, char __user *useraddr)
{
	struct ethtool_value edata = { .cmd = ETHTOOL_GLINK };

	if (!dev->ethtool_ops->get_link)
		return -EOPNOTSUPP;

	edata.data = netif_running(dev) && dev->ethtool_ops->get_link(dev);

	if (copy_to_user(useraddr, &edata, sizeof(edata)))
		return -EFAULT;
	return 0;
}

The ethtool_value structure is defined in include/uapi/linux/ethtool.h saying

struct ethtool_value {
	__u32	cmd;
	__u32	data;
};

Note that if netif_running(dev) returns false, zero is returned in the data field of the answer, but the call itself succeeds (which actually makes sense). This never happens with the current driver, though, because its begin() hook rejects the call first, as was seen above.
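To make the difference between the two behaviors concrete, here is a toy userspace model of the path an ETHTOOL_GLINK request takes. None of this is kernel code: struct model_dev and the model_*() functions are stand-ins I invented, condensing the begin() check in dev_ethtool() and the logic of ethtool_get_link() into a few lines.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct net_device; not the real definition. */
struct model_dev {
	bool running;  /* models __LINK_STATE_START being set */
	bool carrier;  /* models __LINK_STATE_NOCARRIER being clear */
	int (*begin)(const struct model_dev *dev); /* ethtool_ops->begin, may be NULL */
};

/* Same logic as stmmac_check_if_running() */
int model_check_if_running(const struct model_dev *dev)
{
	return dev->running ? 0 : -EBUSY;
}

/* Condensed dev_ethtool() + ethtool_get_link(), ETHTOOL_GLINK only */
int model_glink(const struct model_dev *dev, unsigned int *data)
{
	if (dev->begin) {
		int rc = dev->begin(dev);

		if (rc < 0)
			return rc; /* the request never reaches get_link */
	}

	*data = dev->running && dev->carrier;
	return 0;
}
```

With begin set to model_check_if_running and running false, model_glink() returns -EBUSY and leaves *data alone. With begin set to NULL, the same device yields 0 with *data == 0, which is the "down, hence no link" answer that the generic code would have given.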

It’s fairly safe to assume that drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c handles the actual call, as it has

static const struct ethtool_ops stmmac_ethtool_ops = {
	.begin = stmmac_check_if_running,
... snip ...
	.get_link = ethtool_op_get_link,
... snip ...
};

but ethtool_op_get_link() is defined in net/core/ethtool.c (we’re running in circles…) saying simply

u32 ethtool_op_get_link(struct net_device *dev)
{
	return netif_carrier_ok(dev) ? 1 : 0;
}

which brings us to include/linux/netdevice.h where it says

static inline bool netif_carrier_ok(const struct net_device *dev)
{
	return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
}

This raises the question of why the driver refuses to answer ETHTOOL_GLINK requests when it’s down. The driver isn’t even involved in answering this request. But having attempted to modify the driver so that ETHTOOL_GLINK is let through even when the interface is down, I can say that it still confused NetworkManager. I didn’t get to the bottom of why exactly.

Reader Comments

At least some NIC drivers will power down the NIC hardware when in “down” state, making it impossible to detect link state.

Written By Matti Kurkela on March 14th, 2016 @ 16:46
