Linux CUSE (and FUSE): Why I ditched two months of work with it

This post was written by eli on February 28, 2020
Posted Under: Linux,Linux kernel

Introduction

If you’re planning to use CUSE (or FUSE) for an application you care about, this post is for you. That includes my future self. I’m summarizing my not-so-pleasant journey with this framework here, focusing on how I gradually realized that I should start from scratch with an old-school kernel module instead.

Most importantly, if you run CUSE on a v5.0 to v5.3 Linux kernel, you’re in for an imminent OOPS that requires an immediate reboot of the computer. This was the final straw for me (more like a huge log). Even if the user-space driver detected the kernel version and refused to run on kernels that would crash, that would mean it wouldn’t run on the most common distributions at the time of release. And I asked myself whether I want to depend on a subsystem that is maintained this way.

Maybe I should have listened to what Linus had to say about FUSE back in 2011:

People who think that userspace filesystems are realistic for anything but toys are just misguided.

Unfortunately, it seems like the overall attitude towards FUSE is more or less in that spirit, hence nobody gets alarmed when the relevant code gets messier than is usually allowed: FUSE is nice for that nifty GUI that allows me to copy some files from my smartphone to the computer over a USB cable. It fails when there are many files, but what did I expect. Maybe it’s a problem with FUSE, maybe with the MTP/PTP protocol, but the real problem is that it’s treated as a toy.

As for myself, I was tempted to offer a user-space device driver for a USB product I’ve designed. A simple installation, possibly from binaries, running on virtually any computer. CUSE has been around for many years, and opens a file in /dev with my name of choice. It makes the device file behave as if it was backed by a driver in the kernel (more or less). What could possibly go wrong?

And a final note before the storytelling: This post was written in the beginning of 2020. Sometimes things change after a while. Not that they usually do, but who knows?

Phase I: Why I opted out of libfuse

The natural and immediate choice for working with FUSE is to use its ubiquitous library, libfuse. OK, how does the API go? How does it work?

libfuse’s git commits date back to 2001, and the project is alive by all means, with several commits and version updates every month. As for documentation, the doc/ subdirectory doesn’t help much, and its mainpage.dox says it straight out:

The authoritative source of information about libfuse internals (including the protocol used for communication with the FUSE kernel module) is the source code.

Simply put, nothing is really documented, read the source and figure it out yourself. There’s also an example/ directory with example code, showing how to get it done. Including a couple of examples for CUSE. But no API documentation at all. Nothing on the fine details that make the difference between “look it works, oops, now it doesn’t” and something you can rely upon.

As for the self-documenting code, it isn’t a very pleasant experience, as it’s clearly written in “hack now, clean up later (that is, never)” style.

There are, however, scattered pieces of documentation.

So with the notion that messy code is likely to bite back, I decided to skip libfuse and talk with /dev/cuse directly. I mean, kernel code can’t be that messy, can it?

It took me quite some time to reverse-engineer the CUSE protocol, and I’ve written a couple of posts on this matter: This and this.

Phase II: Accessing /dev/cuse causing a major OOPS

After nearly finishing my CUSE-based (plus libusb and epoll) driver on a Linux v4.15 machine, I gave it a test run on a different computer, running kernel v5.3. And that went boooom.

Namely, just when trying to close /dev/cuse, an OOPS message as follows appeared, leaving Linux limping, requiring an immediate reboot:

kernel: BUG: spinlock bad magic on CPU#0, cat/951
kernel: general protection fault: 0000 [#1] PREEMPT SMP PTI
kernel: CPU: 0 PID: 951 Comm: cat Tainted: G           O      5.3.0-USBTEST1 #1
kernel: RIP: 0010:spin_bug+0x6a/0x96
kernel: Code: 04 00 00 48 8d 88 88 06 00 00 48 c7 c7 90 ef d5 81 e8 8c af 00 00 41 83 c8 ff 48 85 db 44 8b 4d 08 48 c7 c1 85 ab d9 81 74 0e <44> 8b 83 c8 04 00 00 48 8d 8b 88 06 00 00 8b 55 04 48 89 ee 48 c7
kernel: RSP: 0018:ffffc900008abe18 EFLAGS: 00010202
kernel: RAX: 0000000000000029 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff81d9ab85
kernel: RDX: 0000000000000000 RSI: ffff88816da16478 RDI: 00000000ffffffff
kernel: RBP: ffff88815a109248 R08: 00000000ffffffff R09: 000000006b6b6b6b
kernel: R10: ffff888159b58c50 R11: ffffffff81c5cd00 R12: ffff88816ae00010
kernel: R13: ffff88816a165e78 R14: 0000000000000012 R15: 0000000000008000
kernel: FS:  00007ff8be539700(0000) GS:ffff88816da00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007ffe4fa2fee8 CR3: 000000016b5d0002 CR4: 00000000003606f0
kernel: Call Trace:
kernel: do_raw_spin_lock+0x19/0x84
kernel: fuse_prepare_release+0x3b/0xe7 [fuse]
kernel: fuse_sync_release+0x37/0x49 [fuse]
kernel: cuse_release+0x16/0x22 [cuse]
kernel: __fput+0xf0/0x1c2
kernel: task_work_run+0x73/0x86
kernel: exit_to_usermode_loop+0x4e/0x92
kernel: do_syscall_64+0xc9/0xf4
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9

OK, so why did this happen?

To make a long story short, because a change was made in the FUSE kernel code without testing it on CUSE. I mean, no test at all.

The gory details: A spinlock was added to struct fuse_inode, but someone forgot that CUSE doesn’t have such a struct related to it, because its device file is on /dev and not on a FUSE mounted filesystem. Small mistake, no test, big oops.

Even more gory details: Linux kernel commit f15ecfef058d94d03bdb35dcdfda041b3de9d543 adds a spinlock check in fuse_prepare_release() (among others), saying

	if (likely(fi)) {
		spin_lock(&fi->lock);
		list_del(&ff->write_entry);
		spin_unlock(&fi->lock);
	}

For this to even be possible, an earlier commit (ebf84d0c7220c7c9b904c405e61175d2a50cfb39) adds a struct fuse_inode *fi argument to fuse_prepare_release(), and also makes sure that it’s populated correctly. In particular, in cuse.c, it goes:

struct fuse_inode *fi = get_fuse_inode(inode);

(what would I do without git blame?).

But wait. What inode? Apparently, the idea was to get the inode’s struct fuse_inode, which is allocated and initialized by fuse_alloc_inode() in fs/fuse/inode.c. However, this function is called only as a superblock operation, in other words, when the kernel wants to create a new inode on a mounted FUSE filesystem. A CUSE device file doesn’t have such an entry allocated at all!

get_fuse_inode() is just a container_of(). In other words, it assumes that @inode is an entry inside a struct fuse_inode, and returns the address of the struct. But it’s not. In the CUSE case, the inode belongs to a devfs or something. get_fuse_inode() returns just some random address, and no wonder do_raw_spin_lock() whines that it’s called on something that isn’t a spinlock at all.

The relevant patches were submitted by Kirill Tkhai and committed by Miklos Szeredi. Neither of them ran the simplest go/no-go test on CUSE after this change, of course, or they would have spotted the problem right away. What damage could a simple change in cuse.c make, after all?

The patch that fixed it

This issue is fixed in kernel commit 56d250ef9650edce600c96e2f918b9b9bafda85e (effective in kernel v5.4) by Miklos Szeredi, saying “It’s a small wonder it didn’t blow up until now”. Irony at its best. He should have written “the FUSE didn’t blow”.

So this bug lived from v5.0 to v5.3 (inclusive), something like 8 months. 8 months without a single minimal regression test by the maintainer or anyone else.

The patch removes the get_fuse_inode() call in cuse.c, and calls fuse_prepare_release() with a NULL instead. Meaning there is no inode, as it should be.
