usbpiper: A single-threaded /dev/cuse and libusb-based endpoint to device file translator

This post was written by eli on February 28, 2020
Posted Under: Linux,Linux kernel,USB

Introduction

This is a single-threaded utility, based upon CUSE, libusb and the kernel’s epoll capability, which generates one /dev/usbpiper_* device file for each bulk / interrupt endpoint on a USB device. For example, /dev/usbpiper_bulk_in_01 and /dev/usbpiper_bulk_out_03.

It’s an unfinished project, which was stopped before many obvious tasks on the TODO list were done. This is why several parameters are hardcoded and some memory allocations aren’t freed, along with several other implications listed below.

It’s available at Github: https://github.com/billauer/usbpiper

I eventually went for a good old kernel driver instead. A separate post explains why, and you probably want to read it if you have plans for this utility or want to use FUSE or CUSE otherwise. That post also explains why I went directly to /dev/cuse rather than using libfuse.

Nevertheless, the project may very well be useful for development of USB projects, as a boilerplate or a getting-started utility. It also shows how to implement epoll-based asynchronous USB transfers, and how to implement a CUSE-based device file driver in userspace by talking the /dev/cuse protocol directly (i.e. without relying on libfuse). All this in a single-threaded program.

But what was the utility meant to do in the first place?

The underlying idea is simple: A single-threaded userspace program creates a plain character device for each BULK (or INTERRUPT) endpoint found on a selected USB device. Data is sent to an OUT endpoint by opening the respective device file and just writing data to it, with “cat” for example. The other way around, data is read from an IN endpoint by reading from another device file. This description is simplistic, but the approach may work quite well when working on a USB device project. Just be sure to read the details below on setting up usbpiper, which cover pretty much all the necessary gory details.

What usbpiper definitely isn’t: It’s NOT a user-space driver for XillyUSB (a generic FPGA IP core for SuperSpeed USB 3.0, based upon the FPGA’s Gigabit transceivers). XillyUSB requires a dedicated driver, which implements a specific protocol with the IP core.

Confusing usbpiper with XillyUSB’s driver is easy, because both share the idea of plain device files for I/O with a USB device. In fact, usbpiper started off as a user-space driver for XillyUSB, but never got to the point of covering XillyUSB’s protocol.

Another possible source of confusion is usbfs. Being a USB filesystem, it might seem to make usbpiper redundant. However, usbfs is the interface that libusb uses to allow a low-level driver for a USB device to be written in user space (usbpiper uses this interface, of course). It doesn’t provide simple access to the data.

It’s recommended to read the separate post on the /dev/cuse protocol before diving into this one.

What works and in what ways it’s unfinished

usbpiper is executed with no arguments. It takes control of the selected USB device’s interface (which interface is explained below) and creates a /dev/usbpiper_* device file for each bulk or interrupt endpoint that it finds. The file’s name reflects the endpoint’s number, direction and bulk vs. interrupt.
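As an illustration of such a naming scheme, here is a minimal C sketch that derives a name like usbpiper_bulk_in_01 from an endpoint descriptor’s bEndpointAddress. The bit layout (bit 7 for direction, bits 3..0 for the endpoint number) is from the USB spec. The function name ep_devname() and the exact format string, in particular the “interrupt” label, are assumptions that merely reproduce the examples above, not usbpiper’s actual code.

```c
#include <assert.h>
#include <stdio.h>

/* USB spec: bit 7 of bEndpointAddress is the direction (1 = IN),
 * bits 3..0 hold the endpoint number. */
#define EP_DIR_IN   0x80
#define EP_NUM_MASK 0x0f

enum ep_type { EP_BULK, EP_INTERRUPT };

/* Hypothetical helper: writes e.g. "usbpiper_bulk_in_01" into buf.
 * The "interrupt" label is an assumption; only bulk names appear
 * in the examples. Returns snprintf()'s result. */
int ep_devname(char *buf, size_t size, unsigned char ep_addr,
               enum ep_type type)
{
    const char *kind = (type == EP_BULK) ? "bulk" : "interrupt";
    const char *dir = (ep_addr & EP_DIR_IN) ? "in" : "out";

    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "usbpiper_%s_%s_%02u", kind, dir,
                    (unsigned) (ep_addr & EP_NUM_MASK));
}
```

With this scheme, endpoint address 0x81 maps to usbpiper_bulk_in_01 and 0x03 to usbpiper_bulk_out_03, matching the example file names.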

However, it has only been tested with bulk endpoints. Interrupt endpoints may work, but have not been tested, and isochronous endpoints are ignored. Also, usbpiper doesn’t free memory properly, in particular buffers and other memory-consuming resources related to libusb.

Several parameters that would normally be set through command-line arguments are hardcoded instead.

The verbosity level can be set by editing some defines in usbpiper.h. In particular, a lot of messages are suppressed by replacing

#define DEBUG(...) { fprintf(stderr, __VA_ARGS__); }

with

#define DEBUG(...)

In usbpiper.c, max_size defines the largest number of bytes that can be handled in a CUSE READ or WRITE request.

In usb.c, the following parameters are hardcoded:

  • FIFOSIZE: The effective number of bytes in the FIFO between the CUSE and USB units. The actual FIFO size for OUT endpoints is larger by max_size, for reasons explained in the “Basic data flow principle” section below.
  • vendorID and prodID define the device to be targeted. Note that the find_device() function in usb.c explicitly finds the device from the list of devices on the bus, so it can be altered to select the device based upon other criteria.
  • int_idx and alt_idx are the indexes of the Interface and Alternate Setting to select on the device. More on this issue below.
  • td_bufsize is the size of the buffer that goes with each transfer. It’s set to 64 kiB, which is probably overkill for most devices, but reasonable for proper bandwidth utilization with SuperSpeed devices. Also see below why it should be large when working with an arbitrary device.
  • numtd: The maximal number of outstanding transfers for each endpoint. A large number is good for high-bandwidth applications (with SuperSpeed) since it gives the hardware controller several transfers in a row before software intervention is required. Make it too big, and libusb_submit_transfer() may fail (the controller got more than it could accept).

Features that were meant to be added, but I never got there:

  • Dynamic size of the epoll event array (according to the number of held file descriptors). Currently it’s fixed at ARRAYSIZE in usbpiper.c.
  • Bidirectional device files. This makes no sense in this usage scenario, and bidirectional operation was never tested.
  • Non-blocking OPEN (currently not supported)
  • Support for USB hotplugging
  • Adaptation to XillyUSB’s protocol

USB Transfers and why you should care about them

There is a good reason why USB devices don’t get a pipe-like plain device file interface by default: Such an interface, as implemented by usbpiper, overlooks several details of how USB devices communicate.

The most important issue is that USB communication is divided into transfers, which are generally not treated as a continuous stream of data. The underlying model in the USB spec is that the host software initiates a transfer of a given number of bytes (in or out), the USB framework carries it out with the device, and then informs the software that the transfer has finished. The USB spec’s authors seem to have assumed that the USB bus would mainly be used through a function call saying something like “send this packet of data to the device”, or another function saying “receive X bytes from the device”, which returns a pointer to the data buffer.

The USB framework supports asynchronous transfers, of course, but that doesn’t change the notion that the host’s software explicitly requests each transfer with a given number of bytes. All communication is cut into packet-like chunks of data with clear boundaries. The device is allowed to deviate from the host’s transfer requests in only one way: On IN endpoints, it’s allowed to terminate a transfer with fewer bytes than the host expected, and this is not considered an error.

Generally speaking, however, any software that communicates with a device directly (i.e. a device driver) is expected to know when the device expects transfers and of what size. usbpiper ignores this completely. Therefore, it may very well not work properly with an arbitrary device. This is less of an issue if the device is developed alongside usbpiper.

Hence, there are three points to note:

  • usbpiper sets the byte count of OUT transfers according to the momentary FIFO fill, up to a certain limit (td_bufsize). If the device expects a certain number of bytes in each transfer (which is legitimate), or if the transfers are longer than it can take, things will break, of course. A device may also be sensitive to transfer boundaries, which usbpiper pays no attention to. If the device expects a fixed length for all transfers, this issue can be worked around by modifying try_queue_bulkout() so that it never sends a partially filled transfer, and by setting the desired length instead of td_bufsize.
  • usbpiper uses td_bufsize as the length of IN transfers, however the host doesn’t inform the device how long the transfer is expected to be. The device driver is supposed to know the maximal length of an IN transfer that the device will respond with, and prepare a sufficiently long buffer. Otherwise, a babble error results (libusb returns LIBUSB_ERROR_OVERFLOW). td_bufsize is set to 64 kiB, which USB devices are unlikely to exceed, but this isn’t guaranteed.
  • Another issue with IN endpoints is that the information on where the transfers’ boundaries are is lost: usbpiper just copies the data into a FIFO, which is read continuously on the other side. If the protocol of an IN endpoint relies on the driver knowing where a transfer started, usbpiper won’t be useful. This can be the case if the transfers are packets with a header, but without a data length field, which is a sensible design when the driver receives the transfers directly.
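The workaround suggested in the first point can be sketched as a small decision function. out_transfer_len() and its parameters are hypothetical, not taken from try_queue_bulkout(). fixed_len is assumed to be no larger than td_bufsize, since it replaces td_bufsize as the transfer length.

```c
#include <stddef.h>

/* Hypothetical sketch of choosing an OUT transfer's length.
 * fifo_fill:  bytes currently waiting in the endpoint's FIFO
 * td_bufsize: upper limit for a single transfer
 * fixed_len:  0 for the default behavior, otherwise the transfer
 *             length the device insists on (assumed <= td_bufsize)
 * Returns the number of bytes to submit, or 0 to queue nothing now. */
size_t out_transfer_len(size_t fifo_fill, size_t td_bufsize,
                        size_t fixed_len)
{
    if (fixed_len) {
        /* Workaround: never send a partially filled transfer. */
        return (fifo_fill >= fixed_len) ? fixed_len : 0;
    }

    /* Default: whatever is in the FIFO, capped at td_bufsize. */
    return (fifo_fill < td_bufsize) ? fifo_fill : td_bufsize;
}
```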

Interfaces and alternate settings

A USB device may present several interfaces, and each interface may have alternate settings. This isn’t a gory technical detail, but can be the difference between getting your device working with usbpiper or not, in particular if it’s not something you designed yourself.

Even though an address on the USB bus is assigned to the device as a whole, a USB driver claims control of an interface of that device, not of the device itself. In other words, it’s perfectly normal that several, possibly independent drivers control a single physical device. For example, a keyboard / mouse combo device, or a sound card with MIDI and joystick interfaces (not so common today). Or a scanner / printer which also acts as a card reader:

$ usb-devices
T:  Bus=01 Lev=03 Prnt=44 Port=03 Cnt=01 Dev#= 45 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=03f0 ProdID=7a11 Rev=01.00
S:  Manufacturer=HP
S:  Product=Photosmart B109a-m
S:  SerialNumber=MY5687428T02D2
C:  #Ifs= 4 Cfg#= 1 Atr=c0 MxPwr=2mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=cc Prot=00 Driver=(none)
I:  If#= 1 Alt= 0 #EPs= 2 Cls=07(print) Sub=01 Prot=02 Driver=usblp
I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
I:  If#= 3 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage

Note that the device effectively behaves like two independent devices: A scanner / printer and a USB disk.

It’s therefore important to set not just the Vendor / Product IDs correctly, but also the interface. usb-devices and lsusb -vv may help with making the correct selection.

Alternate settings are less common: They allow a single interface to have different usage modes. If present, the alternate setting must be set correctly as well.

Basic data flow principle

The purpose of the utility is to move data from a USB endpoint to a CUSE device file or vice versa. To accomplish this, there is a plain RAM FIFO allocated for each such data stream.

For an IN endpoint, the USB frontend queues asynchronous transfer requests using libusb. For each IN transfer that is finished, the data is copied into the relevant FIFO. On the FIFO’s other side, the read() calls on the device file (i.e. CUSE READ requests) are fulfilled, as necessary, by submitting data that is fetched from the FIFO. Overflow of the FIFO is prevented by queuing IN transfer requests only when there’s enough room in the FIFO to accept the data that all outstanding requests may carry, if they all return with a full buffer. Underflow is not an issue, but the read() call isn’t completed if there is no data to submit, in which case read() blocks.
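The overflow-prevention rule for IN endpoints boils down to a worst-case check. This is a sketch with hypothetical names (may_queue_in() and its arguments). It assumes that fifo_fill counts only data already copied into the FIFO, and it also applies the numtd limit on outstanding transfers mentioned earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: may another IN transfer be queued?
 * The worst case is that every outstanding transfer, plus the new
 * one, returns with a full td_bufsize buffer. */
bool may_queue_in(size_t fifo_size, size_t fifo_fill,
                  unsigned outstanding, unsigned numtd,
                  size_t td_bufsize)
{
    assert(fifo_fill <= fifo_size);

    if (outstanding >= numtd)
        return false; /* per-endpoint limit on outstanding transfers */

    return fifo_size - fifo_fill >=
           (size_t) (outstanding + 1) * td_bufsize;
}
```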

For an OUT endpoint, the handler of a write() call (i.e. a CUSE WRITE request) copies the data into the relevant FIFO. As a result of the FIFO containing data, the USB frontend may queue new OUT transfers with the data available. It may also not do so, in particular if the number of outstanding transfers is already at the maximum. The FIFO is protected from overflow by blocking the write() call until there is enough room in the FIFO. The exact condition relates to the fact that the length of the data buffer of each CUSE WRITE request is limited by a number (max_size in the code) that is set during CUSE initialization. A WRITE request is hence not completed (thus holding back the next one) until there is room for max_size additional bytes in the FIFO, after writing the current request’s data to the FIFO. This ensures that the usbpiper process always has somewhere to put the data, and doesn’t need to block, which it’s not allowed to do, being a single-threaded utility.
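The OUT-side invariant can be modeled as follows. struct out_fifo and both functions are hypothetical, the sketch only tracks byte counts, and it assumes the FIFO was allocated with FIFOSIZE + max_size bytes as described in the parameter list earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical byte-count model of an OUT endpoint's FIFO. */
struct out_fifo {
    size_t capacity; /* FIFOSIZE + max_size */
    size_t fill;     /* bytes currently held */
    size_t max_size; /* largest CUSE WRITE payload */
};

/* Copy in a WRITE request's payload. It always fits, because the
 * previous request was completed only with max_size bytes vacant. */
void out_fifo_accept(struct out_fifo *f, size_t len)
{
    assert(len <= f->max_size);
    assert(f->fill + len <= f->capacity);
    f->fill += len;
}

/* The WRITE request may be answered (letting the next one arrive)
 * only once max_size bytes are vacant again. */
bool write_may_complete(const struct out_fifo *f)
{
    return f->capacity - f->fill >= f->max_size;
}
```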

The requirement of always having max_size bytes vacant in the FIFO gets slightly trickier when a WRITE request is interrupted (i.e. an INTERRUPT request is received on its behalf). This forces usbpiper to complete the request immediately. In order to maintain the requirement on the FIFO, usbpiper may unwind the FIFO, throwing away data so that the FIFO’s fill is at least max_size bytes below full. This doesn’t break the data stream’s integrity or continuity, because the write() call returns with the number of bytes actually written (or -EINTR, if none). If the FIFO was unwound, the number of bytes that were discarded is subtracted from write()’s return value, giving the caller of write() the correct picture of how much data was consumed.
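The unwinding arithmetic can be sketched as follows, again with hypothetical names and byte counts only. The sketch assumes that the discarded bytes are still at the FIFO’s tail and haven’t been handed to a USB transfer yet. Because max_size bytes were vacant before the request’s data was copied in, the amount to discard never exceeds the request’s own length, which the assert() documents.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of unwinding after an interrupted WRITE.
 * fifo_size: total FIFO bytes (FIFOSIZE + max_size)
 * *fill:     FIFO fill, already including this request's len bytes
 * Returns what write() should report: bytes kept, or -EINTR. */
ssize_t unwind_interrupted_write(size_t fifo_size, size_t *fill,
                                 size_t len, size_t max_size)
{
    size_t vacant = fifo_size - *fill;
    size_t discard = 0;

    if (vacant < max_size)
        discard = max_size - vacant; /* restore max_size vacant bytes */

    /* Only this request's own data may be thrown away. */
    assert(discard <= len);

    *fill -= discard;
    if (discard == len)
        return -EINTR; /* nothing was written after all */
    return (ssize_t) (len - discard);
}
```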

Execution flow

Recall from above that usbpiper doesn’t rely on libfuse, but rather communicates with the CUSE framework directly through /dev/cuse.

As the utility’s single thread needs to divide its attention between the USB tasks and those related to CUSE, a single epoll file descriptor is allocated for all open /dev/cuse files as well as the file descriptors supplied by the libusb framework. An epoll_wait() event loop is implemented in usbpiper.c: Each entry in the epoll_event array contains a pointer to a small structure, which contains a function to call and a pointer to private data to pass to that function.
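Here is a minimal sketch of such a dispatch loop, using the real epoll API. struct fd_handler, register_fd(), run_once(), MAX_EVENTS and the example handler are hypothetical names for illustration. Only a pipe is used here, but the same registration would apply to /dev/cuse file descriptors and to those obtained from libusb.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor dispatch record, pointed to by
 * epoll_event.data.ptr. */
struct fd_handler {
    void (*fn)(void *priv);
    void *priv;
};

#define MAX_EVENTS 16 /* fixed size, like ARRAYSIZE in usbpiper.c */

int register_fd(int epfd, int fd, struct fd_handler *h)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = h };

    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One event loop iteration: wait, then call each ready handler.
 * Returns the number of handlers called, or -1 on error. */
int run_once(int epfd, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);

    for (int i = 0; i < n; i++) {
        struct fd_handler *h = events[i].data.ptr;
        h->fn(h->priv);
    }
    return n;
}

/* Example handler: drains a readable fd and counts invocations. */
struct counter_ctx { int fd; int calls; };

void count_and_drain(void *priv)
{
    struct counter_ctx *c = priv;
    char buf[64];

    if (read(c->fd, buf, sizeof(buf)) > 0)
        c->calls++;
}
```

In a real event loop, run_once() would simply be called in an endless loop with a timeout of -1.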

The communication protocol with /dev/cuse is discussed in another post. For the purpose of the current topic, the CUSE kernel framework creates a device file in /dev/ each time /dev/cuse is opened and a simple read-write handshake is completed. After this, for each operation on the related device file (e.g. open(), read(), write() etc.), a request packet is passed to the server (i.e. usbpiper in this case) through read() calls on the /dev/cuse file handle. The operation blocks until the server responds by writing a buffer to the same file handle, which contains a status header and possibly data. Responses are not necessarily written in the same order as the requests. A unique ID number in the said status header ensures the pairing between requests and their responses.
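For the reply direction, here is a sketch based on struct fuse_out_header from <linux/fuse.h>, the status header that CUSE shares with FUSE. cuse_reply() is a hypothetical helper. The assumption it encodes is that each reply goes out in a single write (here writev()), with the header’s unique field copied from the request’s struct fuse_in_header.

```c
#include <linux/fuse.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: answer a request read from /dev/cuse.
 * unique: copied from the request's fuse_in_header.unique
 * error:  0 on success, or a negative errno value
 * Returns writev()'s result. */
ssize_t cuse_reply(int fd, uint64_t unique, int32_t error,
                   const void *data, size_t len)
{
    struct fuse_out_header hdr = {
        .len = (uint32_t) (sizeof(struct fuse_out_header) + len),
        .error = error,
        .unique = unique,
    };
    struct iovec iov[2] = {
        { .iov_base = &hdr, .iov_len = sizeof(hdr) },
        { .iov_base = (void *) data, .iov_len = len },
    };

    /* Header and payload in one write, so the kernel sees one reply. */
    return writev(fd, iov, data ? 2 : 1);
}
```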

read() calls from /dev/cuse block when there’s nothing to do, and are therefore subject to epoll in usbpiper. write() calls never block.

However this is not enough: For example, an epoll entry may indicate a new WRITE request on a CUSE file descriptor, which fills one of the FIFOs with data. As a result, there might be a new opportunity to queue new USB transfers. There are many software design approaches for how to make one action trigger others — the one taken in usbpiper is the simplest and messiest: Letting the performer of the action call the functions that may benefit from the opportunity directly. In the given example, this means that process_write() calls try_queue_bulkout() directly. The latter calls try_complete_write() in turn.
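The chaining of try_*() calls can be illustrated with a toy model. The function names follow the text, but the bodies, struct ep and on_bulkout_done() are hypothetical simplifications that only track byte counts. In particular, handing data to a transfer is treated as freeing FIFO space immediately, which may not be how usbpiper accounts for it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical OUT endpoint state for the toy model. */
struct ep {
    size_t fill, capacity, max_size; /* OUT FIFO, in bytes */
    unsigned outstanding, numtd;     /* queued USB transfers */
    bool write_pending;              /* WRITE request awaiting reply */
    unsigned writes_completed;
};

static void try_complete_write(struct ep *e)
{
    if (e->write_pending && e->capacity - e->fill >= e->max_size) {
        e->write_pending = false; /* CUSE reply would be sent here */
        e->writes_completed++;
    }
}

static void try_queue_bulkout(struct ep *e, size_t td_bufsize)
{
    while (e->fill > 0 && e->outstanding < e->numtd) {
        size_t n = e->fill < td_bufsize ? e->fill : td_bufsize;
        e->fill -= n; /* data handed to a (simulated) transfer */
        e->outstanding++;
    }
    /* Draining the FIFO may let a blocked WRITE complete. */
    try_complete_write(e);
}

/* CUSE WRITE request carrying len bytes. */
void process_write(struct ep *e, size_t len, size_t td_bufsize)
{
    e->fill += len;
    e->write_pending = true;
    try_queue_bulkout(e, td_bufsize);
}

/* Completion of a (simulated) OUT transfer: another opportunity. */
void on_bulkout_done(struct ep *e, size_t td_bufsize)
{
    e->outstanding--;
    try_queue_bulkout(e, td_bufsize);
}
```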

The function nomenclature in this utility is consistent in that several functions have a try_*() prefix to mark that they are opportunity-oriented. It would have been equally functional, cleaner and more elegant (though less efficient) to call all try_*() functions on behalf of all endpoints and device files. Another alternative is to maintain a queue of try_*() function calls, but this wouldn’t eliminate the need to be aware of which actions may open which opportunities.

Delays and timeouts

There are a couple of situations where a timer is required. A timerfd is allocated for each device file, serving the following two scenarios:

  • Related to IN endpoints: When a READ request can’t be completed with the full number of bytes requested, usbpiper waits up to 10 ms for data from the IN endpoint to fill the relevant FIFO. After this timeout, try_complete_read() completes the request as soon as there is any data in the FIFO. The rationale is to avoid a flood of READ requests and responses if the data arrives frequently and in small chunks.
  • Related to OUT endpoints: When a RELEASE request arrives, and there is still data in the relevant FIFO, try_complete_release() waits up to 1000 ms for the FIFO to be drained through the OUT endpoint. After this timeout, try_complete_release() completes the request, hence closing the related device file (not /dev/cuse) after emptying the FIFO.

A single timer can be used for both tasks, because a RELEASE can’t occur before all outstanding requests have been completed on the related device file (Linux’ device file API ensures that). Besides, each device file can be related only to either an IN or OUT endpoint, so once again, the timer won’t be necessary for both uses at the same time.

A similar 10 ms timeout could have been implemented for OUT endpoints, i.e. generating an OUT transfer only if the FIFO contains enough data for a full transfer buffer (or when the timeout expires). This wouldn’t require another timer, for the first reason given above. However, this possibility was dropped in favor of another mechanism for preventing unnecessary I/O: try_queue_bulkout() submits a transfer with less than a full buffer only if there is no other outstanding transfer on the same endpoint. The reason for opting out of the 10 ms timer for this purpose has to do with the original purpose of usbpiper, as a driver for XillyUSB (which didn’t materialize).
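The rule chosen instead of a timer can be written as a small decision function. bulkout_submit_len() is a hypothetical name, and combining the rule with the numtd limit is an assumption about how the pieces fit together.

```c
#include <stddef.h>

/* Hypothetical sketch: a full td_bufsize transfer may be queued
 * whenever fewer than numtd transfers are outstanding, while a
 * partially filled one only when the endpoint is otherwise idle.
 * Returns the byte count to submit, or 0 to wait. */
size_t bulkout_submit_len(size_t fifo_fill, size_t td_bufsize,
                          unsigned outstanding, unsigned numtd)
{
    if (fifo_fill == 0 || outstanding >= numtd)
        return 0;
    if (fifo_fill >= td_bufsize)
        return td_bufsize;                   /* full buffer */
    return outstanding == 0 ? fifo_fill : 0; /* partial: only if idle */
}
```

The effect is that while one transfer is in flight, further data accumulates in the FIFO, so the next transfer tends to be fuller, without any timer involved.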
