Setting up a small Sphinx project for validating Linux kernel documentation RST markup


Since I maintain a module in the Linux kernel, I also need to maintain its documentation. Sometime in the past, the rst format was adopted for files under Documentation/ in the kernel tree, with Sphinx chosen as the tool for making formatted documents. Which is pretty nice, as rst human readable and can also be used to produce HTML, PDF and other file formats.

But it means that when I make changes in the doc, I also need to check that it compiles correctly, and produces the desired result with Sphinx.

The idea is to edit and compile the documentation file in a separate directory, and then copy back the updated file into the kernel tree. Partly because trying to build the docs inside the kernel requires installing a lot of stuff, and odds are that I’ll be required to upgrade the tools continuously with time (in fact, it complained about my existing version of Sphinx already).

The less favorable side is that this format was originally developed for documenting Python code, and Sphinx itself is written in Python. Which essentially means that Python’s law, sorry, Murphy’s law applies to getting thing done: If there’s the smallest reason for the toolchain to fail, it will. I don’t know if it’s the Python language itself or the culture around it, but somehow when Python is in the picture, I know I’m going to have a bad day.

I’m running Linux Mint 19.


# apt install python3-sphinx

A lot of packages are installed along with that, but fine.

Setting up

Copy the related .rst file into an empty directory, navigate to it, and go

$ sphinx-quickstart

This plain and interactive utility asks a lot of questions. The defaults are fine. It insists on giving the project a name, as well as stating the name of the author. Doesn’t matter too much.

It generates several files and directories, but for version control purposes, only these need to be saved: Makefile, and index.rst. The rest can be deleted, which will cause the tools to issue warnings on HTML builds. But nevertheless get the job done.

One warning can be silenced by adding a _static directory (the warning says it’s missing, so add it).

$ mkdir _static

This directory remains empty however. Why whine over a directory that isn’t used? Python culture.

Another thing Sphinx complains about is

WARNING: document isn't included in any toctree

That is fixed by adding a line in index.rst, with the name of the related .rst file, without the .rst suffix. Refer to index.rst files under the kernel’s Documentation subdirectory for examples.


Once the setup is done and over with create an HTML file from the .rst file with

$ make html

If the defaults were accepted during the setup, the HTML file can be found in _build/html. Plus a whole lot of other stuff.

And then there’s of course

$ make clean

which works as one would expect.

Running Xilinx Impact on Linux Mint 19


This is my short war story as I made Xilinx’ Impact, part of ISE 14.7, work on a Linux Mint 19 machine with a v4.15 Linux kernel. I should mention that I already use Vivado on the same machine, so the whole JTAG programming thing was already sorted out, including loading firmware into the USB JTAG adapters, whether it’s a platform cable or an on-board interface. All that was already history. It was Impact that refused to play ball.

In short, what needed to be done:

  • Make a symbolic link to activate libusb.
  • Make sure that the firmware files are installed, even if they’re not necessary to load the USB devices.
  • Make sure Vivado’s hardware manager isn’t running.
  • Don’t expect it to always work. It’s JTAG through USB, which is well-known for being cursed since ancient times. Sometimes “Initialize Chain” works right away, sometimes “Cable Auto Connect” is needed to warm it up, and sometimes just restart Impact, unplug and replug everything + recycle power on relevant card. Also apply spider leg powder as necessary with grounded elephant eyeball extract.

And now in painstaking detail.

The USB interface

The initial attempt to talk with the USB JTAG interface failed with a lot of dialog boxes saying something about windrvr6 and this:

PROGRESS_START - Starting Operation.
If you are using the Platform Cable USB, please refer to the USB Cable Installation Guide (UG344) to install the libusb package.
Connecting to cable (Usb Port - USB21).
Checking cable driver.
 Linux release = 4.15.0-20-generic.
WARNING:iMPACT -  Module windrvr6 is not loaded. Please reinstall the cable drivers. See Answer Record 22648.
Cable connection failed.

This is horribly misleading. windrvr6 is a Jungo driver which isn’t supported for anything by ancient kernels. Also, the said Answer Record seems to have been deleted.

Luckily, there’s a libusb interface as well, but it needs to be enabled. More precisely, Impact needs to find a file somewhere. Even more precisely, this is some strace output related to its attempts:

openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE//lib/lin64/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE/lib/lin64/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE/sysgen/lib/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/EDK/lib/lin64/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/common/lib/lin64/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[ ... ]
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/tls/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

It so happens that a libusb module is present among the files installed along with ISE (several times, actually), so it’s enough to just

$ cd /opt/xilinx/14.7/ISE_DS/ISE/lib/lin64/
$ ln -s

or alternatively, a symlink to /usr/lib/x86_64-linux-gnu/ worked equivalently well on my system.

But then

Trying to initialize the chain I got:

PROGRESS_START - Starting Operation.
Connecting to cable (Usb Port - USB21).
Checking cable driver.
File version of /opt/xilinx/14.7/ISE_DS/ISE/bin/lin64/xusbdfwu.hex = 1030.
 Using libusb.
Please run `source ./setup_pcusb` from the /opt/xilinx/14.7/ISE_DS/ISE//bin/lin64 directory with root privilege to update the firmware. Disconnect and then reconnect the cable from the USB port to complete the driver update.
Cable connection failed.

So yey, it was not going for libusb. But then it refused to go on.

Frankly speaking, I’m not so much into running any script with root privileges, knowing it can mess up things with the working Vivado installation. On my system, there was actually no need, because I had already installed and then removed the cable drivers (as required by ISE).

What happened here was that Impact looked for firmware files somewhere in /etc/hotplug/usb/, assuming that if they didn’t exist, then the USB device must not be loaded with firmware. But it was in my case. And yet, Impact refused on the grounds that the files couldn’t be found.

So I put those files back in place, and Impact was happy again. If you don’t have these files, an ISE Lab Tools installation should do the trick. Note that it also installs udev rules, which is what I wanted to avoid. And also that the installation will fail, because it includes compiling the Jungo driver against the kernel, and there’s some issue with that. But as far as I recall, the kernel thing is attempted last, so the firmware files will be in place. I think.

Or installing them on behalf of Vivado is also fine? Note sure.

Identify failed

Attempting to Cable Auto Connect, I got Identify Failed and a whole range of weird errors. Since I ran Impact from a console, I got stuff like this on the terminal:

ERROR set configuration. strerr=Device or resource busy.
ERROR claiming interface.
ERROR setting interface.
ERROR claiming interface in bulk transfer.
bulk tranfer failed, endpoint=02.
ERROR releasing interface in bulk transfer.
ERROR set configuration. strerr=Device or resource busy.
ERROR claiming interface.
ERROR setting interface.
control tranfer failed.
control tranfer failed.

This time it was a stupid mistake: Vivado’s hardware manager ran at the same time, so the two competed. Device or resource busy or not?

So I just turned off Vivado. And voila. All ran just nicely.

Bonus: Firmware loading confusion

I mentioned that I already had the firmware loading properly set up. So it looked like this in the logs:

Feb 13 11:58:18 kernel: usb 1-5.1.1: new high-speed USB device number 78 using xhci_hcd
Feb 13 11:58:18 kernel: usb 1-5.1.1: New USB device found, idVendor=03fd, idProduct=000d
Feb 13 11:58:18 kernel: usb 1-5.1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Feb 13 11:58:18 systemd-udevd[59619]: Process '/alt-root/sbin/fxload -t fx2 -I /alt-root/etc/hotplug/usb/xusbdfwu.fw/xusb_emb.hex -D ' failed with exit code 255.

immediately followed by:

Feb 13 11:58:25 kernel: usb 1-5.1.1: new high-speed USB device number 80 using xhci_hcd
Feb 13 11:58:25 kernel: usb 1-5.1.1: New USB device found, idVendor=03fd, idProduct=0008
Feb 13 11:58:25 kernel: usb 1-5.1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 13 11:58:25 kernel: usb 1-5.1.1: Product: XILINX
Feb 13 11:58:25 kernel: usb 1-5.1.1: Manufacturer: XILINX

This log contains contradicting messages. On one hand, the device is clearly re-enumerated with a new product ID, indicating that the firmware load went fine. On the other hand, there’s an error message saying fxload failed.

I messed around quite a bit with udev because of this. The problem is that the argument to the -D flag should be the path to the device files of the USB device, and there’s nothing there. In the related udev rule, it says $devnode, which should substitute to exactly that. Why doesn’t it work?

The answer is that it actually does work. For some unclear reason, the relevant udev rule is called a second time, and on that second time $devnode is substituted with nothing. Which is harmless because it fails royally with no device file to poke. Except for that confusing error message.

systemd: Shut down computer at a certain uptime


Paid-per-time cloud services. I don’t want to forget one of those running, just to get a fat bill at the end of the month. And if the intended use is short sessions anyhow, make sure that the machine shuts down by itself after a given amount of time. Just make sure that a shutdown by the machine itself accounts for cutting the costs. And sane cloud provider does that except for, possibly, costs for storing the VM’s disk image.

So this is the cloud computing parallel to “did I lock the door?”.

The examples here are based upon systemd 241 on Debian GNU/Linux 10.

The main service

There is more than one way to do this. I went for two services: One that calls /sbin/shutdown with a five minute delay (so I get a chance to cancel it) and then second is a timer for the uptime limit.

So the main service is this file as /etc/systemd/system/uptime-limiter.service:

Description=Limit uptime service

ExecStart=/sbin/shutdown -h +5 "System it taken down by uptime-limit.service"


The naïve approach is to just enable the service and expect it to work. Well, it does work when started manually, but when this service starts as part of the system bringup, the shutdown request is registered but later ignored. Most likely because systemd somehow cancels pending shutdown requests when it reaches the ultimate target.

I should mention that adding in the unit file didn’t help. Maybe some other target. Don’t know.

The timer service

So the way to ensure that the shutdown command is respected is to trigger it off with a timer service.

The timer service as /etc/systemd/system/uptime-limiter.timer, in this case allows for 6 hours of uptime (plus the extra 5 minutes given by the main service):

Description=Timer for Limit uptime service



and enable it:

# systemctl enable uptime-limiter.timer
Created symlink /etc/systemd/system/ → /etc/systemd/system/uptime-limiter.timer.

Note two things here: That I enabled the timer, not the service itself, by adding the .timer suffix. And I didn’t start it. For that, there’s the –now flag.

So there are two steps: When the timer fires off, the call to /sbin/shutdown takes place, and that causes nagging wall messages to start once a minute, and eventually a shutdown. Mission complete.

What timers are pending

Ah, that’s surprisingly easy:

# systemctl list-timers
NEXT                         LEFT          LAST                         PASSED       UNIT                         ACTIVATES
Sun 2021-01-31 17:38:28 UTC  14min left    n/a                          n/a          systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Sun 2021-01-31 20:50:22 UTC  3h 26min left Sun 2021-01-31 12:36:41 UTC  4h 47min ago apt-daily.timer              apt-daily.service
Sun 2021-01-31 23:23:28 UTC  5h 59min left n/a                          n/a          uptime-limiter.timer         uptime-limiter.service
Sun 2021-01-31 23:23:34 UTC  5h 59min left Sun 2021-01-31 17:23:34 UTC  44s ago      google-oslogin-cache.timer   google-oslogin-cache.service
Mon 2021-02-01 00:00:00 UTC  6h left       Sun 2021-01-31 12:36:41 UTC  4h 47min ago logrotate.timer              logrotate.service
Mon 2021-02-01 00:00:00 UTC  6h left       Sun 2021-01-31 12:36:41 UTC  4h 47min ago man-db.timer                 man-db.service
Mon 2021-02-01 06:49:19 UTC  13h left      Sun 2021-01-31 12:36:41 UTC  4h 47min ago apt-daily-upgrade.timer      apt-daily-upgrade.service

Clean and simple. And this is probably why this method is better than a long delay on shutdown, which is less clear about what it’s about to do, as shown next.

Note that a timer service can be stopped, which is parallel to canceling a shutdown. Restarting it to push the time limit further won’t work in this case, because the service is written related to OnBootSec.

Is there a shutdown pending?

To check if a shutdown is about to happen:

$ cat /run/systemd/shutdown/scheduled
WALL_MESSAGE=System it taken down by uptime-limit.service

There are different reports on what happens when the shutdown is canceled. On my system, the file was deleted in response to “shutdown -c”, but not when the shutdown was canceled because the system had just booted up. There’s other suggested ways too, but in the end, it appears like there’s no definite way to tell if a system has a shutdown scheduled or not. At least not as of systemd 241.

That USEC line is the epoch time for when shutdown will take place. A Perl guy like me goes

$ perl -e 'print scalar gmtime(1612103418427661/1e6)'

but that’s me.

What didn’t work

So this shows what doesn’t work: Enable the main service (as well as start it right away with the –now flag):

# systemctl enable --now uptime-limiter
Created symlink /etc/systemd/system/ → /etc/systemd/system/uptime-limiter.service.

Broadcast message from root@instance-1 (Sun 2021-01-31 14:15:19 UTC):

System it taken down by uptime-limit.service
The system is going down for poweroff at Sun 2021-01-31 14:25:19 UTC!

So the broadcast message is out there right away. But this is misleading: It won’t work at all when the service is started automatically during system boot.

Some notes on how an air conditioner works

I’m not an AC guy. Not at all.

But I do own an Tadiran Wave INV 40/3 installed 2016, and then I had some issues with it. In hindsight, there was some real kind of problem a few years ago, and the rest of the problems were because the AC guy that came to fix the original problem replaced a 100k thermistor with a 5ok one. So the microcontroller detected something was wrong every now and then, and stopped the show (preferably on a hot summer day). He used to come, change the thermistor again, and the machine worked nicely until the next time. Why this helped on the first occasion and on those that followed is beyond me. Eventually I acquired and replaced the thermistor myself, with rated value given in the installation manual, and I hope that’s the end of it.

By the way, a 50k thermistor instead of a 100k thermistor makes the controller think the temperature is 16°C higher than it actually is. So it’s not wrong enough to make prevent the machine from kicking off, but bad enough to dislike it enough to stop after a few months. Or a year.

I have no training in the field of air conditioning or anything of that field, and my memories from the Thermodynamics course I took (and somehow passed with a fair grade) is that the world is turning into a mess. Being a human, I’m always at the risk of writing rubbish, but on this post I feel pretty free to do so. If I got something wrong, kindly correct me below. If you blew your compressor because anything I wrote below, blame yourself for taking advice from someone who has no clue.

But the story with the thermistor is yet another piece of evidence that if something is important, better be involved in the gory details. Not necessarily those written below.

The jots

  • The name of the game is the refrigerant’s boiling temperature, which changes with the pressure. The compressor pushes coolant towards the condenser’s coil (“radiator”), which travels downwards and cools down there, turning into liquid somewhere in the middle of it. As the refrigerant cools down all the way through the coil, it reaches the end of the coil at a temperature that is lower than the boiling point (but the pressure remains more or less uniform). The difference between the measured temperature at the boiling point calculated from the pressure is called subcooling, and gives an indication of the level of the refrigerant inside the coil. That is, where in the coil it turned from gas to fluid.
  • By the same coin, inside the evaporator, the fluid refrigerant (actually, mixed with some gas due to spark evaporation at the metering device) travels upwards and heats up by the inside air it cools. Somewhere in the middle the fluid reaches its boiling point, and continues as gas. Once again, the difference between the measured temperature at the end of the coil and the boiling point indicated by the pressure is called superheat, and indicates where in the coil this took place. This too gives an indication of the refrigerant fill.
  • The measurement of temperature and pressure is typically done at the outdoor unit’s connection to the pipes. There might be differences in pressure and temperature from those actually relevant at the evaporator coil, but this is good enough for figuring out the overall situation.
  • The cooling gas is R410A. According to the installer’s guide for my AC (page 34), the equilibrium temperature is 4°C with pressure 116 PSIG on the evaporator, and 50°C with 429 PSIG at the condenser (AC unit working as cooler). These figures are possible to derive, because there are typical temperatures of the refrigerant at the end of both coils: The environment’s temperature at the end of condenser’s coil, and some 15°C at the evaporator’s (?). This can however be deadly wrong if either coil (or both) don’t function properly, most likely due to poor ventilation (dirty filters, obstructions, poor ventilation path indoors etc.).
  • The transition from high pressure to low occurs at the metering device, which is basically a narrow pipe. TXV / TEV valves are commonly in use. In a typical in/outdoor unit setting, the metering device is indoors, next to the coil. This means that the refrigerant flows in the pipe towards the indoor unit as a compressed fluid — this is referred to as the “liquid line” and is therefore the narrower and hotter pipe. Narrower, because the flow is slower and the refrigerant is expensive. In the other direction, the refrigerant flows as a gas, in the wider, cooler pipe. It’s a bit confusing that the hotter pipe is the one containing liquid, but it’s all a matter of pressures.
  • Note that in both coils, the higher part is hotter: In the condenser because hot refrigerant is pushed from above, and it cools down as it travels downwards. In the evaporator, the fluid arrives from below, and travels upwards as it heats up. Consequently, in both coils, a physically higher transition point between fluid and gas inside the coil means a higher pressure — because the temperature there is higher. This is how the fluid levels even out between the condenser and evaporator: In steady state, the compressor generates a constant pressure difference and flow on the gas side, which works against the metering valve and differences in altitude. By conservation of mass, there must be a certain amount of fluid in the system to accommodate the existing amount of refrigerant (to complete the amount that is in gas form). If the fluid level rises on one coil, it has to drop on the other. But since a rising fluid level also means a rise in pressure, it causes more pressure against the metering device as well as the compressor. So a rise in the fluid level makes the coil push harder fluid downwards, and gas upwards making refrigerant to travel more towards the other coil, and the fluid level to drop on the current one.
  • Another way to look at it: The pressure difference imposed by the compressor dictates a difference between the temperatures of the liquid-to-fluid transition points of both coils. Well more or less, because the pressure to boiling point relation isn’t linear. So given the amount of refrigerant in the system, the fluid is distributed in the segment from the condenser’s coils bottom, through the liquid line and the metering device to the evaporator coil’s bottom. The levels of the fluids stabilize so as to satisfy the temperature difference imposed by the compressor.
  • However if there is too little refrigerant (a leak…), then this temperature difference relation can’t be satisfied. The result, say people out there, is bubble noise in the pipes and accumulation of ice at the outdoor unit’s wider pipe.
  • The metering valve is designed to create a pressure gap with fluids, so it the pressure difference is smaller when gas runs through it. The compressor drops the pressure in the (wider) gas pipe, causing the temperature to drop well below zero, and ice accumulates.
  • The viscosity of R410A is more than ten times larger as a fluid than as a gas at 10°C, so if it reaches the metering device as a gas, it flows through much easier. Hence if there isn’t enough refrigerant in the system — that is, the segment from the condeser coil’s bottom to the metering device isn’t filled with continuous liquid — gas will travel quickly through the metering device, causing a pressure drop in the liquid line, which even more gas to evaporate in the liquid line (creating bubbles that are often audible).
  • Note however that accumulation of ice could be just the result of poor ventilation of the indoor unit, for example due to dust in the filters: If the gas isn’t warmed up enough in the indoor unit, its overall temperature cycle drops, and it reaches the outdoor unit very cool.
  • How do you tell? Well, air condition guys ask about those filters every time for a reason. In addition, inverter AC systems have temperature sensors before, in the middle of and after the coil in both indoor and outdoor units. This way, the microcontroller can tell what the fluid level is on both sides. If it’s too low, odds are that it will correctly detect that situation and issue an error code on the user panel, and more accurately with a blinking LED on the controller PCB, either the one indoor or outdoor. It’s a good idea to download the installer’s manual for the specific AC model, if you’re brave enough to open your AC on a hot summer’s day.
  • “Heat pump” is just the same of an AC that can also do heating.
  • In English, HVAC is a good searching keyword for air conditioning.


Apache 2.4: RewriteRule with [NE] causing 500 Internal Server Error

This is the weirdest thing. With an Apache 2.4.10 on Linux Debian 8 (yes, old), and a relatively simple mod_rewrite rule in .htaccess going

RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*)$1 [R=301,L,NE]

This is really nothing special. Just pushing users to the www host name, if they were lazy typing it.

This works almost perfectly, but then I have this thing about trying odd stuff. So I get an internal server error when I request the page, but not and not In other words, I get a server error when there’s a legal percent-escape of a character in the %01-%1f range. Not with %00, and not with %20 and up, and not when there’s just one digit. It’s really a matter of a proper percent escape.

This %-escape doesn’t have to be in the beginning of the string. is equally offensive. But it appears like a “?” neutralizes the problem — there is no server error if a “?” appeared before the offending %-escape. Recall that characters after a “?” are normally not escaped.

And of course, there is no problem accessing (getting a 404, but fine), as the rule doesn’t come to effect.

The internal server error leaves no traces in neither the error log nor the access log. So there’s no hint there. Neither in the journalctl log. No process dies either.

The problem is clearly the NE flag (marked red above), telling mod_rewrite not to escape the the string it rewrites. Why this is an issue is beyond me. I could try asking in forums and such, but I have hunch on the chances getting an answer.

Exactly the same behavior is observed when this rule is in an Apache configuration file (in <Virtualhost> context).

So I just dropped the NE flag. Actually, this is a note to self explaining why I did it.

One could argue that it’s pretty harmless to get an error on really weird URL requests, however often a serious vulnerability begins with something that just went a little wrong. So I prefer to stay away.

If anyone has insights on this matter, please comment below. Maybe it has been silently fixed in later versions of Apache?

Systemd services as cronjobs: No process runs away

But why?

Cronjobs typically consists of a single utility which we’re pretty confident about. Even if it takes quite some time to complete (updatedb, for example), there’s always a simple story, a single task to complete with a known beginning and end.

If the task involves a shell script that calls a few utilities, that feeling of control fades. It’s therefore reassuring to know that everything can be cleaned up neatly by simple stopping a service. Systemd is good at that, since all processes that are involved in the service are kept in a separate cgroup. So when the service is stopped, all processes that were possibly generated eventually get a SIGKILL, typically 90 seconds after the request to stop the service, unless they terminated voluntarily in response to the initial SIGTERM.

Advantage number two is that the systemd allows for a series of capabilities to limit what the cronjob is capable of doing, thanks to the cgroup arrangement. This doesn’t fall very short from the possibilities of container virtualization, with pretty simple assignments in the unit file. This includes making certain directories inaccessible or accessible for read-only, setting up temporary directories, disallow external network connection, limit the set of allowed syscalls, and of course limit the amount of resources that are consumed by the service. They’re called Control Groups for a reason.

There’s also the RuntimeMaxSec parameter in the service unit file, which is the maximal wall clock time the service is allowed to run. The service is terminated and put in failure state if this time is exceeded. This is however supported from systemd version 229 and later, so check with “systemctl –version”.

My original idea was to use systemd timers to kick off the job, and let RuntimeMaxSec make sure it would get cleaned up if it ran too long (i.e. got stuck somehow). But because the server in question ran a rather old version of systemd, I went for a cron entry for starting the service and another one for stopping it, with a certain time difference between them. In hindsight, cron turned to be neater for kicking off the jobs, because I had multiple variants of them in different times. So one single file enclosed all.

The main practical difference is that if a service reaches RuntimeMaxSec, it’s terminated with a failed status. The cron solution stops the service without this. I guess there’s a systemctl way to achieve the failed status, if that’s really important.

As a side note, I have a separate post on Firejail, which is yet another possibility to use cgroups for controlling what processes do.

Timer basics

The idea is simple: A service can be started as a result of a timer event. That’s all that timer units do.

Timer units are configured like any systemd units (man systemd.unit) but have a .timer suffix and a dedicated [Timer] section. By convention, the timer unit named foo.timer activates the service foo.service, unless specified differently with the Unit= assignment (useful for generating confusion).

Units that are already running when the timer event occurs are not restarted, but are left to keep running. Exactly like systemctl start would do.

For an cronjob-style timer, use OnCalendar= to specify the times. See man systemd.time for the format. Note that AccuracySec= should be set too to control how much systemd can play with the exact time of execution, or systemd’s behavior might be confusing.

To see all active timers, go

$ systemctl list-timers

The unit file

As usual, the unit file (e.g. /etc/systemd/system/cronjob-test@.service) is short and concise:

Description=Cronjob test service

ExecStart=/home/eli/shellout/utils/ "%I"

This is a simple service, meaning that systemd expects the process launched by ExecStart to run in the foreground.

Note however that the service unit’s file name has a “@” character and that %I is used to choose what to run, based upon the unescaped instance name (see main systemd.unit). This turns the unit file into a template, and allows choosing an arbitrary command (the script is explained below) with something like (really, this works)

# systemctl start cronjob-test@'echo "Hello, world"'

This might seems dangerous, but recall that root privileges are required to start the service, and you get a plain-user process (possibly with no ability to escalate privileges) in return. Not the big jackpot.

For stopping the service, exactly the same service specifier string is required. But it’s also possible to stop all instances of a service with

# systemctl stop 'cronjob-test@*'

How neat is that?

A few comments on this:

  • The service should not be systemd-wise enabled (i.e. no “systemctl enable”) — that’s what you do to get it started on boot or following some kind of event. This is not the case, as the whole point is to start the service directly by a timer or crond.
  • Accordingly, the service unit file does not have an [Install] section.
  • A side effect of this is that the service may not appear in the list made by “systemctl” (without any arguments) unless it has processes running on its behalf currently running (or possibly if it’s in the failed state). Simple logic: It’s not loaded unless it has a cgroup allocated, and the cgroup is removed along with the last process. But it may appear anyhow under some conditions.
  • ExecStart must have a full path (i.e. not relative) even if the WorkingDirectory is set. In particular, it can’t be ./something.
  • A “systemctl start” on a service that is marked as failed will be started anyhow (i.e. the fact that it’s marked failed doesn’t prevent that). Quite obvious, but I tested it to be sure.
  • Also, a “systemctl start” causes the execution of ExecStart if and only if there’s no cgroup for it, which is equivalent to not having a process running on its behalf
  • KillMode is set to “mixed” which sends a SIGTERM only to the process that is launched directly when the service is stopped. The SIGKILL 90 seconds later, if any, is sent to all processes however. The default is to give all processes in the cgroup the SIGTERM when stopping.
  • NoNewPrivileges is a little paranoid thing: When no process has any reason to change its privileges or user IDs, block this possibility. This mitigates damage, should the job be successfully attacked in some way. But I ended up not using it, as running sendmail fails (it has some setuid thing to allow access to the mail spooler).


There is no log entry for a service of simple type that terminates with a success status. Even though it’s stopped in the sense that it has no allocated cgroup and “systemctl start” behaves as if it was stopped, a successful termination is silent. Not sure if I like this, but that’s the way it is.

When the process doesn’t respond to SIGTERM:

Jan 16 19:13:03 systemd[1]: Stopping Cronjob test service...
Jan 16 19:14:33 systemd[1]: cronjob-test.service stop-sigterm timed out. Killing.
Jan 16 19:14:33 systemd[1]: cronjob-test.service: main process exited, code=killed, status=9/KILL
Jan 16 19:14:33 systemd[1]: Stopped Cronjob test service.
Jan 16 19:14:33 systemd[1]: Unit cronjob-test.service entered failed state.

So there’s always “Stopping” first and then “Stopped”. And if there are processes in the control group 90 seconds after “Stopping”, SIGKILL is sent, and the service gets a “failed” status. Not being able to quit properly is a failure.

A “systemctl stop” on a service that is already stopped is legit: The systemctl utility returns silently with a success status, and a “Stopped” message appears in the log without anything actually taking place. Neither does the service’s status change, so if it was considered failed before, so it remains. And if the target to stop was a group if instances (e.g. systemctl stop ‘cronjob-test@*’) and there were no instances to stop, there’s even not a log message on that.

Same logic with “Starting” and “Started”: A superfluous “systemctl start” does nothing except for a “Started” log message, and the utility is silent, returning success.

Capturing the output

By default, the output (stdout and stderr) of the processes is logged in the journal. This is usually pretty convenient, however I wanted the good old cronjob behavior: An email is sent unless the job is completely silent and exits with a success status (actually, crond doesn’t care, but I wanted this too).

This concept doesn’t fit systemd’s spirit: You don’t start sending mails each time a service has something to say. One could use OnFailure for activating another service that calls home when the service gets into a failure status (which includes a non-success termination of the main process), but that mail won’t tell me the output. To achieve this, I wrote a Perl script. So there’s one extra process, but who cares, systemd kills’em all in the end anyhow.

Here it comes (I called it


use strict;
use warnings;

# Parameters for sending mail to report errors
my $sender = 'eli';
my $recipient = 'eli';
my $sendmail = "/usr/sbin/sendmail -i -f$sender";

my $cmd = shift;
my $start = time();

my $output = '';

my $catcher = sub { finish("Received signal."); };

$SIG{HUP} = $catcher;
$SIG{TERM} = $catcher;
$SIG{INT} = $catcher;
$SIG{QUIT} = $catcher;

my $pid = open (my $fh, '-|');

finish("Failed to fork: $!")
  unless (defined $pid);

if (!$pid) { # Child process
  # Redirect stderr to stdout for child processes as well
  open (STDERR, ">&STDOUT");

  exec($cmd) or die("Failed to exec $cmd: $!\n");

# Parent
while (defined (my $l = <$fh>)) {
  $output .= $l;

close $fh
 or finish("Error: $! $?");

finish("Execution successful, but output was generated.")
 if (length $output);

exit 0; # Happy end
sub finish {
  my ($msg) = @_;

  my $elapsed = time() - $start;

  $msg .= "\n\nOutput generated:\n\n$output\n"
    if (length $output);

  open (my $fh, '|-', "$sendmail $recipient") or
    finish("Failed to run sendmail: $!");

  print $fh <<"END";
From: Shellout script <$sender>
Subject: systemd cron job issue
To: $recipient

The script with command \"$cmd\" ran $elapsed seconds.


  close $fh
    or die("Failed to send email: $! $?\n");

  $SIG{TERM} = sub { }; # Not sure this matters
  kill -15, $$; # Kill entire process group


First, let’s pay attention to

open (STDERR, ">&STDOUT");

which makes sure standard error is redirected to standard output. This is inherited by child processes, which is exactly the point.

The script catches the signals (SIGTERM in particular, which is systemd’s first hint that it’s time to pack and leave) and sends a SIGTERM to all other processes in turn. This is combined with KillMode being set to “mixed” in the service unit file, so that only gets the signal, and not the other processes.

The rationale is that if all processes get the signal at once, it may (theoretically?) turn out that the child process terminates before the script reacted to the signal it got itself, so it will fail to report that the reason for the termination was a signal, as opposed to the termination of the child. This could miss a situation where the child process got stuck and said nothing when being killed.

Note that the script kills all processes in the process group just before quitting due to a signal it got, or when the invoked process terminates and there was output. Before doing so, it sets the signal handler to a NOP, to avoid an endless loop, since the script’s process will get it as well (?). This NOP thing appears to be unnecessary, but better safe than sorry.

Also note that the while loop quits when there’s nothing more in <$fh>. This means that if the child process forks and then terminates, the while loop will continue, because unless the forked process closed its output file handles, it will keep the reference count of the script’s stdin above zero. The first child process will remain as a zombie until the forked process is done. Only then will it be reaped by virtue of the close $fh. This machinery is not intended for fork() sorcery.

I took a different approach in another post of mine, where the idea was to fork explicitly and modify the child’s attributes. Another post discusses timing out a child process in general.


Yes, cronjobs are much simpler. But in the long run, it’s a good idea to acquire the ability to run cronjobs as services for the sake of keeping the system clean from runaway processes.

Perl + DBI: Measuring the time of database queries

It’s often desired to know how much wall clock time the SQL queries take. As with Perl, there’s more than one way to do it. This is a simple way, which involves overriding DBI’s execute() method, so it measures the time and issues a warn() with basic caller info and the time in milliseconds.

The same thing can be done with any other method, of course. Go “man DBI” and look for “Subclassing the DBI” for the full story.

So the first thing is to define the RootClass property on the DBI object generation, so that MySubDBI is the root class. Something like

$dbh = DBI->connect( "DBI:mysql::localhost", "thedatabase", "thepassword",
		     { RaiseError => 1, AutoCommit => 1, PrintError => 0,
		       RootClass => 'MySubDBI',
		       Taint => 1});

and then, somewhere, the class needs to be defined. This can be in a separate module .pm file, but also at the end of the same file as the code for the DBI->connect:

package MySubDBI;
use strict;

use DBI;
use vars qw(@ISA);
@ISA = qw(DBI);

package MySubDBI::db;
use vars qw(@ISA);
@ISA = qw(DBI::db);

# Empty, yet necessary.

package MySubDBI::st;
use Time::HiRes qw(gettimeofday tv_interval);
use vars qw(@ISA);
@ISA = qw(DBI::st);

sub execute {
  my ($sth, @args) = @_;
  my $tic = [gettimeofday];
  my $res = $sth->SUPER::execute(@args);
  my $exectime = int(tv_interval($tic)*1000);

  my ($package0, $file0, $line0, $subroutine0) = caller(0);
  my ($package1, $file1, $line1, $subroutine1) = caller(1);

  warn("execute() call from $subroutine1 (line $line0) took $exectime ms\n");
  return $res;


The code can be smarter, of course. For example, issue a warning only if the query time exceeds a certain limit.

MySQL: LIKE vs. REGEXP() performance testing


At some point I needed to choose between using LIKE or REGEXP() for a not so simple string match. Without going into the details, the matching string contains a lot of wildcard segments, and while both would have done the job, I thought maybe REGEXP() would benefit from some extra information about the wildcard parts. It’s not that I cared that ‘%’ would match characters it shouldn’t, but I wanted so save some backtracking by telling the matching engine not to match just anything. Save some CPU, I thought. Spoiler: It was a nice thought, but no.

So I ran a few performance tests on a sample table:

CREATE TABLE product_info (
	cpe_name BLOB NOT NULL,
	PRIMARY KEY (cpe_name(511))
) engine = MyISAM;

Note that cpe_name is the primary key, and is hence a unique index.

I should also mention that the match patterns I use are for testing are practically useless for CPE name matching because they don’t handle escaped “:” characters properly. Just in case you know what a CPE is and you’re here for that. In short, these are just performance tests.

I did this on MySQL server version: 5.1.47, Source distribution. There are newer versions around, I know. Maybe they do better.

The art of silly queries

So there’s about more than half a million entries:

mysql> SELECT COUNT(*) FROM product_info;
| COUNT(*) |
|   588094 |
1 row in set (0.00 sec)

Let’s ask it another way:

mysql> SELECT COUNT(*) FROM product_info WHERE cpe_name LIKE '%';
| COUNT(*) |
|   588094 |
1 row in set (0.08 sec)

Say what? LIKE ‘%’ is always true for a non-NULL BLOB. MySQL didn’t optimize this simple thing, and actually checked every single entry?

mysql> EXPLAIN SELECT COUNT(*) FROM product_info;
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM product_info WHERE cpe_name LIKE '%';
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
|  1 | SIMPLE      | product_info | ALL  | NULL          | NULL | NULL    | NULL | 588094 | Using where |
1 row in set (0.00 sec)

Apparently it did. Other silly stuff:

mysql> SELECT COUNT(*) FROM product_info WHERE NOT cpe_name IS NULL;
| COUNT(*) |
|   588094 |
1 row in set (0.08 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM product_info WHERE NOT cpe_name IS NULL;
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
|  1 | SIMPLE      | product_info | ALL  | PRIMARY       | NULL | NULL    | NULL | 588094 | Using where |
1 row in set (0.00 sec)

Silly or what? The column is defined as “NOT NULL”. What is there to check? So maybe the idea is that if I make stupid queries, MySQL responds with stupid behavior. Well, not really:

mysql> SELECT COUNT(*) FROM product_info WHERE 1=1;
| COUNT(*) |
|   588094 |
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM product_info WHERE 1=1;
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
1 row in set (0.00 sec)

It’s more like MySQL makes optimizations only if they’re really obvious. MySQL != gcc.


So it turns out that LIKE is really fast, and it seems to take advantage of the index:

mysql> SELECT cpe_name FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:hummingbird:cyberdocs:-%';
| cpe_name                                        |
| cpe:2.3:a:hummingbird:cyberdocs:-:*:*:*:*:*:*:* |
1 row in set (0.00 sec)

The same query, only with a regex:

mysql> SELECT cpe_name FROM product_info WHERE cpe_name REGEXP '^cpe:2.3:a:hummingbird:cyberdocs:-.*';
| cpe_name                                        |
| cpe:2.3:a:hummingbird:cyberdocs:-:*:*:*:*:*:*:* |
1 row in set (0.21 sec)

Recall that indexing means that the rows are sorted. Because the initial part of the string is fixed, it’s possible to narrow down the number of rows to match with an index lookup, and then work on those that match that initial part. This was taken advantage of with the LIKE match, but apparently not with REGEXP():

mysql> EXPLAIN SELECT cpe_name FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:hummingbird:cyberdocs:-%';
| id | select_type | table        | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
|  1 | SIMPLE      | product_info | range | PRIMARY       | PRIMARY | 513     | NULL |    1 | Using where |
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT cpe_name FROM product_info WHERE cpe_name REGEXP '^cpe:2.3:a:hummingbird:cyberdocs:-.*';
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
|  1 | SIMPLE      | product_info | ALL  | NULL          | NULL | NULL    | NULL | 588094 | Using where |
1 row in set (0.00 sec)

The conclusion is quite clear: For heavy duty matching, don’t use REGEXP() if LIKE can do the job. In particular if a lot of rows can be ruled out by virtue of the first characters in the string.

Making it harder for LIKE

Let’s warm up a bit:

mysql> SELECT cpe_name FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:humming%:cyberdocs:-%';
| cpe_name                                        |
| cpe:2.3:a:hummingbird:cyberdocs:-:*:*:*:*:*:*:* |
1 row in set (0.00 sec)

It was quicker than measurable, mainly because there was little to do:

mysql> EXPLAIN SELECT cpe_name FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:humming%:cyberdocs:-%';
| id | select_type | table        | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
|  1 | SIMPLE      | product_info | range | PRIMARY       | PRIMARY | 513     | NULL |   80 | Using where |
1 row in set (0.00 sec)

As expected, the index was used for the initial part of the string, and that left MySQL with 80 rows to actually do the matching (in this specific case). So it was quick.

But what if the wildcard is at the beginning of the string?

mysql> SELECT cpe_name FROM product_info WHERE cpe_name LIKE '%cpe:2.3:a:hummingbird:cyberdocs:-%';
| cpe_name                                        |
| cpe:2.3:a:hummingbird:cyberdocs:-:*:*:*:*:*:*:* |
1 row in set (0.10 sec)

So that took some considerable time, however still less than the REXEXP() case. The index didn’t help in this case, it seems:

mysql> EXPLAIN SELECT cpe_name FROM product_info WHERE cpe_name LIKE '%cpe:2.3:a:hummingbird:cyberdocs:-%';
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
|  1 | SIMPLE      | product_info | ALL  | NULL          | NULL | NULL    | NULL | 588094 | Using where |
1 row in set (0.00 sec)

So apparently, the pattern was applied to all rows. It was still faster than REGEXP() making a very simple match. So the latter seems not to be optimized in MySQL.

And to wrap this up, an expression full with wildcards. Similar to the one I thought maybe REXEXP would do better:

mysql> SELECT COUNT(*) FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:%:%:%:-:%:%:%:%:%:%';
| COUNT(*) |
|    13205 |
1 row in set (0.11 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:%:%:%:-:%:%:%:%:%:%';
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
|  1 | SIMPLE      | product_info | ALL  | PRIMARY       | NULL | NULL    | NULL | 588094 | Using where |
1 row in set (0.00 sec)

So what happened here is that the initial part of the string didn’t help much, and the match seems to have been done on all rows. It took more or less the same time as the much simpler match pattern above.


There seems to be two main conclusions from this little set of experiments: The first one isn’t surprising: The most important factor is how many rows are being accessed, not so much what is done with them. And the second is that MySQL does some sensible optimizations when LIKE is used, in particular it narrows down the number of rows with the index, when possible. Something it won’t do with REGEXP(), even if with an “^” at the beginning of the regex.

MySQL: Enforcing long unique BLOB / TEXT columns with UNIQUE INDEX


When creating indexes to TEXT / BLOB columns (and their variants), it’s required to specify how many characters the index should cover. In MySQL’s docs it’s usually suggested to keep them short in for better performance. There’s also a limit on the number of characters, which varies from one database engine to another, going from a few hundreds to a few thousands characters.

However it’s not unusual to use UNIQUE INDEX for making the database enforce the uniqueness of a field in the table. ON DUPLICATE KEY, INSERT IGNORE, UPDATE IGNORE and REPLACE can then be used to gracefully keep things tidy.

But does UNIQUE INDEX mean that the entire field remains unique, or is only the part covered by the index checked? Spoiler: Only the part covered by the index. In other words, MySQL’s uniqueness enforcement may be too strict.

Almost needless to say, if the index isn’t UNIQUE, it’s just a performance issue: If the number of covered characters is small, the database will more often fetch and discard the data of a row that turns out not to match the search criteria. But it doesn’t change the logic behavior. With UNIQUE INDEX, the number of characters does.

A simple test follows.


I created a table with MySQL 5.1.47, with an index covering only 5 chars. Deliberately very short.

	message BLOB NOT NULL,
	UNIQUE INDEX (message(5))
) engine = MyISAM;

Which ends up with this:

mysql> DESCRIBE try;
| Field   | Type             | Null | Key | Default | Extra          |
| id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| message | blob             | NO   | UNI | NULL    |                |
2 rows in set (0.00 sec)

Inserting the first row:

mysql> INSERT INTO try(message) VALUES('Hello, world');
Query OK, 1 row affected (0.00 sec)

And now trying a second:

mysql> INSERT INTO try(message) VALUES('Hello there');
ERROR 1062 (23000): Duplicate entry 'Hello' for key 'message'

That’s it. It just looked at the first five chars. Trying a difference within this region:

mysql> INSERT INTO try(message) VALUES('Hell there');
Query OK, 1 row affected (0.00 sec)

No wonder, that worked.

mysql> SELECT * FROM try;
| id | message      |
|  1 | Hello, world |
|  2 | Hell there   |
2 rows in set (0.00 sec)

Handling large TEXT/BLOB

And that leaves the question: How is it possible to ensure uniqueness on large chunks of text or binary data? One solution I can think of is to add a column for a hash (say SHA1), and let the application calculate that hash for each row it inserts, and insert it along with it. And make the UNIQUE INDEX on the hash, not the text. Something like

	message BLOB NOT NULL,
	UNIQUE INDEX (hash(40))
) engine = MyISAM;

But wait. MySQL supports hashing functions. Why not use them instead? Well, the problem is that if I want an INSERT statement push the data and its hash in one go they query becomes a bit nasty. What came to my mind is:

mysql> INSERT INTO try2(message, hash) (SELECT t.x, SHA1(t.x) FROM (SELECT 'Hello there' AS x) AS t);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

Can it get more concise than this? Suggestions are welcome. The double SELECT is required because I want the string literal to be mentioned once.

Isn’t it easier to let the application calculate the SHA1, and send it to the server by value? It’s a matter of taste, I guess.

Anyhow, trying again with exactly the same:

mysql> INSERT INTO try2(message, hash) (SELECT t.x, SHA1(t.x) FROM (SELECT 'Hello there' AS x) AS t);
ERROR 1062 (23000): Duplicate entry '726c76553e1a3fdea29134f36e6af2ea05ec5cce' for key 'hash'

and with something slightly different:

mysql> INSERT INTO try2(message, hash) (SELECT t.x, SHA1(t.x) FROM (SELECT 'Hello there!' AS x) AS t);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

So yep, it works:

mysql> SELECT * FROM try2;
| id | message      | hash                                     |
|  1 | Hello there  | 726c76553e1a3fdea29134f36e6af2ea05ec5cce |
|  2 | Hello there! | 6b19cb3790b6da8f7c34b4d8895d78a56d078624 |
2 rows in set (0.00 sec)

Once again, even though the example I showed demonstrates how to make MySQL calculate the hash, I would do it in the application.

Decoding email’s quoted-printable with Perl

To make it short, the command at shell prompt is

$ perl -MMIME::QuotedPrint -e 'local $/; $x=<>; print decode_qp($x)' < quoted.txt > unquoted.html

and I needed this to extract an HTML segment of an email.