Replacing ntpd with systemd-timesyncd (Mint 18.1)

Introduction

It all began when I noted that my media center Linux machine (Linux Mint 18.1, Serena) finished a TV recording a bit earlier than expected. Logging in and typing “date” I was quite surprised to find out that the time was off by half a minute.

The first question that comes to mind is why the time synchronization didn’t work. The second is, if it didn’t work, how come I hadn’t noted this issue earlier? The computer is in use as a media center for little less than two years.

What happened

It turns out (and it wasn’t easy to tell) that the relevant daemon was ntpd.

So what’s up, ntp?

$ systemctl status ntp
● ntp.service - LSB: Start NTP daemon
   Loaded: loaded (/etc/init.d/ntp; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2018-12-19 12:38:06 IST; 1 months 7 days ag
     Docs: man:systemd-sysv-generator(8)
  Process: 1257 ExecStop=/etc/init.d/ntp stop (code=exited, status=0/SUCCESS)
  Process: 1385 ExecStart=/etc/init.d/ntp start (code=exited, status=0/SUCCESS)

Dec 19 12:38:06 tv systemd[1]: Starting LSB: Start NTP daemon...
Dec 19 12:38:06 tv ntp[1385]:  * Starting NTP server ntpd
Dec 19 12:38:06 tv ntp[1385]:    ...done.
Dec 19 12:38:06 tv systemd[1]: Started LSB: Start NTP daemon.
Dec 19 12:38:06 tv ntpd[1398]: proto: precision = 0.187 usec (-22)
Dec 19 12:38:08 tv systemd[1]: Started LSB: Start NTP daemon.

Looks fairly OK. Maybe the logs can tell something?

$ journalctl -u ntp
Dec 19 12:38:02 tv systemd[1]: Stopped LSB: Start NTP daemon.
Dec 19 12:38:02 tv systemd[1]: Starting LSB: Start NTP daemon...
Dec 19 12:38:02 tv ntp[1055]:  * Starting NTP server ntpd
Dec 19 12:38:02 tv ntpd[1074]: ntpd 4.2.8p4@1.3265-o Wed Oct  5 12:34:45 UTC 2016 (1): Starting
Dec 19 12:38:02 tv ntpd[1076]: proto: precision = 0.175 usec (-22)
Dec 19 12:38:02 tv ntp[1055]:    ...done.
Dec 19 12:38:02 tv systemd[1]: Started LSB: Start NTP daemon.
Dec 19 12:38:02 tv ntpd[1076]: Listen and drop on 0 v6wildcard [::]:123
Dec 19 12:38:02 tv ntpd[1076]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Dec 19 12:38:02 tv ntpd[1076]: Listen normally on 2 lo 127.0.0.1:123
Dec 19 12:38:02 tv ntpd[1076]: Listen normally on 3 lo [::1]:123
Dec 19 12:38:02 tv ntpd[1076]: Listening on routing socket on fd #20 for interface updates
Dec 19 12:38:03 tv ntpd[1076]: error resolving pool 0.ubuntu.pool.ntp.org: Temporary failure in name resolution (-3)
Dec 19 12:38:04 tv ntpd[1076]: error resolving pool 1.ubuntu.pool.ntp.org: Temporary failure in name resolution (-3)
Dec 19 12:38:05 tv ntpd[1076]: error resolving pool 2.ubuntu.pool.ntp.org: Temporary failure in name resolution (-3)
Dec 19 12:38:06 tv systemd[1]: Stopping LSB: Start NTP daemon...
Dec 19 12:38:06 tv ntp[1257]:  * Stopping NTP server ntpd
Dec 19 12:38:06 tv ntp[1257]:    ...done.
Dec 19 12:38:06 tv systemd[1]: Stopped LSB: Start NTP daemon.
Dec 19 12:38:06 tv systemd[1]: Stopped LSB: Start NTP daemon.
Dec 19 12:38:06 tv systemd[1]: Starting LSB: Start NTP daemon...
Dec 19 12:38:06 tv ntp[1385]:  * Starting NTP server ntpd
Dec 19 12:38:06 tv ntp[1385]:    ...done.
Dec 19 12:38:06 tv systemd[1]: Started LSB: Start NTP daemon.
Dec 19 12:38:06 tv ntpd[1398]: proto: precision = 0.187 usec (-22)
Dec 19 12:38:08 tv systemd[1]: Started LSB: Start NTP daemon.

Hmmm… There is some kind of trouble there, but it was surely resolved. Or? In fact, there was no ntpd process running, so maybe it just died?

Let’s try to restart the daemon, and see what happens. As root,

# systemctl restart ntp

after which the log went

Jan 26 20:36:46 tv systemd[1]: Stopping LSB: Start NTP daemon...
Jan 26 20:36:46 tv ntp[32297]:  * Stopping NTP server ntpd
Jan 26 20:36:46 tv ntp[32297]: start-stop-daemon: warning: failed to kill 1398: No such process
Jan 26 20:36:46 tv ntp[32297]:    ...done.
Jan 26 20:36:46 tv systemd[1]: Stopped LSB: Start NTP daemon.
Jan 26 20:36:46 tv systemd[1]: Starting LSB: Start NTP daemon...
Jan 26 20:36:46 tv ntp[32309]:  * Starting NTP server ntpd
Jan 26 20:36:46 tv ntp[32309]:    ...done.
Jan 26 20:36:46 tv systemd[1]: Started LSB: Start NTP daemon.
Jan 26 20:36:46 tv ntpd[32324]: proto: precision = 0.187 usec (-22)
Jan 26 20:36:46 tv ntpd[32324]: Listen and drop on 0 v6wildcard [::]:123
Jan 26 20:36:46 tv ntpd[32324]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Jan 26 20:36:46 tv ntpd[32324]: Listen normally on 2 lo 127.0.0.1:123
Jan 26 20:36:46 tv ntpd[32324]: Listen normally on 3 enp3s0 10.1.1.22:123
Jan 26 20:36:46 tv ntpd[32324]: Listen normally on 4 lo [::1]:123
Jan 26 20:36:46 tv ntpd[32324]: Listen normally on 5 enp3s0 [fe80::f757:9ceb:2243:3e16%2]:123
Jan 26 20:36:46 tv ntpd[32324]: Listening on routing socket on fd #22 for interface updates
Jan 26 20:36:47 tv ntpd[32324]: Soliciting pool server 118.67.200.10
Jan 26 20:36:48 tv ntpd[32324]: Soliciting pool server 210.23.25.77
Jan 26 20:36:49 tv ntpd[32324]: Soliciting pool server 211.233.40.78
Jan 26 20:36:50 tv ntpd[32324]: Soliciting pool server 43.245.49.242
Jan 26 20:36:30 tv ntpd[32324]: Soliciting pool server 45.76.187.173
Jan 26 20:36:30 tv ntpd[32324]: Soliciting pool server 46.19.96.19
Jan 26 20:36:31 tv ntpd[32324]: Soliciting pool server 210.173.160.87
Jan 26 20:36:31 tv ntpd[32324]: Soliciting pool server 119.28.206.193
Jan 26 20:36:49 tv ntpd[32324]: Soliciting pool server 133.243.238.244
Jan 26 20:36:49 tv ntpd[32324]: Soliciting pool server 91.189.89.199

Aha! So this is what a kickoff of ntpd should look like! Clearly ntpd didn’t recover all that well from the lack of internet connection (I suppose) during the media center’s bootup. Maybe it died, and was never restarted. The irony is that systemd has a wonderful mechanism for restarting failing daemons, but ntpd is still under the backward-compatible LSB interface. So the system silently remained with no time synchronization.

Go the systemd way

systemd supplies its own lightweight time synchronization mechanism, systemd-timesyncd. It makes much more sense, as it doesn’t open NTP ports as a server (like ntpd does, one may wonder what for), but just synchronizes the computer it runs on to the remote NTP server. And judging from my previous experience with systemd, in the event of multiple solutions, go for the one systemd offers. In fact, it’s sort-of enabled by default:

$ systemctl status systemd-timesyncd
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/systemd-timesyncd.service.d
           └─disable-with-time-daemon.conf
   Active: inactive (dead)
Condition: start condition failed at Wed 2018-12-19 12:38:01 IST; 1 months 7 days ago
           ConditionFileIsExecutable=!/usr/sbin/VBoxService was not met
     Docs: man:systemd-timesyncd.service(8)

Start condition failed? What’s this? Let’s look at the drop-in file:

$ cat /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
[Unit]
# don't run timesyncd if we have another NTP daemon installed
ConditionFileIsExecutable=!/usr/sbin/ntpd
ConditionFileIsExecutable=!/usr/sbin/openntpd
ConditionFileIsExecutable=!/usr/sbin/chronyd
ConditionFileIsExecutable=!/usr/sbin/VBoxService

Oh please, you can’t be serious. Disabling the execution because of the existence of a file? If another NTP daemon is installed, does it mean it’s being enabled? In particular, if VBoxService is installed, does it mean we’re running as guests on a virtual machine? Like, seriously, someone might just install the Virtual Box client tools for no reason at all, and poof, there goes the time synchronization without any warning (note that this wasn’t the problem I had).

Moving to systemd-timesyncd

As mentioned earlier, systemd-timesyncd is enabled by default, but one may insist:

# systemctl enable systemd-timesyncd.service

(Nothing response, because it’s enabled anyhow)

However in order to make it work, remove the condition that prevents it from running:

# rm /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf

and then disable and stop ntpd:

# systemctl disable ntp
# systemctl stop ntp

On my computer, the other two time synchronizing tools (openntpd and chrony) aren’t installed, so they are not to worry about.

And then we have timedatectl

Note directly related, and still worth mentioning

$ timedatectl
      Local time: Sat 2019-01-26 21:22:57 IST
  Universal time: Sat 2019-01-26 19:22:57 UTC
        RTC time: Sat 2019-01-26 19:22:57
       Time zone: Asia/Jerusalem (IST, +0200)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

Systemd is here to take control of everything, obviously.

Cinelerra 2019 notes

Cinelerra is alive and kicking. I’ve recently downloaded the “goodguy” revision of Cinelerra, Nov 29 2018 build (called “cin” for some reason), which is significantly smoother than the tool I’m used to.

Notes:

  • There are now two ways to set the effects’ attributes. With the good-old magnifying glass (“Controls”) in with the gear (“Presets”), which gives a textual interface
  • Unlike what I’m used to, the Controls only set the global parameters, even with “Genererate keyframes while tweeking” on (spelling as used in Cinelerra).
  • In order to create keyframes, enable “Generate keyframes” and go for the “gear” tool. That isn’t much fun, because the settings are manual.
  • If the Controls are used, all keyframes get the value.
  • When rendering, there are presets. For a decent MP4 output, go for FFMPEG / mp4 fileformat, then the h264.mp4 preset for audio (Bitrate = 0, Quality = -1, Samples fltp) and same for video (Bitrate = 0, Quality = -1, Pixels = yuv420p and video options keyint_min=25 and x264opts keyint=25).
  • Motion correction (anti-camera-shake). When used properly, really smooth output is obtained. Use the “Motion” effect (not one of the variants, just “Motion”). Enable “Track translation” (and possibly also rotation, haven’t tried it). Enable “Play every frame” on Settings > Playback A, or testing the smoothing by playing the video will result in plain junk. Be sure “Draw vectors” is enabled for the experimentation stage, but turn it off for rendering or it will appear in the rendered video. The inner box shows which region is sampled for motion detection, and the outer box shows how far it looks for motion. It’s controlled by “Translation search radius / block size” as well as Block X/Y. Select “Track previous frame”, action is “Stabilize Subpixel” and calculation to Recalculate (or completely unrelated things may happen).  Play with “Maximum absolute offset” and “Motion settling speed” to get things smooth (the latter determines how quickly the frame is forced back towards zero correction, lower gives smoother output, try 4).

Solved: Missing ktorrent icon on Linux Mint / Cinnamon

Running ktorrent on Linux Mint 19 (Tara), the famous downwards-arrow icon was invisible on the system tray. Which made it appear like the program had quit when it was actually minimized. Clicking the empty box made ktorrent re-appear.

Solution: Invoke the Qt5 configuration tool

$ qt5ct

and under the Appearance tab set “Style” to gtk2 (I believe it was “Fusion” before). It’s not just prettier generally, but after restarting ktorrent, the icon is there.

Actually, it’s probably not about the style, but the fact that qt5ct was run. Because before making the change, the ktorrent printed out the following when launched from the command line:

Mon Dec 24 09:52:55 2018: Qt Warning: QSystemTrayIcon::setVisible: No Icon set
Warning: QSystemTrayIcon::setVisible: No Icon set
Mon Dec 24 09:52:55 2018: Starting minimized
Mon Dec 24 09:52:55 2018: Started update timer
Mon Dec 24 09:52:55 2018: Qt Warning: inotify_add_watch("/home/eli/.config/qt5ct") failed: "No such file or directory"
Warning: inotify_add_watch("/home/eli/.config/qt5ct") failed: "No such file or directory"

The “No Icon set” warning is misleading, because it continued to appear. This is after the fix, with the icon properly in place in the tray:

Mon Dec 24 10:16:17 2018: Qt Warning: QSystemTrayIcon::setVisible: No Icon set
Warning: QSystemTrayIcon::setVisible: No Icon set

Anyhow, problem fixed. For me, that is.

And why ktorrent? Because its last reported vulnerability was in 2009, compared with “Transmission” which had a nasty issue in January 2018. Actually, the exploit in Transmission is interesting by itself, with a clear lesson: If you set up a webserver on the local host for any purpose, assume anyone can access it. Setting it to respond to 127.0.0.1 only doesn’t help.

 

Writing a panel applet for Cinnamon: The basics

Introduction

What I wanted: A simple applet on Cinnamon, which allows me to turn a service on and off (hostapd, a Wifi hotspot). I first went for Argos catch-all extension, and learned that Cinnamon isn’t gnome-shell, and in particular that extensions for gnome-shell don’t (necessarily?) work with Cinnamon.

Speaking of which, my system is Linux Mint 19 on an x86_64, with

$ cinnamon --version
Cinnamon 3.8.9

So I went for writing the applet myself. Given the so-so level of difficulty, I should have done that to begin with.

Spoiler: I’m not going to dive into the details of that, because my hostapd-firewall-DHCP daemon setting is quite specific. Rather, I’ll discuss about some general aspects of writing an applet.

So what is it like? Well, quite similar to writing something useful in JavaScript for a web page. Cinnamon’s applets are in fact written in JavaScript, and it feels pretty much the same. In particular, this thing about nothing happening when there’s an error, now go figure what it was. And yes, there’s an error log console which helps with syntax errors (reminds browsers’ error log, discussed below) but often run-time errors just lead to nothing. A situation that is familiar to anyone with JavaScript experience.

And I also finally understand why the cinnamon process hogs CPU all the time. OK, it’s usually just a few percents, and still, what is it doing all that time with no user activity? Answer: Running some JavaScript, I suppose.

But all in all, if you’re good with JavaScript and understand the concepts of GUI programming and events + fairly OK with object oriented programming, it’s quite fun. And there’s another thing you better be good at:

Read The Source

As of December 2018, the API for Cinnamon applets is hardly documented, and it’s somewhat messy. So after reading a couple of tutorials (See “References” at the bottom of this post), the best way to grasp how to get X done is by reading the sources of existing applets:

  • System-installed: /usr/share/cinnamon/applets
  • User-installed: ~/.local/share/cinnamon/applets
  • Cinnamon’s core JavaScript sources: /usr/share/cinnamon/js

Each of these contains several subdirectories, typically with the form name@creator, one for each applet that is available for adding to the panels. Each of these has at least two files, which are also those to supply for your own applet:

  • metadata.json, which contains some basic info on the applet (probably used while selecting applets to add).
  • applet.js, which contains the JavaScript code for the applet.

It doesn’t matter if they’re executable, even though they often are.

There may also be additional *.js files.

Also, there might also be a po/ directory, which often contains .po and .pot files that are intended for localizing the text displayed to the user. These go along with the _() function in the JavaScript code. For the purposes of a simple applet, these are not necessary. Ignore these _(“Something”) things in the JavaScript code, and read them as just “Something”.

Some applets allow parameter setting. The runtime values for these are at ~/.cinnamon, which contains configuration data etc.

Two ways to object orient

Unfortunately, there are two styles for defining the applet class, both of which are used. This is a matter of minor confusion if you read the code of a few applets, and therefore worthy to note: Some of the applets use JavaScript class declarations (extending a built-in class), e.g.

class CinnamonSoundApplet extends Applet.TextIconApplet {
    constructor(metadata, orientation, panel_height, instanceId) {
        super(orientation, panel_height, instanceId);

and others use the “prototype” syntax:

MyApplet.prototype = {
  __proto__: Applet.IconApplet.prototype,

and so on. I guess they’re equivalent, despite the difference in syntax. Note that in the latter format, the constructor is a function called _init().

This way or another, all classes that employ timeout callbacks should have a destroy() method (no underscore prefix) to cancel them before quitting.

I wasn’t aware of these two syntax possibilities, and therefore started from the first applet I got my hands on. It happened to be written in the “prototype” syntax, which is probably the less preferable choice. I’m therefore not so sure my example below is a good starter.

Getting Started

It’s really about three steps to get an applet up and running.

  • Create a directory in ~/.local/share/cinnamon/applets/ and put the two files there: metadata.json and applet.js.
  • Restart Cinnamon. No, it’s not as bad as it sounds. See below.
  • Install the applet to some panel, just like any other applet.
I warmly suggest copying an existing applet and hacking it. You can start with the skeleton applet I’ve listed below, but there are plenty other available on the web, in particular along with tutorials.

The development cycle (or: how to “run”)

None of the changes made in the applet’s directory (well, almost none) take any effect until Cinnamon is restarted, and when it is, everything is in sync. It’s not like a reboot, and it’s fine to do on the computer you’re working on, really. All windows remain in their workspaces (even though the windows’ tabs at the panel may change order). No reason to avoid this, even if you have a lot of windows opened. Done it a gazillion times.

So how to restart Cinnamon: ALT-F2, type “r” and Enter. Then cringe as your desktop fades away and be overwhelmed when it returns, and nothing bad happened.

If something is wrong with your applet (or otherwise), there a notification saying “Problems during Cinnamon startup” elaborating that “Cinnamon started successfully, but one or more applets, desklets or extensions failed to load”. From my own experience, that’s as bad as it gets: The applet wasn’t loaded, or doesn’t run properly.

Press Win+L (or ALT-F2, then type “lg” and Enter, or type “cinnamon-looking-glass” at shell prompt as non-root user) to launch the Looking Glass tool (called “Melange”). The Log tab is helpful with detailed error messages (colored red, that helps). Alternatively, look for the detailed error message in .xsession-errors in your home directory.

Note that the error message often appears before the line saying that the relevant applet was loaded.

OK, so now to some more specific topics.

Custom icons

Icons are referenced by their file name, without extension, in the JavaScript code as well as the metadata.json file (as “icon” assignment). The search path is the applet’s own icons/ subdirectory and the system icons, present at /usr/share/icons/.

My own experience is that creating an icons/ directory side-by-side with applet.js, and putting a PNG file named wifi-icon-off.png there makes a command like

this.set_applet_icon_name("wifi-icon-off");

work for setting the applet’s main icon on the panel. The PNG’s transparency is honored. The official file format is SVG, but who’s got patience for that.

Same goes with something menu items with icons:

item = new PopupMenu.PopupIconMenuItem("Access point off", "wifi-icon-off", St.IconType.FULLCOLOR);

item.connect('activate', Lang.bind(this, function() {
   Main.Util.spawnCommandLine("/usr/local/bin/access-point-ctl off");
}));
this.menu.addMenuItem(item);

My own experience with the menu items is that if the icon file isn’t found, Cinnamon silently puts an empty slot instead. JavaScript-style no fussing.

I didn’t manage to achieve something similar with the “icon” assignment in metadata.json, so the choices are either to save the icon in /usr/share/icons/, or use one of the system icons, or eliminate the “icon” assignment altogether from the JSON file. I went to the last option. This resulted in a dull default icon when installing the applet, but this is of zero importance for an applet I’ve written myself.

Running shell commands from JavaScript

The common way to execute a shell command is e.g.

const Main = imports.ui.main;

Main.Util.spawnCommandLine("gnome-terminal");

The assignment of Main is typically done once, and at the top of the script, of course.

When the output of the command is of interest, it becomes slightly more difficult. The following function implements the parallel of the Perl backtick operator: Run the command, and return the result as a string. Note that unlike its bash counterpart, newlines remain newlines, and are not translated into spaces:

const GLib = imports.gi.GLib;

function backtick(command) {
  try {
    let [result, stdout, stderr] = GLib.spawn_command_line_sync(command);
    if (stdout != null) {
      return stdout.toString();
    }
  }
  catch (e) {
    global.logError(e);
  }

  return "";
}

and then one can go e.g.

let output = backtick("/bin/systemctl is-active hostapd");

after which output is a string containing the result of the execution (with a trailing newline, by the way).

As of December 2018, there’s no proper documentation of Cinnamon’s Glib wrapper, however the documentation of the C library can give an idea.

My example applet

OK, so here’s a skeleton applet for getting started with.

Its pros:

  • It’s short, quite minimal, and keeps the mumbo-jumbo to a minimum
  • It shows a simple drop-down menu display applet, which allows running a different shell command from each entry.
Its cons:
  • It’s written in the less-preferable “prototype” syntax for defining objects.
  • It does nothing useful. In particular, the shell commands it executes exist only on my computer.
  • It depends on a custom icon (see “Custom Icons” above). Maybe this is an advantage…?

So if you want to give it a go, create a directory named ‘wifier@eli’ (or anything else?) in ~/.local/share/cinnamon/applets/, and put this as metadata.json:

{
    "description": "Turn Wifi Access Point on and off",
    "uuid": "wifier@eli",
    "name": "Wifier"
}

And this as applet.js:

const Applet = imports.ui.applet;
const Lang = imports.lang;
const St = imports.gi.St;
const Main = imports.ui.main;
const PopupMenu = imports.ui.popupMenu;
const UUID = 'wifier@eli';

function ConfirmDialog(){
  this._init();
}

function MyApplet(orientation, panelHeight, instanceId) {
  this._init(orientation, panelHeight, instanceId);
}

MyApplet.prototype = {
  __proto__: Applet.IconApplet.prototype,

  _init: function(orientation, panelHeight, instanceId) {
    Applet.IconApplet.prototype._init.call(this, orientation, panelHeight, instanceId);

    try {
      this.set_applet_icon_name("wifi-icon-off");
      this.set_applet_tooltip("Control Wifi access point");

      this.menuManager = new PopupMenu.PopupMenuManager(this);
      this.menu = new Applet.AppletPopupMenu(this, orientation);
      this.menuManager.addMenu(this.menu);

      this._contentSection = new PopupMenu.PopupMenuSection();
      this.menu.addMenuItem(this._contentSection);

      // First item: Turn on
      let item = new PopupMenu.PopupIconMenuItem("Access point on", "wifi-icon-on", St.IconType.FULLCOLOR);

      item.connect('activate', Lang.bind(this, function() {
					   Main.Util.spawnCommandLine("/usr/local/bin/access-point-ctl on");
					 }));
      this.menu.addMenuItem(item);

      // Second item: Turn off
      item = new PopupMenu.PopupIconMenuItem("Access point off", "wifi-icon-off", St.IconType.FULLCOLOR);

      item.connect('activate', Lang.bind(this, function() {
					   Main.Util.spawnCommandLine("/usr/local/bin/access-point-ctl off");
					 }));
      this.menu.addMenuItem(item);
    }
    catch (e) {
      global.logError(e);
    }
  },

  on_applet_clicked: function(event) {
    this.menu.toggle();
  },
};

function main(metadata, orientation, panelHeight, instanceId) {
  let myApplet = new MyApplet(orientation, panelHeight, instanceId);
  return myApplet;
}

Next, create an “icons” subdirectory (e.g. ~/.local/share/cinnamon/applets/wifier@eli/icons/) and put a small (32 x 32 ?) PNG image there as wifi-icon-off.png, which functions as the applet’s top icon. Possibly download mine from here.

Anyhow, be sure to have an icon file. Otherwise there will be nothing on the panel.

Finally, restart Cinnamon, as explained above. You will get errors when trying the menu items (failed execution), but don’t worry — nothing bad will happen.

References

Failed: Install Argos Shell Extension on Cinnamon

You have been warned

These are my pile of jots as I tried to install Argos “Gnome Shell Extension in seconds” on my Mint 19 Cinnamon machine. As the title implies, it didn’t work out, so I went for writing an applet from scratch, more or less.

Not being strong on Gnome internals, I’m under the impression that it’s simply because Cinnamon isn’t Gnome shell. This post is just the accumulation of notes I took while trying. Nothing to follow step-by-step, as it leads nowhere.

It’s here for the crumbs of info I gathered nevertheless.

Here we go

It says on the project’s Github page that a recent version of Gnome should include Argos. So I went for it:

# apt install gnome-shell-extensions

And since I’m at it:

# apt install gnome-tweaks

Restart Gnome shell: ALT-F2, type “r” and enter. For a second, it looks like a logout, but everything returns to where it was. Don’t hesitate doing this, even if there are a lot of windows opened.

Nada. So I went the manual way. First, found out my Gnome Shell version:

$ apt-cache show gnome-shell | grep Version

or better,

$ gnome-shell --version
GNOME Shell 3.28.3

and downloaded the extension for Gnome shell 3.28 from Gnome’s extension page. Then realized it’s slightly out of date with the git repo, so

$ git clone https://github.com/p-e-w/argos.git
$ cd argos
$ cp -r 'argos@pew.worldwidemann.com' ~/.local/share/cinnamon/extensions/

Note that I copied it into cinnamon’s subdirectory. It’s usually ~/.local/share/gnome-shell/extensions, but not when running Cinnamon!

Restart Gnome shell again: ALT-F2, type “r” and enter.

Then open the “Extensions” GUI thingy from the main menu. Argos extension appears. Select it and press the “+” button to add it.

Restart Gnome shell again. This time a notification appears, saying “Problems during Cinnamon startup” elaborating that “Cinnamon started successfully, but one or more applets, desklets or extensions failed to load”.

Looking at ~/.xsession-errors, I found

Cjs-Message: 12:14:42.822: JS LOG: [LookingGlass/error] [argos@pew.worldwidemann.com]: Missing property "cinnamon-version" in metadata.json

Can’t argue with that, can you? Let’s see:

$ cinnamon --version
Cinnamon 3.8.9

So edit ~/.local/share/cinnamon/extensions/argos@pew.worldwidemann.com/metadata.json and add the line marked in red (not at the end, because the last line doesn’t end with a comma):

  "version": 2,
  "cinnamon-version": [ "3.6", "3.8", "4.0" ],
  "shell-version": ["3.14", "3.16", "3.18", "3.20", "3.22", "3.24", "3.26", "3.28"]

I took this line from some applet I found under ~/.local/share/cinnamon. Not much thought given here.

And guess what? Reset again with ALT-F2 r. Failed again. Now in ~/.xsession-errors:

Cjs-Message: 12:50:32.643: JS LOG: [LookingGlass/error]
[argos@pew.worldwidemann.com]: No JS module 'extensionUtils' found in search path
[argos@pew.worldwidemann.com]: Error importing extension.js from argos@pew.worldwidemann.com

It seems like Cinnamon has changed the extension mechanism altogether, which explains why there’s no extension tab in Gnome Tweaker, and why extensionUtils is missing.

Maybe this explains it. Frankly, I didn’t bother to read that long discussion, but the underlying issue is probably buried there.

Setting PS1 with color codes properly with gnome-terminal

There are plenty of web pages describing the different escape codes for changing color of an ANSI emulating terminal. This is in particular useful for giving the shell prompt different colors, to prevent confusion between different computers, for example.

The trick is to set the PS1 bash variable. What is less told, is that \[ and \] tokens must enclose the color escape sequence, or things get completely crazy: Pasting into the terminal creates junk, newlines aren’t interpreted properly and a lot of other peculiarities with the cursor jumping to the wrong place all the time.

So this is wrong (just the color escape sequence, no enclosure):

PS1="\e[44m[\u@here \W]\e[m\\$ "

And this is right:

PS1="\[\e[44m\][\u@here \W]\[\e[m\]\\$ "

Even the “wrong” version about will produce the correct colors, but as mentioned above, other weird stuff happens.

5 GHz Wifi access point on Linux Mint 19 / Atheros

+ how to just compile an Ubuntu distribution kernel without too much messing around.

Introduction

It’s not a real computer installation if the Wifi works out of the box. So these are my notes to self for setting up an access point on a 5 GHz channel. I’ll need it somehow, because each kernel upgrade will require tweaking with the kernel module.

The machine is running Linux Mint 19 (Tara, based upon Ubuntu Bionic, with kernel 4.15.0-20-generic). The NIC is Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter, Vendor/Product IDs 168c:003e.

Installing hostapd

# apt install hostapd

/etc/hostapd/hostapd.conf as follows:

macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0

#Support older EAPOL authentication (version 1)
eapol_version=1

# Uncomment these for base WPA & WPA2 support with a pre-shared key
wpa=3
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP

wpa_passphrase=mysecret

# Customize these for your local configuration...
interface=wlan0
hw_mode=a
channel=52
ssid=mywifi
country_code=GD

and /etc/default/hostapd as follows:

DAEMON_CONF="/etc/hostapd/hostapd.conf"
DAEMON_OPTS=""

Then unmask the service by deleting /etc/systemd/system/hostapd.service (it’s a symbolic link to /dev/null).attempting to start the service with

5 GHz is not for plain people

Nov 27 21:14:04 hostapd[6793]: wlan0: IEEE 802.11 Configured channel (52) not found from the channel list of current mode (2) IEEE 802.11a
Nov 27 21:14:04 hostapd[6793]: wlan0: IEEE 802.11 Hardware does not support configured channel

What do you mean it’s not supported? It’s on the list!

# iw list
[ ... ]
	Band 2:
[ ... ]
          Frequencies:
			* 5180 MHz [36] (17.0 dBm) (no IR)
			* 5200 MHz [40] (17.0 dBm) (no IR)
			* 5220 MHz [44] (17.0 dBm) (no IR)
			* 5240 MHz [48] (17.0 dBm) (no IR)
			* 5260 MHz [52] (24.0 dBm) (no IR, radar detection)
			* 5280 MHz [56] (24.0 dBm) (no IR, radar detection)
			* 5300 MHz [60] (24.0 dBm) (no IR, radar detection)
			* 5320 MHz [64] (24.0 dBm) (no IR, radar detection)
			* 5500 MHz [100] (24.0 dBm) (no IR, radar detection)
			* 5520 MHz [104] (24.0 dBm) (no IR, radar detection)
			* 5540 MHz [108] (24.0 dBm) (no IR, radar detection)
			* 5560 MHz [112] (24.0 dBm) (no IR, radar detection)
			* 5580 MHz [116] (24.0 dBm) (no IR, radar detection)
			* 5600 MHz [120] (24.0 dBm) (no IR, radar detection)
			* 5620 MHz [124] (24.0 dBm) (no IR, radar detection)
			* 5640 MHz [128] (24.0 dBm) (no IR, radar detection)
			* 5660 MHz [132] (24.0 dBm) (no IR, radar detection)
			* 5680 MHz [136] (24.0 dBm) (no IR, radar detection)
			* 5700 MHz [140] (24.0 dBm) (no IR, radar detection)
			* 5720 MHz [144] (24.0 dBm) (no IR, radar detection)
			* 5745 MHz [149] (30.0 dBm) (no IR)
			* 5765 MHz [153] (30.0 dBm) (no IR)
			* 5785 MHz [157] (30.0 dBm) (no IR)
			* 5805 MHz [161] (30.0 dBm) (no IR)
			* 5825 MHz [165] (30.0 dBm) (no IR)
			* 5845 MHz [169] (disabled)
[ ... ]

The problem is evident when executing hostapd with the -dd flag (edit /etc/default/hostapd), in which case it lists the allowed channels. And none of the 5 GHz channels is listed. The underlying reason is the “no IR” part given in “iw list”, meaning no Initial Radiation, hence no access point allowed.

It’s very cute that the driver makes sure I won’t break the regulations, but it so happens that these frequencies are allowed in Israel for indoor use. My computer is indoors.

The way to work around this is to edit one of the driver’s sources, and use it instead.

Preparing for kernel compilation

In theory, I could have compiled the driver only, and replaced the files in the /lib/modules directory. But I’m in for a minimal change, and minimal brain effort. So the technique is to download the entire kernel, compile things that don’t really need compilation. Then pinpoint the correction, and recompile only that.

Unfortunately, Ubuntu’s view on kernel compilation seems to be that it can only be desired for preparing a deb package. After all, who wants to do anything else? So it gets a bit off the regular kernel compilation routine.

OK, so first I had to install some stuff:

# apt install libssl-dev
# apt install libelf-dev

Download the kernel (took me 25 minutes):

$ time git clone git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git

Compilation

The trick is to make the modules of a kernel that is identical to the running one (so there won’t be any bugs due to mismatches) and also match the kernel version string exactly (or the module won’t load).

Check out tag Ubuntu-4.15.0-20.21 (in my case, for 4.15.0-20-generic). This matches the kernel definition at the beginning of dmesg (and also the compilation date).

Follow this post and to prevent the “+” at the end of the kernel version.

Change directory to the kernel tree’s root, and copy the config file:

$ cp /boot/config-`uname -r` .config

Make sure the configuration is in sync:

$ make oldconfig

There will be some output, but no configuration question should be made — if that happens, it’s a sign that the wrong kernel revision has been checked out. In fact,

$ diff /boot/config-`uname -r` .config

should only output the difference in one comment line (the file’s header).

And then run the magic command:

$ fakeroot debian/rules clean

Don’t ask me what it’s for (I took it from this page), but among others, it does

cp debian/scripts/retpoline-extract-one scripts/ubuntu-retpoline-extract-one

and without it one gets the following error:

/bin/sh: ./scripts/ubuntu-retpoline-extract-one: No such file or directory

Ready to go, then. Compile only the modules. The kernel image itself is of no interest:

$ time make KERNELVERSION=`uname -r` -j 12 modules && echo Success

The -j 12 flag means running 12 processes in parallel. Pick your own favorite, depending in the CPU’s core count. Took 13 minutes on my machine.

Alternatively, compile just the relevant subdirectory. Quicker, no reason it shouldn’t work, but this is not how I did it myself:

$ make prepare scripts
$ time make KERNELVERSION=`uname -r` -j 12 M=drivers/net/wireless/ath/ && echo Success

And then use the same command when repeating the compilation below, of course.

Modify the ath.c file

Following this post (more or less) , edit drivers/net/wireless/ath/regd.c and neutralize the following functions with a “return” immediately after variable declarations. Or replace them with functions just returning immediately.

  • ath_reg_apply_beaconing_flags()
  • ath_reg_apply_ir_flags()
  • ath_reg_apply_radar_flags()

Also add a “return 0″ in ath_regd_init_wiphy() just before the call to wiphy_apply_custom_regulatory(), so the three calls to apply-something functions are skipped. In the said post, the entire init function was disabled, but I found that unnecessarily aggressive (and probably breaks something).

Note that there’s e.g. __ath_reg_apply_beaconing_flags() functions. These are not the ones to edit.

And then recompile:

$ make KERNELVERSION=`uname -r` modules && echo Success

This recompiles regd.c and ath.c, and the generates ath.ko. Never mind that the file is huge (2.6 MB) in comparison with the original one (40 kB). Once in the kernel, they occupy the same size.

As root, rename the existing ath.ko in /lib/modules/`uname -r`/kernel/drivers/net/wireless/ath/ to something else, and copy the new one (from drivers/net/wireless/ath/) to the same place.

Unload modules from kernel:

# rmmod ath10k_pci && rmmod ath10k_core && rmmod ath

and reload:

# modprobe ath10k_pci

And check the result (yay):

# iw list
[ ... ]
		Frequencies:
			* 5180 MHz [36] (30.0 dBm)
			* 5200 MHz [40] (30.0 dBm)
			* 5220 MHz [44] (30.0 dBm)
			* 5240 MHz [48] (30.0 dBm)
			* 5260 MHz [52] (30.0 dBm)
			* 5280 MHz [56] (30.0 dBm)
			* 5300 MHz [60] (30.0 dBm)
			* 5320 MHz [64] (30.0 dBm)
			* 5500 MHz [100] (30.0 dBm)
			* 5520 MHz [104] (30.0 dBm)
			* 5540 MHz [108] (30.0 dBm)
			* 5560 MHz [112] (30.0 dBm)
			* 5580 MHz [116] (30.0 dBm)
			* 5600 MHz [120] (30.0 dBm)
			* 5620 MHz [124] (30.0 dBm)
			* 5640 MHz [128] (30.0 dBm)
			* 5660 MHz [132] (30.0 dBm)
			* 5680 MHz [136] (30.0 dBm)
			* 5700 MHz [140] (30.0 dBm)
			* 5720 MHz [144] (30.0 dBm)
			* 5745 MHz [149] (30.0 dBm)
			* 5765 MHz [153] (30.0 dBm)
			* 5785 MHz [157] (30.0 dBm)
			* 5805 MHz [161] (30.0 dBm)
			* 5825 MHz [165] (30.0 dBm)
			* 5845 MHz [169] (30.0 dBm)
[ ... ]

The no-IR marks are gone, and hostapd now happily uses these channels.

Probably not: Upgrading firmware

As I first through that the the problem was an old firmware version, as discussed on this forum post, I went for upgrading it. These are my notes on that. Spoiler: It was probably unnecessary, but I’ll never know, and neither will you.

From the dmesg output:

[   16.152377] ath10k_pci 0000:03:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:03:00.0.bin failed with error -2
[   16.152387] ath10k_pci 0000:03:00.0: Direct firmware load for ath10k/cal-pci-0000:03:00.0.bin failed with error -2
[   16.201636] ath10k_pci 0000:03:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:1535
[   16.201638] ath10k_pci 0000:03:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
[   16.201968] ath10k_pci 0000:03:00.0: firmware ver WLAN.RM.4.4.1-00079-QCARMSWPZ-1 api 6 features wowlan,ignore-otp crc32 fd869beb
[   16.386440] ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 20d869c3

I was first mislead to think the firmware wasn’t loaded, but the later lines indicate it was acutally OK.

Listing the firmware files used by the kernel module:

$ modinfo ath10k_pci
filename:       /lib/modules/4.15.0-20-generic/kernel/drivers/net/wireless/ath/ath10k/ath10k_pci.ko
firmware:       ath10k/QCA9377/hw1.0/board.bin
firmware:       ath10k/QCA9377/hw1.0/firmware-5.bin
firmware:       ath10k/QCA6174/hw3.0/board-2.bin
firmware:       ath10k/QCA6174/hw3.0/board.bin
firmware:       ath10k/QCA6174/hw3.0/firmware-6.bin
firmware:       ath10k/QCA6174/hw3.0/firmware-5.bin
firmware:       ath10k/QCA6174/hw3.0/firmware-4.bin
firmware:       ath10k/QCA6174/hw2.1/board-2.bin
firmware:       ath10k/QCA6174/hw2.1/board.bin
firmware:       ath10k/QCA6174/hw2.1/firmware-5.bin
firmware:       ath10k/QCA6174/hw2.1/firmware-4.bin
firmware:       ath10k/QCA9887/hw1.0/board-2.bin
firmware:       ath10k/QCA9887/hw1.0/board.bin
firmware:       ath10k/QCA9887/hw1.0/firmware-5.bin
firmware:       ath10k/QCA988X/hw2.0/board-2.bin
firmware:       ath10k/QCA988X/hw2.0/board.bin
firmware:       ath10k/QCA988X/hw2.0/firmware-5.bin
firmware:       ath10k/QCA988X/hw2.0/firmware-4.bin
firmware:       ath10k/QCA988X/hw2.0/firmware-3.bin
firmware:       ath10k/QCA988X/hw2.0/firmware-2.bin

So which firmware file did it load? Well, there’s a firmware git repo for Atheros 10k:

$ git clone https://github.com/kvalo/ath10k-firmware.git

I’m not very happy running firmware found just somewhere, but the author of this Git repo is Kalle Valo, who works at Qualcomm. The Github account is active since 2010, and the files included in the Linux kernel are included there. So it looks legit.

Comparing files with the ones in the Git repo, which states the full version names, the files loaded were hw3.0/firmware-6.bin and another one (board-2.bin, I guess). The former went into the repo on Decemeber 18, 2017, which is more than a year after the problem in the forum post was solved. My firmware is hence fairly up to date.

Nevertheless, I upgraded to the ones added to the git firmware repo on November 13, 2018, and re-generated initramfs (not that it should matter — using lsinitramfs it’s clear that none of these firmware files are there). Did it help? As expected, no. But hey, now I have the latest and shiniest firmware:

[   16.498706] ath10k_pci 0000:03:00.0: firmware ver WLAN.RM.4.4.1-00124-QCARMSWPZ-1 api 6 features wowlan,ignore-otp crc32 d8fe1bac
[   16.677095] ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 506ce037

Exciting! Not.

Better than netstat: lsof tells us who is listening to what

Be sure to read the first comment below, where I’m told netstat can actually do the job. Even though I have to admit that I still find lsof’s output more readable.

OK, so we have netstat to tell us which ports are opened for listening:

$ netstat -n -a | grep "LISTEN "

Thanks, that nice, but what process is listening to these ports? For TCP sockets, it’s (as root):

# lsof -n -P -i tcp 2>/dev/null | grep LISTEN

The -P flag disables conversion from port numbers to protocol names. -n prevents conversion of host names.

Upgrading to Linux Mint 19, running the old system in a chroot

Background

Archaeological findings have revealed that prehistoric humans buried their forefathers under the floor of their huts. Fast forward to 2018, yours truly decided to continue running the (ancient) Fedora 12 as a chroot when migrating to Linux Mint 19. That’s an eight years difference.

While a lot of Linux users are happy to just install the new system and migrate everything “automatically”, this isn’t a good idea if you’re into more than plain tasks. Upgrading is supposed to be smooth, but small changes in the default behavior, API or whatever always make things that worked before fail, and sometimes with significant damage. Of the sort of not receiving emails, backup jobs not really working as before etc. Or just a new bug.

I’ve talked with quite a few sysadmins who were responsible for computers that actually needed to work continuously and reliably, and it’s not long before the apology for their ancient Linux distribution arrived. There’s no need to apologize: Upgrading is not good for keeping the system running smoothly. If it ain’t broke, don’t fix it.

But after some time, the hardware gets old and it becomes difficult to install new software. So I had this idea to keep running the old computer, with all of its properly running services and cronjobs, as a virtual machine. And then I thought, maybe go VPS-style. And then I realized I don’t need the VPS isolation at all. So the idea is to keep the old system as a chroot inside the new one.

Some services (httpd, mail handling, dhcpd) will keep running in the chroot, and others (the desktop in particular, with new shiny GUI programs) running natively. Old and new on the same machine.

The trick is making sure one doesn’t stamp on the feet of the other. These are my insights as I managed to get this up and running.

The basics

The idea is to place the old root filesystem (only) into somewhere in the new system, and chroot into it for the sake of running services and oldschool programs:

  • The old root is placed as e.g. /oldy-root/ in the new filesystem (note that oldy is a legit alternative spelling for oldie…).
  • bind-mounts are used for a unified view of home directories and those containing data.
  • Some services are executed from within the chroot environment. How to run them from Mint 19 (hence using systemd) is described below.
  • Running old programs is also possible by chrooting from shell. This is also discussed below.

Don’t put the old root on a filesystem that contains useful data, because odds are that such file system will be bind-mounted into the chrooted filesystem, which will cause a directory tree loop. Then try to calculate disk space or backup with tar. So pick a separate filesystem (i.e. a separate partition or LVM volume), or possibly as a subdirectory of the same filesystem as the “real” root.

Bind mounting

This is where the tricky choices are made. The point is to make the old and new systems see more or less the same application data, and also allow software to communicate over /tmp. So this is the relevant part in my /etc/fstab:

# Bind mounts for oldy root: system essentials
/dev                        /oldy-root/dev none bind                0       2
/dev/pts                    /oldy-root/dev/pts none bind            0       2
/dev/shm                    /oldy-root/dev/shm none bind            0       2
/sys                        /oldy-root/sys none bind                0       2
/proc                       /oldy-root/proc none bind               0       2

# Bind mounts for oldy root: Storage
/home                       /oldy-root/home none bind               0       2
/storage                    /oldy-root/storage none bind            0       2
/tmp                        /oldy-root/tmp  none bind               0       2
/mnt                        /oldy-root/mnt  none bind               0       2
/media                      /oldy-root/media none bind              0       2

Most notable are /mnt and /media. Bind-mounting these allows temporary mounts to be visible at both sides. /tmp is required for the UNIX domain socket used for playing sound from the old system. And other sockets, I suppose.

Note that /run isn’t bind-mounted. The reason is that the tree structure has changed, so it’s quite pointless (the mounting point used to be /var/run, and the place of the runtime files tend to change with time). The motivation for bind mounting would have been to let software from the old and new software interact, and indeed, there are a few UNIX sockets there, most notably the DBus domain UNIX socket.

But DBus is a good example of how hopeless it is to bind-mount /run: Old software attempting to talk with the Console Kit on the new DBus server fails completely at the protocol level (or namespace? I didn’t really dig into that).

So just copy the old /var/run into the root filesystem and that’s it. CUPS ran smoothly, GUI programs run fairly OK, and sound is done through a UNIX domain socket as suggested in the comments of this post.

I opted out on bind mounting /lib/modules and /usr/src. This makes manipulations of kernel modules (as needed by VMware, for example) impossible from the old system. But gcc is outdated for compiling under the new Linux kernel build system, so there was little point.

/root isn’t bind-mounted either. I wasn’t so sure about that, but in the end, it’s not a very useful directory. Keeping them separate makes the shell history for the root user distinct, and that’s actually a good thing.

Make /dev/log for real

Almost all service programs (and others) send messages to the system log by writing to the UNIX domain socket /dev/log. It’s actually a misnomer, because /dev/log is not a device file. But you don’t break tradition.

WARNING: If the logging server doesn’t work properly, Linux will fail to boot, dropping you into a tiny busybox rescue shell. So before playing with this, reboot to verify all is fine, and then make the changes. Be sure to prepare yourself for reverting your changes with plain command-line utilities (cp, mv, cat) and reboot to make sure all is fine.

In Mint 19 (and forever on), logging is handled by systemd-journald, which is a godsend. However for some reason (does anyone know why? Kindly comment below), the UNIX domain socket it creates is placed at /run/systemd/journal/dev-log, and /dev/log is a symlink to it. There are a few bug reports out there on software refusing to log into a symlink.

But that’s small potatoes: Since I decided not to bind-mount /run, there’s no access to this socket from the old system.

The solution is to swap the two: Make /dev/log the UNIX socket (as it was before), and /run/systemd/journal/dev-log the symlink (I wonder if the latter is necessary). To achieve this, copy /lib/systemd/system/systemd-journald-dev-log.socket into /etc/systemd/system/systemd-journald-dev-log.socket. This will make the latter override the former (keep the file name accurate), and make the change survive possible upgrades — the file in /lib can be overwritten by apt, the one in /etc won’t be by convention.

Edit the file in /etc, in the part saying:

[Socket]
Service=systemd-journald.service
ListenDatagram=/run/systemd/journal/dev-log
Symlinks=/dev/log
SocketMode=0666
PassCredentials=yes
PassSecurity=yes

and swap the files, making it

ListenDatagram=/dev/log
Symlinks=/run/systemd/journal/dev-log

instead.

All in all this works perfectly. Old programs work well (try “logger” command line utility on both sides). This can cause problems if the program expects “the real thing” on /run/systemd/journal/dev-log. Quite unlikely.

As a side note, I had this idea to make journald listen to two UNIX domain sockets: Dropping the Symlinks assignment in the original .socket file, and copying it into a new .socket file, setting ListenDatagram to /dev/log. Two .socket files, two UNIX sockets. Sounded like a good idea, only it failed with an error message saying “Too many /dev/log sockets passed”.

Running old services

systemd’s take on sysV-style services (i.e. those init.d, rcN.d scripts) is that when systemctl is called with reference to a service, it first tries with its native services, and if none is found, it looks for a service of that name in /etc/init.d.

In order to run old services, I wrote a catch-all init.d script, /etc/init.d/oldy-chrooter. It’s intended to be symlinked to, so it tells which service it should run from the command used to call it, then chroots, and executes the script inside the old system. And guess what, systemd plays along with this.

The script follows. Note that it’s written in Perl, but it has the standard info header, which is required on init scripts. String manipulations are easier this way.

#!/usr/bin/perl
### BEGIN INIT INFO
# Required-Start:    $local_fs $remote_fs $syslog
# Required-Stop:     $local_fs $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# X-Interactive:     false
# Short-Description: Oldy root wrapper service
# Description:       Start a service within the oldy root
### END INIT INFO

use warnings;
use strict;

my $targetroot = '/oldy-root';

my ($realcmd) = ($0 =~ /\/oldy-([^\/]+)$/);

die("oldy chroot delegation script called with non-oldy command \"$0\"\n")
  unless (defined $realcmd);

chroot $targetroot or die("Failed to chroot to $targetroot\n");

exec("/etc/init.d/$realcmd", @ARGV) or
  die("Failed to execute \"/etc/init.d/$realcmd\" in oldy chroot\n");

To expose the chroot’s httpd service, make a symlink in init.d:

# cd /etc/init.d/
# ln -s oldy-chrooter oldy-httpd

And then enable with

# systemctl enable oldy-httpd
oldy-httpd.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable oldy-httpd

which indeed runs /lib/systemd/systemd-sysv-install, a shell script, which in turn runs /usr/sbin/update-rc.d with the same arguments. The latter is a Perl script, which analyzes the init.d file, and, among others, parses the INFO header.

The result is the SysV-style generation of S01/K01 symbolic links into /etc/rcN.d. Consequently, it’s possible to start and stop the service as usual. If the service isn’t enabled (or disabled) with systemctl first, attempting to start and stop the service will result in an error message saying the service isn’t found.

It’s a good idea to install the same services on the “main” system and disable them afterwards. There’s no risk for overwriting the old root’s installation, and this allows installation and execution of programs that depend on these services (or they would complain based upon the software package database).

Running programs

Running stuff inside the chroot should be quick and easy. For this reason, I wrote a small C program, which opens a shell within the chroot when called without argument. With one argument, it executes it within the chroot. It can be called by a non-root user, and the same user is applied in the chroot.

This is compiled with

$ gcc oldy.c -o oldy -Wall -O3

and placed /usr/local/bin with setuid root:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <pwd.h>

int main(int argc, char *argv[]) {
  const char jail[] = "/oldy-root/";
  const char newhome[] = "/oldy-root/home/eli/";
  struct passwd *pwd;

  if ((argc!=2) && (argc!=1)){
    printf("Usage: %s [ command ]\n", argv[0]);
    exit(1);
  }

  pwd = getpwuid(getuid());
  if (!pwd) {
    perror("Failed to obtain user name for current user(?!)");
    exit(1);
  }

  // It's necessary to set the ID to 0, or su asks for password despite the
  // root setuid flag of the executable

  if (setuid(0)) {
    perror("Failed to change user");
    exit(1);
  }

  if (chdir(newhome)) {
    perror("Failed to change directory");
    exit(1);
  }

  if (chroot(jail)) {
    perror("Failed to chroot");
    exit(1);
  }

  // oldycmd and oldyshell won't appear, as they're overridden by su

  if (argc == 1)
    execl("/bin/su", "oldyshell", "-", pwd->pw_name, (char *) NULL);
  else
    execl("/bin/su", "oldycmd", "-", pwd->pw_name, "-c", argv[1], (char *) NULL);
  perror("Execution failed");
  exit(1);
}

Notes:

  • Using setuid root is a number one for security holes. I’m not sure I would have this thing on a computer used by strangers.
  • getpwuid() gets the real user ID (not the effective one, as set by setuid), so the call to “su” is made with the original user (even if it’s root, of course). It will fail if that user doesn’t exist.
  • … but note that the user in the chroot system is then one having the same user name as in the original one, not uid. There should be no difference, but watch it if there is (security holes…?)
  • I used “su -” and not just executing bash for the sake of su’s “-” flag, which sets up the environment. Otherwise, it’s a mess.

It’s perfectly OK to run GUI programs with this trick. However it becomes extremely confusing with command line. Is this shell prompt on the old or new system? To fix this, edit /etc/bashrc in the chroot system only to change the prompt. I went for changing the line saying

[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "

to

[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="\[\e[44m\][\u@chroot \W]\[\e[m\]\\$ "

so the “\h” part, which turns into the host’s name now appears as “chroot”. But more importantly, the text background of the shell prompt is changed to blue (as opposed to nothing), so it’s easy to tell where I am.

If you’re into playing with the colors, I warmly recommend looking at this.

Lifting the user processes limit

At some point (it took a few months), I started to have failures of this sort:

$ oldy
oldyshell: /bin/bash: Resource temporarily unavailable

and even worse, some of the chroot-based utilities also failed sporadically.

Checking with ulimit -a, it turned out that the limit for the number of processes owned by my “regular” user was limited to 1024. Checking with ps, I had only about 510 processes belonging to that UID. So it’s not clear why I hit the limit. In the non-chroot environment, the limit is significantly higher.

So edit /etc/security/limits.d/90-nproc.conf (the one inside the jail), changing the line saying

-*          soft    nproc     1024

to

*          soft    nproc     65536

There’s no need for any reboot or anything of that sort, but the already running processes remain within the limit.

Desktop icons and wallpaper messup

This is a seemingly small, but annoying thing: When Nautilus is launched from within the old system, it restores the old wallpaper and sets all icons on the desktop. There are suggestions on how to fix it, but they rely on gsettings, which came after Fedora 12. Haven’t tested this, but is the common suggestion is:

$ gsettings set org.gnome.desktop.background show-desktop-icons false

So for old systems as mine, first, check the current value:

$ gconftool-2 --get /apps/nautilus/preferences/show_desktop

and if it’s “true”, fix it:

$ gconftool-2 --type bool --set /apps/nautilus/preferences/show_desktop false

The settings are stored in ~/.gconf/apps/nautilus/preferences/%gconf.xml.

Setting title in gnome-terminal

So someone thought that the possibility to set the title in the Terminal window, directly from the GUI,  is unnecessary. That happens to be one of the most useful features, if you ask me. I’d really like to know why they dropped that. Or maybe not.

After some wandering around, and reading suggestions on how to do it in various other ways, I went for the old-new solution: Run the old executable in the new system. Namely:

# cd /usr/bin
# mv gnome-terminal new-gnome-terminal
# ln -s /oldy-root/usr/bin/gnome-terminal

It was also necessary to install some library stuff:

# apt install libvte9

But then it complained that it can’t find some terminal.xml file. So

# cd /usr/share/
# ln -s /oldy-root/usr/share/gnome-terminal

And then I needed to set up the keystroke shortcuts (Copy, Paste, New Tab etc.) but that’s really no bother.

Other things to keep in mind

  • Some users and groups must be migrated from the old system to the new manually. I do this always when installing a new computer to make NFS work properly etc, but in this case, some service-related users and groups need to be in sync.
  • Not directly related, but if the IP address of the host changes (which it usually does), set the updated IP address in /etc/sendmail.mc, and recompile. Or get an error saying “opendaemonsocket: daemon MTA: cannot bind: Cannot assign requested address”.

fsck errors after shrinking an unmounted ext4 with resize2fs

Motivation

I’m using resize2fs a lot to when backing up into a USB stick. The procedure is to create an image of an encrypted ext4 file system, and raw write it into the USB flash device. To save time writing to the USB stick the image is shrunk to its minimal size with resize2fs -M.

Uh-oh

This has been working great for years with my oldie resize2fs 1.41.9, but after upgrading my computer (Linux Mint 19), and starting to use 1.44.1, things began to go wrong:

# e2fsck -f /dev/mapper/temporary_18395
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/temporary_18395: 1078201/7815168 files (0.1% non-contiguous), 27434779/31249871 blocks

# resize2fs -M -p /dev/mapper/temporary_18395
resize2fs 1.44.1 (24-Mar-2018)
Resizing the filesystem on /dev/mapper/temporary_18395 to 27999634 (4k) blocks.
Begin pass 2 (max = 1280208)
Relocating blocks             XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 3 (max = 954)
Scanning inode table          XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 4 (max = 89142)
Updating inode references     XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The filesystem on /dev/mapper/temporary_18395 is now 27999634 (4k) blocks long.

# e2fsck -f /dev/mapper/temporary_18395
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Inode 85354 extent block passes checks, but checksum does not match extent
	(logical block 237568, physical block 11929600, len 24454)
Fix<y>? yes
Inode 85942 extent block passes checks, but checksum does not match extent
	(logical block 129024, physical block 12890112, len 7954)
Fix<y>? yes
Inode 117693 extent block passes checks, but checksum does not match extent
	(logical block 53248, physical block 391168, len 8310)
Fix<y>? yes
Inode 122577 extent block passes checks, but checksum does not match extent
	(logical block 61440, physical block 399478, len 607)
Fix<y>? yes
Inode 129597 extent block passes checks, but checksum does not match extent
	(logical block 409600, physical block 14016512, len 12918)
Fix<y>? yes
Inode 129599 extent block passes checks, but checksum does not match extent
	(logical block 274432, physical block 13640964, len 1570)
Fix<y>? yes
Inode 129600 extent block passes checks, but checksum does not match extent
	(logical block 120832, physical block 14653440, len 13287)
Fix<y>? yes
Inode 129606 extent block passes checks, but checksum does not match extent
	(logical block 133120, physical block 14870528, len 16556)
Fix<y>? yes
Inode 129613 extent block passes checks, but checksum does not match extent
	(logical block 75776, physical block 15054848, len 23962)
Fix<y>? yes
Inode 129617 extent block passes checks, but checksum does not match extent
	(logical block 284672, physical block 15716352, len 7504)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129622 extent block passes checks, but checksum does not match extent
	(logical block 86016, physical block 15532032, len 18477)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129626 extent block passes checks, but checksum does not match extent
	(logical block 145408, physical block 16967680, len 5536)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129630 extent block passes checks, but checksum does not match extent
	(logical block 165888, physical block 17125376, len 29036)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129677 extent block passes checks, but checksum does not match extent
	(logical block 126976, physical block 17100800, len 24239)
Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/temporary_18395: 1078201/7004160 files (0.1% non-contiguous), 27383882/27999634 blocks

Not the end of the world

This bug has been reported and fixed. Judging by the change made, it was only about the checksums, so while the bug caused fsck to detect (and properly fix) errors, there’s no loss of data (I encountered the same problem when shrinking a 5.7 TB partition by 40 GB — fsck errors, but I checked every single file, a total of ~3 TB, and all was fine).

I beg to differ on the commit message saying it’s a “relatively rare case” as it happened to me every single time in two completely different settings, none of which were special in any way. However we all use journaled filesystems, so fsck checks have become rare, which can explain how this has gone unnoticed: Unless resize2fs officially failed somehow, it leaves the filesystem marked as clean. Only “e2fsck -f ” will reveal the problem.

I would speculate that the reason for this bug is this commit (end of 2014), which speeds up the checksum rewrite after moving an inode. It’s somewhat worrying that a program of this sensitive type isn’t tested properly before being released for everyone’s use.

My own remedy was to compile an updated revision (1.44.4) from the repository, commit ID 75da66777937dc16629e4aea0b436e4cffaa866e. Actually, I first tried to revert to resize2fs 1.41.9, but that one failed shrinking a 128 GB filesystem with only 8 GB left, saying it had run out of space.

Conclusion

It’s almost 2019, the word is that shrinking an ext4 filesystem is dangerous, and guess what, it’s probably a bit true. One could wish it wasn’t, but unfortunately the utilities don’t seem to be maintained with the level of care that one could hope for, given the damage they can make.