Running ktorrent on Linux Mint 19 (Tara), the famous downwards-arrow icon was invisible in the system tray, which made it appear as if the program had quit when it was actually minimized. Clicking the empty box made ktorrent re-appear.
Solution: Invoke the Qt5 configuration tool
$ qt5ct
and under the Appearance tab set “Style” to gtk2 (I believe it was “Fusion” before). It’s not just generally prettier; after restarting ktorrent, the icon is there.
Actually, it’s probably not about the style, but the fact that qt5ct was run at all, because before making the change, ktorrent printed out the following when launched from the command line:
Mon Dec 24 09:52:55 2018: Qt Warning: QSystemTrayIcon::setVisible: No Icon set
Warning: QSystemTrayIcon::setVisible: No Icon set
Mon Dec 24 09:52:55 2018: Starting minimized
Mon Dec 24 09:52:55 2018: Started update timer
Mon Dec 24 09:52:55 2018: Qt Warning: inotify_add_watch("/home/eli/.config/qt5ct") failed: "No such file or directory"
Warning: inotify_add_watch("/home/eli/.config/qt5ct") failed: "No such file or directory"
The “No Icon set” warning is misleading, because it continued to appear even after the fix, with the icon properly in place in the tray:
Mon Dec 24 10:16:17 2018: Qt Warning: QSystemTrayIcon::setVisible: No Icon set
Warning: QSystemTrayIcon::setVisible: No Icon set
Anyhow, problem fixed. For me, that is.
And why ktorrent? Because its last reported vulnerability was in 2009, compared with Transmission, which had a nasty issue in January 2018. Actually, the Transmission exploit is interesting in itself, with a clear lesson: if you set up a webserver on the local host for any purpose, assume anyone can access it. Setting it to respond to 127.0.0.1 only doesn’t help.
Introduction
What I wanted: A simple applet on Cinnamon, which allows me to turn a service on and off (hostapd, a Wifi hotspot). I first went for the Argos catch-all extension, and learned that Cinnamon isn’t gnome-shell, and in particular that extensions for gnome-shell don’t (necessarily?) work with Cinnamon.
Speaking of which, my system is Linux Mint 19 on an x86_64, with
$ cinnamon --version
Cinnamon 3.8.9
So I went for writing the applet myself. Given the so-so level of difficulty, I should have done that to begin with.
Spoiler: I’m not going to dive into the details of that, because my hostapd-firewall-DHCP daemon setting is quite specific. Rather, I’ll discuss some general aspects of writing an applet.
So what is it like? Well, quite similar to writing something useful in JavaScript for a web page. Cinnamon’s applets are in fact written in JavaScript, and it feels pretty much the same. In particular, the thing about nothing happening when there’s an error, and now go figure what it was. And yes, there’s an error log console which helps with syntax errors (reminiscent of a browser’s error console, discussed below), but run-time errors often just lead to nothing. A situation that is familiar to anyone with JavaScript experience.
And I also finally understand why the cinnamon process hogs CPU all the time. OK, it’s usually just a few percent, but still, what is it doing all that time with no user activity? Answer: running some JavaScript, I suppose.
But all in all, if you’re good with JavaScript, understand the concepts of GUI programming and events, and are fairly OK with object-oriented programming, it’s quite fun. And there’s another thing you had better be good at:
Read The Source
As of December 2018, the API for Cinnamon applets is hardly documented, and it’s somewhat messy. So after reading a couple of tutorials (see “References” at the bottom of this post), the best way to grasp how to get X done is by reading the sources of existing applets:
- System-installed: /usr/share/cinnamon/applets
- User-installed: ~/.local/share/cinnamon/applets
- Cinnamon’s core JavaScript sources: /usr/share/cinnamon/js
Each of these contains several subdirectories, typically of the form name@creator, one for each applet that is available for adding to the panels. Each of these has at least two files, which are also those to supply for your own applet:
- metadata.json, which contains some basic info on the applet (probably used while selecting applets to add).
- applet.js, which contains the JavaScript code for the applet.
It doesn’t matter if they’re executable, even though they often are.
There may also be additional *.js files.
There might also be a po/ directory, which often contains .po and .pot files intended for localizing the text displayed to the user. These go along with the _() function in the JavaScript code. For the purposes of a simple applet, they are not necessary: ignore the _(“Something”) wrappers in the JavaScript code, and read them as just “Something”.
Some applets allow parameter setting. The runtime values for these are at ~/.cinnamon, which contains configuration data etc.
Two ways to object orient
Unfortunately, there are two styles for defining the applet class, both of which are in use. This is a matter of minor confusion when reading the code of a few applets, and therefore worth noting: some of the applets use JavaScript class declarations (extending a built-in class), e.g.
class CinnamonSoundApplet extends Applet.TextIconApplet {
    constructor(metadata, orientation, panel_height, instanceId) {
        super(orientation, panel_height, instanceId);
and others use the “prototype” syntax:
MyApplet.prototype = {
    __proto__: Applet.IconApplet.prototype,
and so on. I guess they’re equivalent, despite the difference in syntax. Note that in the latter format, the constructor is a function called _init().
Either way, all classes that employ timeout callbacks should have a destroy() method (no underscore prefix) that cancels them before quitting.
I wasn’t aware of these two syntax possibilities, and therefore started from the first applet I got my hands on. It happened to be written in the “prototype” syntax, which is probably the less preferable choice. I’m therefore not so sure my example below is a good starter.
Getting Started
It really takes just three steps to get an applet up and running:
- Create a directory in ~/.local/share/cinnamon/applets/ and put the two files there: metadata.json and applet.js.
- Restart Cinnamon. No, it’s not as bad as it sounds. See below.
- Install the applet to some panel, just like any other applet.
I warmly suggest copying an existing applet and hacking it. You can start with the skeleton applet I’ve listed below, but there are plenty of others available on the web, in particular along with tutorials.
The development cycle (or: how to “run”)
None of the changes made in the applet’s directory (well, almost none) takes effect until Cinnamon is restarted, and when it is, everything is in sync. It’s not like a reboot, and it’s fine to do on the computer you’re working on, really. All windows remain in their workspaces (even though the window buttons on the panel may change order). No reason to avoid this, even if you have a lot of windows open. Done it a gazillion times.
So how to restart Cinnamon: ALT-F2, type “r” and Enter. Then cringe as your desktop fades away and be overwhelmed when it returns, and nothing bad happened.
If something is wrong with your applet (or otherwise), there’s a notification saying “Problems during Cinnamon startup”, elaborating that “Cinnamon started successfully, but one or more applets, desklets or extensions failed to load”. From my own experience, that’s as bad as it gets: the applet isn’t loaded, or doesn’t run properly.
Press Win+L (or ALT-F2, then type “lg” and Enter, or type “cinnamon-looking-glass” at shell prompt as non-root user) to launch the Looking Glass tool (called “Melange”). The Log tab is helpful with detailed error messages (colored red, that helps). Alternatively, look for the detailed error message in .xsession-errors in your home directory.
Note that the error message often appears before the line saying that the relevant applet was loaded.
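For a quick look from a terminal, grepping .xsession-errors also works. The “LookingGlass/error” tag is what shows up in my own logs; your format may differ slightly:

$ grep 'LookingGlass/error' ~/.xsession-errors | tail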
OK, so now to some more specific topics.
Custom icons
Icons are referenced by their file name, without extension, in the JavaScript code as well as in the metadata.json file (the “icon” assignment). The search path includes the applet’s own icons/ subdirectory and the system icons at /usr/share/icons/.
My own experience is that creating an icons/ directory side by side with applet.js, and putting a PNG file named wifi-icon-off.png there, makes a command like
this.set_applet_icon_name("wifi-icon-off");
work for setting the applet’s main icon on the panel. The PNG’s transparency is honored. The official file format is SVG, but who’s got patience for that.
The same goes for menu items with icons:
item = new PopupMenu.PopupIconMenuItem("Access point off", "wifi-icon-off", St.IconType.FULLCOLOR);

item.connect('activate', Lang.bind(this, function() {
    Main.Util.spawnCommandLine("/usr/local/bin/access-point-ctl off");
}));
this.menu.addMenuItem(item);
My own experience with the menu items is that if the icon file isn’t found, Cinnamon silently puts an empty slot instead. JavaScript style: no fussing.
I didn’t manage to achieve something similar with the “icon” assignment in metadata.json, so the choices are to save the icon in /usr/share/icons/, to use one of the system icons, or to eliminate the “icon” assignment altogether from the JSON file. I went for the last option. This results in a dull default icon when installing the applet, but that is of zero importance for an applet I’ve written myself.
Running shell commands from JavaScript
The common way to execute a shell command is e.g.
const Main = imports.ui.main;
Main.Util.spawnCommandLine("gnome-terminal");
The assignment of Main is typically done once, at the top of the script, of course.
When the output of the command is of interest, it becomes slightly more involved. The following function implements the equivalent of Perl’s backtick operator: run the command, and return its output as a string. Note that unlike its bash counterpart, newlines remain newlines, and are not translated into spaces:
const GLib = imports.gi.GLib;
function backtick(command) {
    try {
        // spawn_command_line_sync() actually returns a fourth element as
        // well, the exit status, which is simply ignored here.
        let [result, stdout, stderr] = GLib.spawn_command_line_sync(command);

        if (stdout != null) {
            return stdout.toString();
        }
    }
    catch (e) {
        global.logError(e);
    }
    return "";
}
and then one can go e.g.
let output = backtick("/bin/systemctl is-active hostapd");
after which output is a string containing the result of the execution (with a trailing newline, by the way).
As of December 2018, there’s no proper documentation of Cinnamon’s GLib wrapper; however, the documentation of the underlying C library can give an idea.
My example applet
OK, so here’s a skeleton applet for getting started with.
Its pros:
- It’s short and simple, and keeps the mumbo-jumbo to a minimum.
- It shows a simple drop-down menu applet, which allows running a different shell command from each entry.
Its cons:
- It’s written in the less-preferable “prototype” syntax for defining objects.
- It does nothing useful. In particular, the shell commands it executes exist only on my computer.
- It depends on a custom icon (see “Custom Icons” above). Maybe this is an advantage…?
So if you want to give it a go, create a directory named ‘wifier@eli’ (the directory name should match the “uuid” entry below) in ~/.local/share/cinnamon/applets/, and put this as metadata.json:
{
    "description": "Turn Wifi Access Point on and off",
    "uuid": "wifier@eli",
    "name": "Wifier"
}
And this as applet.js:
const Applet = imports.ui.applet;
const Lang = imports.lang;
const St = imports.gi.St;
const Main = imports.ui.main;
const PopupMenu = imports.ui.popupMenu;

const UUID = 'wifier@eli';

function MyApplet(orientation, panelHeight, instanceId) {
    this._init(orientation, panelHeight, instanceId);
}

MyApplet.prototype = {
    __proto__: Applet.IconApplet.prototype,

    _init: function(orientation, panelHeight, instanceId) {
        Applet.IconApplet.prototype._init.call(this, orientation, panelHeight, instanceId);

        try {
            this.set_applet_icon_name("wifi-icon-off");
            this.set_applet_tooltip("Control Wifi access point");

            this.menuManager = new PopupMenu.PopupMenuManager(this);
            this.menu = new Applet.AppletPopupMenu(this, orientation);
            this.menuManager.addMenu(this.menu);

            this._contentSection = new PopupMenu.PopupMenuSection();
            this.menu.addMenuItem(this._contentSection);

            // First item: Turn on
            let item = new PopupMenu.PopupIconMenuItem("Access point on", "wifi-icon-on", St.IconType.FULLCOLOR);

            item.connect('activate', Lang.bind(this, function() {
                Main.Util.spawnCommandLine("/usr/local/bin/access-point-ctl on");
            }));
            this.menu.addMenuItem(item);

            // Second item: Turn off
            item = new PopupMenu.PopupIconMenuItem("Access point off", "wifi-icon-off", St.IconType.FULLCOLOR);

            item.connect('activate', Lang.bind(this, function() {
                Main.Util.spawnCommandLine("/usr/local/bin/access-point-ctl off");
            }));
            this.menu.addMenuItem(item);
        }
        catch (e) {
            global.logError(e);
        }
    },

    on_applet_clicked: function(event) {
        this.menu.toggle();
    },
};

function main(metadata, orientation, panelHeight, instanceId) {
    let myApplet = new MyApplet(orientation, panelHeight, instanceId);
    return myApplet;
}
Next, create an “icons” subdirectory (e.g. ~/.local/share/cinnamon/applets/wifier@eli/icons/) and put a small (32 x 32?) PNG image there as wifi-icon-off.png, which functions as the applet’s main icon on the panel. Possibly download mine from here.
Anyhow, be sure to have an icon file. Otherwise there will be nothing on the panel.
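To sum up the file layout, assuming the files sit in the current directory (note that the menu items in applet.js also reference a wifi-icon-on icon, so supply one as well, or live with an empty slot):

$ APPLET=~/.local/share/cinnamon/applets/wifier@eli
$ mkdir -p "$APPLET/icons"
$ cp metadata.json applet.js "$APPLET"
$ cp wifi-icon-off.png wifi-icon-on.png "$APPLET/icons"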
Finally, restart Cinnamon, as explained above. You will get errors when trying the menu items (failed execution), but don’t worry — nothing bad will happen.
References
You have been warned
This is my pile of jots from trying to install Argos (“Gnome Shell Extension in seconds”) on my Mint 19 Cinnamon machine. As the title implies, it didn’t work out, so I went for writing an applet from scratch, more or less.
Not being strong on Gnome internals, I’m under the impression that it’s simply because Cinnamon isn’t Gnome shell. This post is just the accumulation of notes I took while trying. Nothing to follow step by step, as it leads nowhere.
It’s here for the crumbs of info I gathered nevertheless.
Here we go
It says on the project’s Github page that a recent version of Gnome should include Argos. So I went for it:
# apt install gnome-shell-extensions
And since I’m at it:
# apt install gnome-tweaks
Restart Gnome shell: ALT-F2, type “r” and Enter. For a second, it looks like a logout, but everything returns to where it was. Don’t hesitate to do this, even if there are a lot of windows open.
Nada. So I went the manual way. First, found out my Gnome Shell version:
$ apt-cache show gnome-shell | grep Version
or better,
$ gnome-shell --version
GNOME Shell 3.28.3
and downloaded the extension for Gnome shell 3.28 from Gnome’s extensions page. Then I realized it was slightly out of date compared with the git repo, so
$ git clone https://github.com/p-e-w/argos.git
$ cd argos
$ cp -r 'argos@pew.worldwidemann.com' ~/.local/share/cinnamon/extensions/
Note that I copied it into cinnamon’s subdirectory. It’s usually ~/.local/share/gnome-shell/extensions, but not when running Cinnamon!
Restart Gnome shell again: ALT-F2, type “r” and enter.
Then open the “Extensions” GUI thingy from the main menu. Argos extension appears. Select it and press the “+” button to add it.
Restart Gnome shell again. This time a notification appears, saying “Problems during Cinnamon startup” elaborating that “Cinnamon started successfully, but one or more applets, desklets or extensions failed to load”.
Looking at ~/.xsession-errors, I found
Cjs-Message: 12:14:42.822: JS LOG: [LookingGlass/error] [argos@pew.worldwidemann.com]: Missing property "cinnamon-version" in metadata.json
Can’t argue with that, can you? Let’s see:
$ cinnamon --version
Cinnamon 3.8.9
So edit ~/.local/share/cinnamon/extensions/argos@pew.worldwidemann.com/metadata.json and add the “cinnamon-version” line, the middle one below (not at the end, because the last line doesn’t end with a comma):
"version": 2,
"cinnamon-version": [ "3.6", "3.8", "4.0" ],
"shell-version": ["3.14", "3.16", "3.18", "3.20", "3.22", "3.24", "3.26", "3.28"]
I took this line from some applet I found under ~/.local/share/cinnamon. Not much thought given here.
And guess what? Restart again with ALT-F2 r. Failed again. Now in ~/.xsession-errors:
Cjs-Message: 12:50:32.643: JS LOG: [LookingGlass/error]
[argos@pew.worldwidemann.com]: No JS module 'extensionUtils' found in search path
[argos@pew.worldwidemann.com]: Error importing extension.js from argos@pew.worldwidemann.com
It seems like Cinnamon has changed the extension mechanism altogether, which explains why there’s no extension tab in Gnome Tweaks, and why extensionUtils is missing.
Maybe this explains it. Frankly, I didn’t bother to read that long discussion, but the underlying issue is probably buried there.
There are plenty of web pages describing the different escape codes for changing colors on an ANSI-emulating terminal. This is particularly useful for giving the shell prompt different colors, to prevent confusion between different computers, for example.
The trick is to set the PS1 bash variable. What is less often told is that \[ and \] tokens must enclose each color escape sequence, or things get completely crazy: pasting into the terminal creates junk, newlines aren’t interpreted properly, and a lot of other peculiarities occur, with the cursor jumping to the wrong place all the time.
So this is wrong (just the color escape sequence, no enclosure):
PS1="\e[44m[\u@here \W]\e[m\\$ "
And this is right:
PS1="\[\e[44m\][\u@here \W]\[\e[m\]\\$ "
Even the “wrong” version above will produce the correct colors, but as mentioned, other weird stuff happens.
Also covered: how to just compile an Ubuntu distribution kernel without too much messing around.
Introduction
It’s not a real computer installation if the Wifi works out of the box. So these are my notes to self for setting up an access point on a 5 GHz channel. I’ll need them at some point, because each kernel upgrade will require tweaking the kernel module.
The machine is running Linux Mint 19 (Tara, based upon Ubuntu Bionic, with kernel 4.15.0-20-generic). The NIC is Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter, Vendor/Product IDs 168c:003e.
Installing hostapd
# apt install hostapd
Set up /etc/hostapd/hostapd.conf as follows:
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
#Support older EAPOL authentication (version 1)
eapol_version=1
# Uncomment these for base WPA & WPA2 support with a pre-shared key
wpa=3
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP
wpa_passphrase=mysecret
# Customize these for your local configuration...
interface=wlan0
hw_mode=a
channel=52
ssid=mywifi
country_code=GD
and /etc/default/hostapd as follows:
DAEMON_CONF="/etc/hostapd/hostapd.conf"
DAEMON_OPTS=""
Then unmask the service by deleting /etc/systemd/system/hostapd.service (it’s a symbolic link to /dev/null).
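By the way, systemd has a dedicated command for exactly this, which is probably the cleaner way to do the same:

# systemctl unmask hostapd
# systemctl enable hostapd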
5 GHz is not for plain people
Attempting to start the service then failed, with the log saying:
Nov 27 21:14:04 hostapd[6793]: wlan0: IEEE 802.11 Configured channel (52) not found from the channel list of current mode (2) IEEE 802.11a
Nov 27 21:14:04 hostapd[6793]: wlan0: IEEE 802.11 Hardware does not support configured channel
What do you mean it’s not supported? It’s on the list!
# iw list
[ ... ]
Band 2:
[ ... ]
Frequencies:
* 5180 MHz [36] (17.0 dBm) (no IR)
* 5200 MHz [40] (17.0 dBm) (no IR)
* 5220 MHz [44] (17.0 dBm) (no IR)
* 5240 MHz [48] (17.0 dBm) (no IR)
* 5260 MHz [52] (24.0 dBm) (no IR, radar detection)
* 5280 MHz [56] (24.0 dBm) (no IR, radar detection)
* 5300 MHz [60] (24.0 dBm) (no IR, radar detection)
* 5320 MHz [64] (24.0 dBm) (no IR, radar detection)
* 5500 MHz [100] (24.0 dBm) (no IR, radar detection)
* 5520 MHz [104] (24.0 dBm) (no IR, radar detection)
* 5540 MHz [108] (24.0 dBm) (no IR, radar detection)
* 5560 MHz [112] (24.0 dBm) (no IR, radar detection)
* 5580 MHz [116] (24.0 dBm) (no IR, radar detection)
* 5600 MHz [120] (24.0 dBm) (no IR, radar detection)
* 5620 MHz [124] (24.0 dBm) (no IR, radar detection)
* 5640 MHz [128] (24.0 dBm) (no IR, radar detection)
* 5660 MHz [132] (24.0 dBm) (no IR, radar detection)
* 5680 MHz [136] (24.0 dBm) (no IR, radar detection)
* 5700 MHz [140] (24.0 dBm) (no IR, radar detection)
* 5720 MHz [144] (24.0 dBm) (no IR, radar detection)
* 5745 MHz [149] (30.0 dBm) (no IR)
* 5765 MHz [153] (30.0 dBm) (no IR)
* 5785 MHz [157] (30.0 dBm) (no IR)
* 5805 MHz [161] (30.0 dBm) (no IR)
* 5825 MHz [165] (30.0 dBm) (no IR)
* 5845 MHz [169] (disabled)
[ ... ]
The problem becomes evident when executing hostapd with the -dd flag (edit /etc/default/hostapd), in which case it lists the allowed channels. None of the 5 GHz channels is listed. The underlying reason is the “no IR” part given by “iw list”, meaning no Initiating Radiation, hence no access point allowed.
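A quick alternative to editing DAEMON_OPTS is running hostapd in the foreground (stop the service first, so the interface is free):

# systemctl stop hostapd
# hostapd -dd /etc/hostapd/hostapd.conf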
It’s very cute that the driver makes sure I won’t break the regulations, but it so happens that these frequencies are allowed in Israel for indoor use. My computer is indoors.
The way to work around this is to edit one of the driver’s sources, and use it instead.
Note that the typical error message when starting hostapd as a systemd service is quite misleading:
hostapd[735]: wlan0: IEEE 802.11 Configured channel (52) not found from the channel list of current mode (2) IEEE 802.11a
hostapd[735]: wlan0: IEEE 802.11 Hardware does not support configured channel
hostapd[735]: wlan0: IEEE 802.11 Configured channel (52) not found from the channel list of current mode (2) IEEE 802.11a
hostapd[735]: Could not select hw_mode and channel. (-3)
hostapd[735]: wlan0: interface state UNINITIALIZED->DISABLED
hostapd[735]: wlan0: AP-DISABLED
hostapd[735]: wlan0: Unable to setup interface.
hostapd[735]: wlan0: interface state DISABLED->DISABLED
hostapd[735]: wlan0: AP-DISABLED
hostapd[735]: hostapd_free_hapd_data: Interface wlan0 wasn't started
hostapd[735]: nl80211: deinit ifname=wlan0 disabled_11b_rates=0
hostapd[735]: wlan0: IEEE 802.11 Hardware does not support configured channel
[ ... ]
systemd[1]: hostapd.service: Control process exited, code=exited status=1
systemd[1]: hostapd.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator.
Not only is the final “Failed to start” message (which is also what stands out in red with journalctl itself) unrelated to the real reason, which is given a few rows earlier (configured channel not found), but these important log lines don’t appear in the output of “systemctl status hostapd”, as they’re cut out.
Preparing for kernel compilation
In theory, I could have compiled the driver only, and replaced the files in the /lib/modules directory. But I’m in for a minimal change, and minimal brain effort. So the technique is to download the entire kernel, compile a lot of things that don’t really need compilation, then pinpoint the correction, and recompile only that.
Unfortunately, Ubuntu’s view on kernel compilation seems to be that it can only be desired for preparing a deb package. After all, who wants to do anything else? So it gets a bit off the regular kernel compilation routine.
OK, so first I had to install some stuff:
# apt install libssl-dev
# apt install libelf-dev
Download the kernel (took me 25 minutes):
$ time git clone git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git
Compilation
The trick is to make the modules of a kernel that is identical to the running one (so there won’t be any bugs due to mismatches) and also match the kernel version string exactly (or the module won’t load).
Check out tag Ubuntu-4.15.0-20.21 (in my case, for 4.15.0-20-generic). This matches the kernel version stated at the beginning of dmesg (and also the compilation date).
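In other words, something like this (the branch name is just my own choice):

$ git checkout -b ath-fix Ubuntu-4.15.0-20.21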
Follow this post to prevent the “+” at the end of the kernel version.
Change directory to the kernel tree’s root, and copy the config file:
$ cp /boot/config-`uname -r` .config
Make sure the configuration is in sync:
$ make oldconfig
There will be some output, but no configuration question should be asked; if one is, it’s a sign that the wrong kernel revision has been checked out. In fact,
$ diff /boot/config-`uname -r` .config
should only output the difference in one comment line (the file’s header).
And then run the magic command:
$ fakeroot debian/rules clean
Don’t ask me what it’s for (I took it from this page), but among other things, it does
cp debian/scripts/retpoline-extract-one scripts/ubuntu-retpoline-extract-one
and without it one gets the following error:
/bin/sh: ./scripts/ubuntu-retpoline-extract-one: No such file or directory
Ready to go, then. Compile only the modules. The kernel image itself is of no interest:
$ time make KERNELVERSION=`uname -r` -j 12 modules && echo Success
The -j 12 flag means running 12 processes in parallel. Pick your own favorite, depending on the CPU’s core count. Took 13 minutes on my machine.
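If you’d rather not hardcode the count, nproc reports it:

$ time make KERNELVERSION=`uname -r` -j `nproc` modules && echo Success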
Alternatively, compile just the relevant subdirectory. Quicker, no reason it shouldn’t work, but this is not how I did it myself:
$ make prepare scripts
$ time make KERNELVERSION=`uname -r` -j 12 M=drivers/net/wireless/ath/ && echo Success
And then use the same command when repeating the compilation below, of course.
Modify the ath.c file
Following this post (more or less), edit drivers/net/wireless/ath/regd.c and neutralize the following functions with a “return” immediately after the variable declarations (or replace them with functions that just return immediately):
- ath_reg_apply_beaconing_flags()
- ath_reg_apply_ir_flags()
- ath_reg_apply_radar_flags()
Also add a “return 0” in ath_regd_init_wiphy() just before the call to wiphy_apply_custom_regulatory(), so the three calls to the apply-something functions are skipped. In the said post, the entire init function was disabled, but I found that unnecessarily aggressive (and it probably breaks something).
Note that there are also functions like __ath_reg_apply_beaconing_flags(), with a double underscore prefix. These are not the ones to edit.
And then recompile:
$ make KERNELVERSION=`uname -r` modules && echo Success
This recompiles regd.c and ath.c, and regenerates ath.ko. Never mind that the file is huge (2.6 MB) in comparison with the original one (40 kB); once loaded into the kernel, they occupy the same size.
As root, rename the existing ath.ko in /lib/modules/`uname -r`/kernel/drivers/net/wireless/ath/ to something else (with a non-ko extension, or it remains in the dependency files), and copy the new one (from drivers/net/wireless/ath/ in the kernel tree) to the same place.
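Put together, it goes something like this, with ~/ubuntu-bionic standing in for wherever the kernel tree was cloned; depmod refreshes the dependency files:

# cd /lib/modules/`uname -r`/kernel/drivers/net/wireless/ath/
# mv ath.ko ath.ko-orig
# cp ~/ubuntu-bionic/drivers/net/wireless/ath/ath.ko .
# depmod -a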
Unload modules from kernel:
# rmmod ath10k_pci && rmmod ath10k_core && rmmod ath
and reload:
# modprobe ath10k_pci
And check the result (yay):
# iw list
[ ... ]
Frequencies:
* 5180 MHz [36] (30.0 dBm)
* 5200 MHz [40] (30.0 dBm)
* 5220 MHz [44] (30.0 dBm)
* 5240 MHz [48] (30.0 dBm)
* 5260 MHz [52] (30.0 dBm)
* 5280 MHz [56] (30.0 dBm)
* 5300 MHz [60] (30.0 dBm)
* 5320 MHz [64] (30.0 dBm)
* 5500 MHz [100] (30.0 dBm)
* 5520 MHz [104] (30.0 dBm)
* 5540 MHz [108] (30.0 dBm)
* 5560 MHz [112] (30.0 dBm)
* 5580 MHz [116] (30.0 dBm)
* 5600 MHz [120] (30.0 dBm)
* 5620 MHz [124] (30.0 dBm)
* 5640 MHz [128] (30.0 dBm)
* 5660 MHz [132] (30.0 dBm)
* 5680 MHz [136] (30.0 dBm)
* 5700 MHz [140] (30.0 dBm)
* 5720 MHz [144] (30.0 dBm)
* 5745 MHz [149] (30.0 dBm)
* 5765 MHz [153] (30.0 dBm)
* 5785 MHz [157] (30.0 dBm)
* 5805 MHz [161] (30.0 dBm)
* 5825 MHz [165] (30.0 dBm)
* 5845 MHz [169] (30.0 dBm)
[ ... ]
The no-IR marks are gone, and hostapd now happily uses these channels.
Probably not: Upgrading firmware
As I first thought that the problem was an old firmware version, as discussed in this forum post, I went for upgrading it. These are my notes on that. Spoiler: It was probably unnecessary, but I’ll never know, and neither will you.
From the dmesg output:
[ 16.152377] ath10k_pci 0000:03:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:03:00.0.bin failed with error -2
[ 16.152387] ath10k_pci 0000:03:00.0: Direct firmware load for ath10k/cal-pci-0000:03:00.0.bin failed with error -2
[ 16.201636] ath10k_pci 0000:03:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:1535
[ 16.201638] ath10k_pci 0000:03:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
[ 16.201968] ath10k_pci 0000:03:00.0: firmware ver WLAN.RM.4.4.1-00079-QCARMSWPZ-1 api 6 features wowlan,ignore-otp crc32 fd869beb
[ 16.386440] ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 20d869c3
I was first misled to think the firmware wasn’t loaded, but the later lines indicate it was actually OK.
Listing the firmware files used by the kernel module:
$ modinfo ath10k_pci
filename: /lib/modules/4.15.0-20-generic/kernel/drivers/net/wireless/ath/ath10k/ath10k_pci.ko
firmware: ath10k/QCA9377/hw1.0/board.bin
firmware: ath10k/QCA9377/hw1.0/firmware-5.bin
firmware: ath10k/QCA6174/hw3.0/board-2.bin
firmware: ath10k/QCA6174/hw3.0/board.bin
firmware: ath10k/QCA6174/hw3.0/firmware-6.bin
firmware: ath10k/QCA6174/hw3.0/firmware-5.bin
firmware: ath10k/QCA6174/hw3.0/firmware-4.bin
firmware: ath10k/QCA6174/hw2.1/board-2.bin
firmware: ath10k/QCA6174/hw2.1/board.bin
firmware: ath10k/QCA6174/hw2.1/firmware-5.bin
firmware: ath10k/QCA6174/hw2.1/firmware-4.bin
firmware: ath10k/QCA9887/hw1.0/board-2.bin
firmware: ath10k/QCA9887/hw1.0/board.bin
firmware: ath10k/QCA9887/hw1.0/firmware-5.bin
firmware: ath10k/QCA988X/hw2.0/board-2.bin
firmware: ath10k/QCA988X/hw2.0/board.bin
firmware: ath10k/QCA988X/hw2.0/firmware-5.bin
firmware: ath10k/QCA988X/hw2.0/firmware-4.bin
firmware: ath10k/QCA988X/hw2.0/firmware-3.bin
firmware: ath10k/QCA988X/hw2.0/firmware-2.bin
So which firmware file did it load? Well, there’s a firmware git repo for Atheros 10k:
$ git clone https://github.com/kvalo/ath10k-firmware.git
I’m not very happy running firmware found just somewhere, but the author of this Git repo is Kalle Valo, who works at Qualcomm. The Github account has been active since 2010, and the files included in the Linux kernel are present there as well. So it looks legit.
Comparing the files with the ones in the Git repo, which states the full version names, the files loaded were hw3.0/firmware-6.bin and another one (board-2.bin, I guess). The former went into the repo on December 18, 2017, which is more than a year after the problem in the forum post was solved. My firmware was hence fairly up to date.
Nevertheless, I upgraded to the ones added to the git firmware repo on November 13, 2018, and re-generated initramfs (not that it should matter — using lsinitramfs it’s clear that none of these firmware files are there). Did it help? As expected, no. But hey, now I have the latest and shiniest firmware:
[ 16.498706] ath10k_pci 0000:03:00.0: firmware ver WLAN.RM.4.4.1-00124-QCARMSWPZ-1 api 6 features wowlan,ignore-otp crc32 d8fe1bac
[ 16.677095] ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 506ce037
Exciting! Not.
Be sure to read the first comment below, where I’m told netstat can actually do the job. Even though I have to admit that I still find lsof’s output more readable.
OK, so we have netstat to tell us which ports are open for listening:
$ netstat -n -a | grep "LISTEN "
Thanks, that’s nice, but which process is listening on these ports? For TCP sockets, it’s (as root):
# lsof -n -P -i tcp 2>/dev/null | grep LISTEN
The -P flag disables conversion of port numbers to protocol names, and -n prevents conversion of host names.
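And following up on the comment mentioned at the top: netstat’s -p flag (as root) shows the owning process as well, so this is presumably the lsof-free equivalent:

# netstat -n -a -p | grep "LISTEN "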
Background
Archaeological findings have revealed that prehistoric humans buried their forefathers under the floor of their huts. Fast forward to 2018: yours truly decided to continue running the (ancient) Fedora 12 as a chroot when migrating to Linux Mint 19. That’s an eight-year difference.
While a lot of Linux users are happy to just install the new system and migrate everything “automatically”, this isn’t a good idea if you’re into more than plain tasks. Upgrading is supposed to be smooth, but small changes in default behavior, APIs or whatever always make things that worked before fail, sometimes with significant damage: not receiving emails, backup jobs not really working as before, etc. Or just a new bug.
I’ve talked with quite a few sysadmins who were responsible for computers that actually needed to work continuously and reliably, and it never took long before the apology for their ancient Linux distribution arrived. There’s no need to apologize: upgrading is not good for keeping a system running smoothly. If it ain’t broke, don’t fix it.
But after some time, the hardware gets old and it becomes difficult to install new software. So I had this idea to keep running the old computer, with all of its properly running services and cronjobs, as a virtual machine. And then I thought, maybe go VPS-style. And then I realized I don’t need the VPS isolation at all. So the idea is to keep the old system as a chroot inside the new one.
Some services (httpd, mail handling, dhcpd) will keep running in the chroot, and others (the desktop in particular, with new shiny GUI programs) running natively. Old and new on the same machine.
The trick is making sure one doesn’t stamp on the feet of the other. These are my insights as I managed to get this up and running.
The basics
The idea is to place the old root filesystem (only) into somewhere in the new system, and chroot into it for the sake of running services and oldschool programs:
- The old root is placed as e.g. /oldy-root/ in the new filesystem (note that oldy is a legit alternative spelling for oldie…).
- bind-mounts are used for a unified view of home directories and those containing data.
- Some services are executed from within the chroot environment. How to run them from Mint 19 (hence using systemd) is described below.
- Running old programs is also possible by chrooting from shell. This is also discussed below.
Don’t put the old root on a filesystem that contains useful data, because odds are that such a filesystem will be bind-mounted into the chrooted filesystem, which will cause a directory tree loop. Then try calculating disk space, or backing up with tar. So pick a separate filesystem (i.e. a separate partition or LVM volume), or possibly a subdirectory of the same filesystem as the “real” root.
Bind mounting
This is where the tricky choices are made. The point is to make the old and new systems see more or less the same application data, and also allow software to communicate over /tmp. So this is the relevant part in my /etc/fstab:
# Bind mounts for oldy root: system essentials
/dev /oldy-root/dev none bind 0 2
/dev/pts /oldy-root/dev/pts none bind 0 2
/dev/shm /oldy-root/dev/shm none bind 0 2
/sys /oldy-root/sys none bind 0 2
/proc /oldy-root/proc none bind 0 2
# Bind mounts for oldy root: Storage
/home /oldy-root/home none bind 0 2
/storage /oldy-root/storage none bind 0 2
/tmp /oldy-root/tmp none bind 0 2
/mnt /oldy-root/mnt none bind 0 2
/media /oldy-root/media none bind 0 2
Most notable are /mnt and /media. Bind-mounting these allows temporary mounts to be visible at both sides. /tmp is required for the UNIX domain socket used for playing sound from the old system. And other sockets, I suppose.
Note that /run isn’t bind-mounted. The reason is that its tree structure has changed, so it’s quite pointless (the mount point used to be /var/run, and the place of the runtime files tends to change with time). The motivation for bind-mounting it would have been to let software from the old and new systems interact, and indeed, there are a few UNIX sockets there, most notably DBus’ UNIX domain socket.
But DBus is a good example of how hopeless it is to bind-mount /run: Old software attempting to talk with the Console Kit on the new DBus server fails completely at the protocol level (or namespace? I didn’t really dig into that).
So just copy the old /var/run content into the chroot’s filesystem, and that’s it. CUPS ran smoothly, GUI programs run fairly OK, and sound is done through a UNIX domain socket, as suggested in the comments of this post.
I opted out of bind-mounting /lib/modules and /usr/src. This makes manipulations of kernel modules (as needed by VMware, for example) impossible from the old system. But its gcc is too outdated for compiling under the new Linux kernel’s build system, so there was little point.
/root isn’t bind-mounted either. I wasn’t so sure about that, but in the end, it’s not a very useful directory. Keeping them separate makes the shell history for the root user distinct, and that’s actually a good thing.
Make /dev/log for real
Almost all service programs (and others) send messages to the system log by writing to the UNIX domain socket /dev/log. It’s actually a misnomer, because /dev/log is not a device file. But you don’t break tradition.
WARNING: If the logging server doesn’t work properly, Linux will fail to boot, dropping you into a tiny busybox rescue shell. So before playing with this, reboot to verify all is fine, and only then make the changes. Be prepared to revert your changes with plain command-line utilities (cp, mv, cat), and reboot again to make sure all is fine.
In Mint 19 (and forever on), logging is handled by systemd-journald, which is a godsend. However for some reason (does anyone know why? Kindly comment below), the UNIX domain socket it creates is placed at /run/systemd/journal/dev-log, and /dev/log is a symlink to it. There are a few bug reports out there on software refusing to log into a symlink.
But that’s small potatoes: Since I decided not to bind-mount /run, there’s no access to this socket from the old system.
The solution is to swap the two: Make /dev/log the UNIX socket (as it was before), and /run/systemd/journal/dev-log the symlink (I wonder if the latter is necessary). To achieve this, copy /lib/systemd/system/systemd-journald-dev-log.socket into /etc/systemd/system/systemd-journald-dev-log.socket. This will make the latter override the former (keep the file name accurate), and make the change survive possible upgrades — the file in /lib can be overwritten by apt, the one in /etc won’t be by convention.
Edit the file in /etc, in the part saying:
[Socket]
Service=systemd-journald.service
ListenDatagram=/run/systemd/journal/dev-log
Symlinks=/dev/log
SocketMode=0666
PassCredentials=yes
PassSecurity=yes
and swap the files, making it
ListenDatagram=/dev/log
Symlinks=/run/systemd/journal/dev-log
instead.
All in all, this works perfectly. Old programs work well (try the “logger” command-line utility on both sides). It could cause problems if some program expects “the real thing” at /run/systemd/journal/dev-log, but that’s quite unlikely.
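A quick sanity check of the round trip: log a message from inside the chroot, and look for it in the journal from the new system (as root):

$ logger "hello from the oldy chroot"
# journalctl -n 5 | grep hello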
As a side note, I had this idea to make journald listen to two UNIX domain sockets: Dropping the Symlinks assignment in the original .socket file, and copying it into a new .socket file, setting ListenDatagram to /dev/log. Two .socket files, two UNIX sockets. Sounded like a good idea, only it failed with an error message saying “Too many /dev/log sockets passed”.
Running old services
systemd’s take on SysV-style services (i.e. those init.d, rcN.d scripts) is that when systemctl is called with reference to a service, it first tries its native services, and if none is found, it looks for a service of that name in /etc/init.d.
In order to run old services, I wrote a catch-all init.d script, /etc/init.d/oldy-chrooter. It’s intended to be symlinked to, so it tells which service it should run from the command used to call it, then chroots, and executes the script inside the old system. And guess what, systemd plays along with this.
The script follows. Note that it’s written in Perl (string manipulations are easier this way), but it has the standard INFO header, which is required in init scripts.
#!/usr/bin/perl
### BEGIN INIT INFO
# Required-Start: $local_fs $remote_fs $syslog
# Required-Stop: $local_fs $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# X-Interactive: false
# Short-Description: Oldy root wrapper service
# Description: Start a service within the oldy root
### END INIT INFO
use warnings;
use strict;
my $targetroot = '/oldy-root';
my ($realcmd) = ($0 =~ /\/oldy-([^\/]+)$/);
die("oldy chroot delegation script called with non-oldy command \"$0\"\n")
unless (defined $realcmd);
chroot $targetroot or die("Failed to chroot to $targetroot\n");
exec("/etc/init.d/$realcmd", @ARGV) or
die("Failed to execute \"/etc/init.d/$realcmd\" in oldy chroot\n");
To expose the chroot’s httpd service, make a symlink in init.d:
# cd /etc/init.d/
# ln -s oldy-chrooter oldy-httpd
And then enable with
# systemctl enable oldy-httpd
oldy-httpd.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable oldy-httpd
which indeed runs /lib/systemd/systemd-sysv-install, a shell script, which in turn runs /usr/sbin/update-rc.d with the same arguments. The latter is a Perl script, which analyzes the init.d file and, among other things, parses the INFO header.
The result is the SysV-style generation of S01/K01 symbolic links in /etc/rcN.d. Consequently, it’s possible to start and stop the service as usual. If the service isn’t enabled (or disabled) with systemctl first, attempting to start or stop it will result in an error message saying the service isn’t found.
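For example, after enabling, something like an S01oldy-httpd symlink pointing back at /etc/init.d/oldy-httpd should appear in the runlevel directories (the exact runlevels follow the INFO header):

$ ls -l /etc/rc2.d/ | grep oldy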
It’s a good idea to install the same services on the “main” system and disable them afterwards. There’s no risk of overwriting the old root’s installation, and this allows installing and running programs that depend on these services (which would otherwise complain, based upon the software package database).
Running programs
Running stuff inside the chroot should be quick and easy. For this reason, I wrote a small C program, which opens a shell within the chroot when called without arguments. With one argument, it executes that command within the chroot. It can be called by a non-root user, and the same user is applied in the chroot.
This is compiled with
$ gcc oldy.c -o oldy -Wall -O3
and placed in /usr/local/bin with setuid root (install commands follow the notes below):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <pwd.h>
int main(int argc, char *argv[]) {
  const char jail[] = "/oldy-root/";
  const char newhome[] = "/oldy-root/home/eli/";

  struct passwd *pwd;

  if ((argc!=2) && (argc!=1)) {
    printf("Usage: %s [ command ]\n", argv[0]);
    exit(1);
  }

  pwd = getpwuid(getuid());

  if (!pwd) {
    perror("Failed to obtain user name for current user(?!)");
    exit(1);
  }

  // It's necessary to set the ID to 0, or su asks for password despite the
  // root setuid flag of the executable

  if (setuid(0)) {
    perror("Failed to change user");
    exit(1);
  }

  if (chdir(newhome)) {
    perror("Failed to change directory");
    exit(1);
  }

  if (chroot(jail)) {
    perror("Failed to chroot");
    exit(1);
  }

  // oldycmd and oldyshell won't appear, as they're overridden by su

  if (argc == 1)
    execl("/bin/su", "oldyshell", "-", pwd->pw_name, (char *) NULL);
  else
    execl("/bin/su", "oldycmd", "-", pwd->pw_name, "-c", argv[1], (char *) NULL);

  perror("Execution failed");
  exit(1);
}
Notes:
- Using setuid root is the number one source of security holes. I’m not sure I would have this thing on a computer used by strangers.
- getpwuid() gets the real user ID (not the effective one, as set by setuid), so the call to “su” is made with the original user (even if it’s root, of course). It will fail if that user doesn’t exist.
- … but note that the user in the chroot system is then the one having the same user name as in the original one, not the same uid. There should be no difference, but watch it if there is (security holes…?)
- I used “su -” and not just executing bash for the sake of su’s “-” flag, which sets up the environment. Otherwise, it’s a mess.
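As for the install commands promised above, a minimal sketch (mode 4755 is what sets the setuid-root bit):

# cp oldy /usr/local/bin/
# chown root:root /usr/local/bin/oldy
# chmod 4755 /usr/local/bin/oldy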
It’s perfectly OK to run GUI programs with this trick. However, it becomes extremely confusing at the command line: is this shell prompt on the old or the new system? To fix this, edit /etc/bashrc in the chroot system only, to change the prompt. I went for changing the line saying
[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "
to
[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="\[\e[44m\][\u@chroot \W]\[\e[m\]\\$ "
so the “\h” part, which turns into the host’s name, is replaced with a literal “chroot”. But more importantly, the text background of the shell prompt is changed to blue (as opposed to nothing), so it’s easy to tell where I am.
If you’re into playing with the colors, I warmly recommend looking at this.
Lifting the user processes limit
At some point (it took a few months), I started to have failures of this sort:
$ oldy
oldyshell: /bin/bash: Resource temporarily unavailable
and even worse, some of the chroot-based utilities also failed sporadically.
Checking with ulimit -a, it turned out that the number of processes owned by my “regular” user was limited to 1024. Checking with ps, I had only about 510 processes belonging to that UID, so it’s not clear why I hit the limit. In the non-chroot environment, the limit is significantly higher.
So edit /etc/security/limits.d/90-nproc.conf (the one inside the jail), changing the line saying
* soft nproc 1024
to
* soft nproc 65536
There’s no need for a reboot or anything of that sort, but processes that were already running keep the old limit.
Desktop icons and wallpaper messup
This is a seemingly small but annoying thing: when Nautilus is launched from within the old system, it restores the old wallpaper and puts all the icons on the desktop. There are suggestions on how to fix this, but they rely on gsettings, which came after Fedora 12. I haven’t tested it, but the common suggestion is:
$ gsettings set org.gnome.desktop.background show-desktop-icons false
So for old systems such as mine, first check the current value:
$ gconftool-2 --get /apps/nautilus/preferences/show_desktop
and if it’s “true”, fix it:
$ gconftool-2 --type bool --set /apps/nautilus/preferences/show_desktop false
The settings are stored in ~/.gconf/apps/nautilus/preferences/%gconf.xml.
Setting title in gnome-terminal
So someone thought that the possibility to set the title of a Terminal window, directly from the GUI, is unnecessary. That happens to be one of the most useful features, if you ask me. I’d really like to know why they dropped it. Or maybe not.
After some wandering around, and reading suggestions on how to do it in various other ways, I went for the old-new solution: Run the old executable in the new system. Namely:
# cd /usr/bin
# mv gnome-terminal new-gnome-terminal
# ln -s /oldy-root/usr/bin/gnome-terminal
It was also necessary to install some library stuff:
# apt install libvte9
But then it complained that it couldn’t find some terminal.xml file. So
# cd /usr/share/
# ln -s /oldy-root/usr/share/gnome-terminal
And then I needed to set up the keystroke shortcuts again (Copy, Paste, New Tab etc.), but that’s really no bother.
Other things to keep in mind
- Some users and groups must be migrated from the old system to the new one manually. I always do this when installing a new computer, to make NFS work properly etc., but in this case, some service-related users and groups need to be in sync as well.
- Not directly related, but if the IP address of the host changes (which it usually does), set the updated IP address in /etc/sendmail.mc and recompile it (sketched below), or get an error saying “opendaemonsocket: daemon MTA: cannot bind: Cannot assign requested address”.
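A sketch of that recompile step, with the path as stated above (on a stock Fedora, the .mc and .cf files live under /etc/mail, and the sendmail-cf package must be installed for m4 to resolve the includes):

# m4 /etc/sendmail.mc > /etc/sendmail.cf
# service sendmail restart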
Motivation
I’m using resize2fs a lot when backing up into a USB stick. The procedure is to create an image of an encrypted ext4 filesystem, and raw-write it into the USB flash device. To save time writing to the USB stick, the image is first shrunk to its minimal size with resize2fs -M.
Uh-oh
This had been working great for years with my oldie resize2fs 1.41.9, but after upgrading my computer (Linux Mint 19) and starting to use 1.44.1, things began to go wrong:
# e2fsck -f /dev/mapper/temporary_18395
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/temporary_18395: 1078201/7815168 files (0.1% non-contiguous), 27434779/31249871 blocks
# resize2fs -M -p /dev/mapper/temporary_18395
resize2fs 1.44.1 (24-Mar-2018)
Resizing the filesystem on /dev/mapper/temporary_18395 to 27999634 (4k) blocks.
Begin pass 2 (max = 1280208)
Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 3 (max = 954)
Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 4 (max = 89142)
Updating inode references XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The filesystem on /dev/mapper/temporary_18395 is now 27999634 (4k) blocks long.
# e2fsck -f /dev/mapper/temporary_18395
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Inode 85354 extent block passes checks, but checksum does not match extent
(logical block 237568, physical block 11929600, len 24454)
Fix<y>? yes
Inode 85942 extent block passes checks, but checksum does not match extent
(logical block 129024, physical block 12890112, len 7954)
Fix<y>? yes
Inode 117693 extent block passes checks, but checksum does not match extent
(logical block 53248, physical block 391168, len 8310)
Fix<y>? yes
Inode 122577 extent block passes checks, but checksum does not match extent
(logical block 61440, physical block 399478, len 607)
Fix<y>? yes
Inode 129597 extent block passes checks, but checksum does not match extent
(logical block 409600, physical block 14016512, len 12918)
Fix<y>? yes
Inode 129599 extent block passes checks, but checksum does not match extent
(logical block 274432, physical block 13640964, len 1570)
Fix<y>? yes
Inode 129600 extent block passes checks, but checksum does not match extent
(logical block 120832, physical block 14653440, len 13287)
Fix<y>? yes
Inode 129606 extent block passes checks, but checksum does not match extent
(logical block 133120, physical block 14870528, len 16556)
Fix<y>? yes
Inode 129613 extent block passes checks, but checksum does not match extent
(logical block 75776, physical block 15054848, len 23962)
Fix<y>? yes
Inode 129617 extent block passes checks, but checksum does not match extent
(logical block 284672, physical block 15716352, len 7504)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129622 extent block passes checks, but checksum does not match extent
(logical block 86016, physical block 15532032, len 18477)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129626 extent block passes checks, but checksum does not match extent
(logical block 145408, physical block 16967680, len 5536)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129630 extent block passes checks, but checksum does not match extent
(logical block 165888, physical block 17125376, len 29036)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129677 extent block passes checks, but checksum does not match extent
(logical block 126976, physical block 17100800, len 24239)
Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/temporary_18395: 1078201/7004160 files (0.1% non-contiguous), 27383882/27999634 blocks
Not the end of the world
This bug has been reported and fixed. Judging by the change made, it was only about the checksums, so while the bug caused fsck to detect (and properly fix) errors, there’s no loss of data (I encountered the same problem when shrinking a 5.7 TB partition by 40 GB — fsck errors, but I checked every single file, a total of ~3 TB, and all was fine).
I beg to differ with the commit message saying it’s a “relatively rare case”, as it happened to me every single time, in two completely different settings, neither of which was special in any way. However, we all use journaled filesystems, so fsck checks have become rare, which can explain how this has gone unnoticed: unless resize2fs officially fails somehow, it leaves the filesystem marked as clean. Only “e2fsck -f” will reveal the problem.
I would speculate that the reason for this bug is this commit (end of 2014), which speeds up the checksum rewrite after moving an inode. It’s somewhat worrying that a program of this sensitive type isn’t tested properly before being released for everyone’s use.
My own remedy was to compile an updated revision (1.44.4) from the repository, commit ID 75da66777937dc16629e4aea0b436e4cffaa866e. Actually, I first tried reverting to resize2fs 1.41.9, but that one failed to shrink a 128 GB filesystem with only 8 GB left, saying it had run out of space.
Conclusion
It’s almost 2019, the word is that shrinking an ext4 filesystem is dangerous, and guess what, it’s probably a bit true. One could wish it weren’t, but unfortunately, the utilities don’t seem to be maintained with the level of care one could hope for, given the damage they can do.
Using tar -c --one-file-system a lot for backing up large parts of my disk, I was surprised to note that it went straight into a large directory tree that was bind-mounted into a subdirectory of the part I was backing up.
To put it clearly: tar --one-file-system doesn’t (always) detect bind mounts.
Why? Let’s look, for example, at the source code (tar version 1.30), src/incremen.c, line 561:
if (one_file_system_option && st->parent
&& stat_data->st_dev != st->parent->stat.st_dev)
{
[ ... ]
}
So tar detects mount points by comparing the ID of the device containing the directory which is a candidate for diving into with its parent’s. This st_dev entry in the stat structure is a concatenation of the device’s major and minor numbers, so it identifies the underlying physical device (or pseudo-device with a zero major for /proc, /sys etc.). On a plain “stat filename” at the command prompt, this appears as “Device”. For example,
$ stat /
File: `/'
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: fd03h/64771d Inode: 2 Links: 29
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2018-11-24 15:37:17.072075635 +0200
Modify: 2018-09-17 03:55:01.871469999 +0300
Change: 2018-09-17 03:55:01.871469999 +0300
With “real” mounts, the underlying device is different, so tar detects that correctly. But with a bind mount from the same physical device, tar considers it to be the same filesystem.
Which is, in a way, correct. The bind-mounted part does, after all, belong to the same filesystem, and this is exactly what --one-file-system promises. It’s only us, lazy humans, who expect --one-file-system not to dive into a mounted directory.
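This is easy to demonstrate. With /mnt/bound as a hypothetical bind mount of /home, both paths report the same device number:

# mount --bind /home /mnt/bound
$ stat -c '%d %n' /home /mnt/bound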
Unrelated, but still
Whatever you do, don’t press CTRL-C while the extraction goes on. If tar quits in the middle, files will be left with ownerships and permissions unset, and symlinks will be left as zero-length files, too. It wrecks the entire restored tree, even in places far away from where tar was working when it was stopped.
Introduction
These are my notes as I attempted to install Linux Mint 19 (Tara) on a machine with software RAID, full disk encryption (boot partitions excluded) and LVM. The thing is that the year is 2018, and the old MBR booting method is still available, but not a good idea for a system that’s supposed to last. So UEFI it is. And that caused some issues.
For the RAID / encryption part, I had to set up the disks manually, which I’m completely fine with, as I merely repeated something I’ve already done several years ago, and then I thought the installer would get the hint.
But this wasn’t that simple at all. I believe I ran the installer some 20 times until I got it right. This reminded me of a Windows installation: it’s simple as long as the installation is mainstream. Otherwise, you’re cooked.
And if this post seems a bit long, it’s because I spent two whole days shaving this yak.
Rule #1
This is a bit of a forward reference, but important enough for breaking the order: Whenever manipulating anything related to boot loading, be sure that the machine is already booted in UEFI mode. In particular, when booting from a Live USB stick, the computer might have chosen MBR mode and then the installation will be a mess.
The easiest way to check is with
# efibootmgr
EFI variables are not supported on this system.
If the error message above shows, it’s bad. Reboot the system, and pick the UEFI boot alternative from the BIOS’ boot menu. If that doesn’t help, look in the kernel log for a reason UEFI isn’t activated. It might be a driver issue (even though that’s not the likely case).
When it’s fine, you’ll get something like this:
# efibootmgr
BootCurrent: 0003
Timeout: 1 seconds
BootOrder: 0000,0003,0004,0002
Boot0000* ubuntu
Boot0002* Hard Drive
Boot0003* UEFI: SanDisk Cruzer Switch 1.27
Boot0004* UEFI: SanDisk Cruzer Switch 1.27, Partition 2
Alternatively, check for the existence of /sys/firmware/efi/. If the “efi” directory is present, it’s most likely fine.
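As a one-liner wrapping the same check:

$ [ -d /sys/firmware/efi ] && echo "UEFI mode" || echo "legacy BIOS mode"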
GPT in brief
The GUID partition table is the replacement for the (good old?) MBR-based one. It supports much larger disks, the old head-cylinder-sector terminology is gone forever, and it allows for many more partitions than you’ll ever need, in particular since we’ve got LVM. And instead of plain numbers, partitions are now assigned long GUID identifiers, so there’s more mumbo-jumbo to print out.
GPT is often related to UEFI boot, but I’m not sure there’s any necessary connection. It’s nevertheless a good choice unless you’re a fan of dinosaurs.
UEFI in brief
UEFI / EFI is the boot process that replaces the not-so-good old MBR boot. The old MBR method involved reading a snippet of machine code from the MBR sector and executing it. That little piece of code would then load another chunk of code into memory from some sectors on the disk, and so on. All in all, a tiny bootloader loaded a small bootloader, which loaded GRUB or LILO, and that eventually loaded Linux.
Confused by the MBR thingy? That’s because the MBR sector contains the partition table as well as the first-stage boot loader. Homework: Can you do MBR boot on GPT? Can you do UEFI with an MBR partition table?
Aside from the complicated boot process, this also required keeping track of those hidden sectors, so they wouldn’t be overwritten by files. After all, the boot loader had to sit somewhere, and that was usually on sectors belonging to the main filesystem.
So it was messy.
EFI (and later UEFI) is a simple concept: Let the BIOS read the bootloader from a dedicated EFI partition in FAT format. When the computer is powered up, the BIOS scans this partition (or partitions) for boot binary candidates (files with an .efi extension, containing the bootloader’s executable, in specific parts of the hierarchy), and lists them on its boot menu. Note that it may (and probably will) add good old MBR boot possibilities to the menu, if such exist, even though they have nothing to do with UEFI.
The BIOS then selects one boot option, possibly after asking the user. In our case, it’s preferably the one belonging to GRUB, which turns out to be one of /EFI/BOOT/BOOTX64.EFI, /EFI/ubuntu/fwupx64.efi and /EFI/ubuntu/grubx64.efi (don’t ask me why GRUB generates three of them).
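By the way, listing the candidates on an installed system is easy, given that the EFI partition is conventionally mounted on /boot/efi (more on that below):
# find /boot/efi -iname '*.efi'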
A lengthy guide to UEFI can be found here.
UEFI summarized
- The entire boot process is based upon plain files only. No “active boot partition”, no hidden sectors. Easy to back up and restore, even to revert to a previous setting by replacing the file contents of the two partitions.
- … but there’s now a need for a special EFI boot partition in FAT format.
- The BIOS doesn’t just list devices to boot from, but possibly several boot options from each device.
Two partitions just to boot?
In the good old days, GRUB hid somewhere on the disk, and the kernel / initramfs image could be on the root partition. So one could run Linux on a single partition (swap excluded, if any).
But the EFI partition is of FAT format (preferably FAT32), and then we have a little GRUB convention thing: The kernel and the initramfs image are placed in /boot. The EFI partition is on /boot/efi. So in theory, it’s possible to load the kernel and initramfs from the EFI partition, but the files won’t be where they usually are, and now have fun playing with GRUB’s configuration.
Now, even though it seems possible to have GRUB open both RAID and an encrypted filesystem, I’m not into that level of trickery. Hence /boot can’t be placed on the RAID’s filesystem, as it wouldn’t be readable before the kernel has booted. So /boot has to be in a partition of its own. Actually, this is what’s usually done with any software RAID / full disk encryption setting.
This is barely an issue in a RAID setting: if one disk has a partition for booting purposes, it makes sense to allocate the same non-RAID partition on the others. So put the EFI partition on one disk, and /boot on another.
Remember to back up the files in these two partitions, so that if something goes wrong, you can just restore them from the backup tarball (see the sketch below). Just don’t forget, when recovering, that the EFI partition is FAT.
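A minimal sketch of such a backup, assuming both partitions are mounted at their usual places, so that the EFI partition sits under /boot/efi (the tarball’s name and location are my own pick):
# tar -czf /root/boot-backup.tar.gz -C / boot
One tarball covers both partitions, since /boot/efi lives inside /boot.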
Finally: Does a partition need to be assigned EFI type to be detected as such? Probably not, but it’s a good idea to set it so.
Installing: The Wrong Way
What I did initially was to boot from the Live USB stick, set up the RAID and the encrypted /dev/md0, and happily click the “Install Ubuntu” icon. Then I went for a “something else” installation, picked the relevant LVM partitions, and kicked it off.
The installation failed with a popup saying “The ‘grub-efi-amd64-signed’ package failed to install into /target/”, and then warned me that without the GRUB package the installed system wouldn’t boot (which is sadly correct, but only partly: I was thrown into a GRUB shell). Looking into /var/log/syslog, it said on behalf of grub-install: “Cannot find EFI directory.”
This was the case regardless of whether I selected /dev/sda or /dev/sda1 as the device to write the bootloader to.
Different attempts to generate an EFI partition and then run the installer failed as well.
Installation (the right way)
Boot the system from a Live USB stick, and verify that you follow Rule #1 above. That is, check that efibootmgr returns something other than an error.
Then set up RAID + LUKS + LVM as described in this old post of mine (a brief sketch follows). Eight years later, nothing has changed (except for the format of /etc/crypttab, actually). Only Mint wasn’t as smooth about installing on top of this setup.
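For orientation only, here’s a condensed sketch of that setup. The device names, RAID level, volume group name and sizes are assumptions for the sake of the example, not a prescription:
# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# cryptsetup luksFormat /dev/md0
# cryptsetup luksOpen /dev/md0 luks-disk
# pvcreate /dev/mapper/luks-disk
# vgcreate vg_main /dev/mapper/luks-disk
# lvcreate -L 16G -n swap vg_main
# lvcreate -l 100%FREE -n root vg_main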
The EFI partition should be FAT32, and selected as “use as EFI partition” in the installer’s partition editor. Set the partition type of /dev/sda1 (only) to EFI (type number 1 in GPT) and format it as FAT32. Ubiquity didn’t do this for me, for some reason. So manually:
# mkfs.fat -v -F 32 /dev/sda1
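As for setting the partition type, it can be done interactively with fdisk or gdisk, or in one shot with sgdisk (EF00 being sgdisk’s code for an EFI System partition):
# sgdisk --typecode=1:EF00 /dev/sda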
/dev/sdb1 will be used for /boot. /dev/sdc1 remains unused, most likely a place to keep the backups of the two boot related partitions.
So now to the installation itself.
Inspired by this guide, the trick is to skip the installation of the bootloader, and then do it manually. So kick off the RAID with mdadm, open the encrypted partition, and verify that the LVM device files are in place in /dev/mapper (a sketch follows). When opening the encrypted disk, assign the /dev/mapper name that you want to stay with; otherwise you’ll have to reboot later to fix it.
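Commands-wise, it boils down to something like this, with luks-disk being the mapper name used throughout this post (pick your own, but stick with it):
# apt-get install mdadm
# mdadm --assemble --scan
# cryptsetup luksOpen /dev/md0 luks-disk
# vgchange -ay
# ls /dev/mapper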
Then use the -b flag in the invocation of ubiquity to run a full installation, just without the bootloader.
# ubiquity -b
Go for a “something else” installation type, select / to be mounted on the dedicated encrypted LVM partition, and /boot on /dev/sdb1 (or any other non-RAID, non-encrypted partition). Make sure /dev/sda1 is detected as an EFI partition, and that it’s intended for EFI boot.
Once it finishes (takes 50 minutes or so, all in all), an “Installation Complete” popup will suggest “Continue Testing” or “Restart Now”. So pick “Continue Testing”. There’s no bootloader yet.
The new operating system will still be mounted as /target. So bind-mount some necessities, and chroot into the new installation:
# for i in /dev /dev/pts /sys /proc /run ; do mount --bind $i /target/$i ; done
# chroot /target
All that follows below is within the new root.
First, mount /boot and /boot/efi with
# mount -a
This should work, as /etc/fstab should have been set up properly during the installation.
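For reference, the relevant fstab entries should look roughly like this. The UUIDs below are placeholders, of course, not real values:
UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee /boot     ext4  defaults    0  2
UUID=ABCD-1234                            /boot/efi vfat  umask=0077  0  1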
Then, (re)install RAID support:
# apt-get install mdadm
It may seem peculiar to install mdadm again, as it was necessary to run exactly the same apt-get command before assembling the RAID in order to get this far. However, that installed mdadm on the live system, not on the new one. And without mdadm, there will be no RAID support in the initramfs that is about to be generated, so the RAID won’t be assembled on boot, and hence boot will fail.
Set up /etc/crypttab, so it refers to the encrypted partition. Otherwise, there will be no attempt to open it during boot. Find the UUID with
# cryptsetup luksUUID /dev/md0
201b318f-3ffd-47fc-9e00-0356747e3a73
and then /etc/crypttab should say something like
luks-disk UUID=201b318f-3ffd-47fc-9e00-0356747e3a73 none luks
Note that “luks-disk” is just an arbitrary name, which will appear in /dev/mapper. It should match the name currently found in /dev/mapper, or the inclusion of crypttab’s info in the new initramfs is likely to fail (with a warning from cryptsetup).
Next, edit /etc/default/grub, making changes as desired; I set GRUB_TIMEOUT_STYLE to “menu”, to always get a GRUB menu, and also removed “quiet splash” from the kernel command line (see the sketch below). There is no need for anything related to RAID or encryption.
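So the relevant lines in /etc/default/grub end up something like this (the timeout value here is arbitrary, just for the sake of the example):
GRUB_TIMEOUT_STYLE=menu
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX_DEFAULT=""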
Install the GRUB EFI package:
# apt-get install grub-efi-amd64
It might be a good idea to make sure that the initramfs is in sync:
# update-initramfs -u
Then install GRUB:
# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-20-generic
Found initrd image: /boot/initrd.img-4.15.0-20-generic
grub-probe: error: cannot find a GRUB drive for /dev/sdd1. Check your device.map.
Adding boot menu entry for EFI firmware configuration
done
# grub-install
Installing for x86_64-efi platform.
Installation finished. No error reported.
It seems like the apt-get command also triggered the initramfs update and the GRUB installation. However, I ran these commands nevertheless.
Don’t worry about the error about not finding anything for /dev/sdd1. It’s the USB stick. Indeed, it doesn’t belong.
That’s it. Cross fingers, and reboot. You should be prompted for the passphrase.
Epilogue: How does the GRUB executable know where to go next?
Recall that GRUB is packaged as a chunk of code in an .efi file, which is loaded from a dedicated partition. The images are elsewhere. How does it know where to look for them?
So I don’t know exactly how, but it’s clearly fused into GRUB’s bootloader binary:
# strings -n 8 /boot/efi/EFI/ubuntu/grubx64.efi | tail -2
search.fs_uuid f573c12a-c7e4-41e4-99ef-5fda4a595873 root hd1,gpt1
set prefix=($root)'/grub'
and it so happens that hd1,gpt1 is exactly /dev/sdb1, where the /boot partition is kept, and that the UUID matches the one given as “UUID=” for that partition by the “blkid” utility.
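To double-check, it’s just the command already mentioned; the UUID in its output should be the same as in the strings dump above:
# blkid /dev/sdb1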
So moving /boot most likely requires reinstalling GRUB. Which isn’t a great surprise. See another post of mine for more about GRUB internals.
Conclusion
It’s a bit unfortunate that in 2018, Linux Mint’s Ubiquity didn’t manage to land on its feet, and even worse, failed to warn the user that it was about to fail colossally. It could even have suggested not installing the bootloader…?
And maybe that’s the way it is: If you want a professional Linux system, better be professional yourself…