Measuring how much RAM a Linux service eats

Introduction

Motivation: I wanted to move a service to another server that is dedicated only to that service. But how much RAM does this new server need? RAM is $$$, so too much is a waste of money, too little means problems.

The method is to run the service and expose it to a scenario that causes it to consume RAM. And then look at the maximal consumption.

This can be done with “top” and similar programs, but these show the current use. I needed the maximal RAM use. Besides, a service may spread out its RAM consumption across several processes. It’s the cumulative consumption that is interesting.

The appealing solution is to use the fact that systemd creates a cgroup for the service. The answer hence lies in the RAM consumption of the cgroup as a whole. It’s also possible to create a dedicated cgroup and run a program within that one, as shown in another post of mine.

This method is somewhat crude, because this memory consumption includes disk cache as well. In other words, this method shows how much RAM is consumed when there’s plenty of memory, and hence when there’s no pressure to reclaim any RAM. Therefore, if the service runs on a server with less RAM (or the service’s RAM consumption is limited in the systemd unit file), it’s more than possible that everything will work just fine. It might run somewhat slower due to disk access that was previously substituted by the cache.

So using a server with as much memory as measured by the test described below (plus some extra for the OS itself) will result in quick execution, but it might be OK to go for less RAM. A tight RAM limit will cause a lot of disk activity at first, and only afterwards will processes be killed by the OOM killer.

Where the information is

Everything said in this post relates to Linux kernel v4.15. Things are different with later kernels, not necessarily for the better.

There are in principle two versions of the interface with cgroup’s memory management: First, the one I won’t use, which is cgroup-v2 (or maybe this doc for v2 is better?). The sysfs files for this interface for a service named “theservice” reside in /sys/fs/cgroup/unified/system.slice/theservice.service.

I shall be working with the memory control of cgroup-v1. The sysfs files in question are in /sys/fs/cgroup/memory/system.slice/theservice.service/.

If /sys/fs/cgroup/memory/ doesn’t exist, it might be necessary to mount it explicitly. Also, if system.slice doesn’t exist under /sys/fs/cgroup/memory/ it’s most likely because systemd’s memory accounting is not in action. This can be enabled globally, or by setting MemoryAccounting=true on the service’s systemd unit (or maybe any unit?).

Speaking of which, it might be a good idea to set MemoryMax in the service’s systemd unit in order to see what happens when the RAM is really restricted. Or change the limit dynamically, as shown below.
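For reference, this is what such a setting could look like as a systemd drop-in snippet. The file path, service name and values are made up for the sake of the example (and on a cgroup-v1 system, MemoryLimit= is the older setting that corresponds to MemoryMax=):

```ini
# Hypothetical drop-in file: /etc/systemd/system/theservice.service.d/memory.conf
# Run "systemctl daemon-reload" and restart the service after creating it.
[Service]
MemoryAccounting=true
MemoryMax=512M
```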

And there’s always the alternative of creating a separate cgroup and running the service in that group. I’ll refer to my own blog post again.

Getting the info

All files mentioned below are in /sys/fs/cgroup/memory/system.slice/theservice.service/ (assuming that the systemd service in question is theservice).

The maximal memory used: memory.max_usage_in_bytes. As its name implies, this is the maximal amount of RAM used, measured in bytes. This includes disk cache, so the number is higher than what appears in “top”.

The memory currently used: memory.usage_in_bytes.

For more detailed info about memory use: memory.stat. For example:

$ cat memory.stat 
cache 1138688
rss 4268224512
rss_huge 0
shmem 0
mapped_file 516096
dirty 0
writeback 0
pgpgin 36038063
pgpgout 34995738
pgfault 21217095
pgmajfault 176307
inactive_anon 0
active_anon 4268224512
inactive_file 581632
active_file 401408
unevictable 0
hierarchical_memory_limit 4294967296
total_cache 1138688
total_rss 4268224512
total_rss_huge 0
total_shmem 0
total_mapped_file 516096
total_dirty 0
total_writeback 0
total_pgpgin 36038063
total_pgpgout 34995738
total_pgfault 21217095
total_pgmajfault 176307
total_inactive_anon 0
total_active_anon 4268224512
total_inactive_file 581632
total_active_file 401408
total_unevictable 0

Note the “cache” part at the beginning. It’s no coincidence that it’s first. That’s the most important part: How much can be reclaimed just by flushing the cache.
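The two numbers worth pulling out are hence “rss” (the anonymous memory that is really needed) and “cache” (what can be reclaimed by flushing). This little helper is my own sketch, not part of any tool, and the path in the usage example is an assumption:

```shell
# Print the rss and cache figures from a memory.stat file given as $1.
summarize_memstat() {
  awk '$1 == "cache" { cache = $2 }
       $1 == "rss"   { rss = $2 }
       END { printf "rss=%s cache=%s\n", rss, cache }' "$1"
}

# For example (cgroup path assumed):
# summarize_memstat /sys/fs/cgroup/memory/system.slice/theservice.service/memory.stat
```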

On a 6.1.0 kernel, I’ve seen memory.peak and memory.current instead of memory.max_usage_in_bytes and memory.usage_in_bytes. memory.peak wasn’t writable, however (its permissions didn’t allow it, and attempting to write to it failed regardless), so it wasn’t possible to reset the max level.
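When the peak can’t be reset, a user-space workaround is to track the maximum yourself by polling the current-usage file. This is just a sketch of mine; the one-second interval and the cgroup path in the usage example are assumptions:

```shell
# Read one number per line from stdin, and print a line each time a new
# maximum is observed.
track_max() {
  awk '$1 > m { m = $1; print m }'
}

# Intended use (cgroup-v2 path assumed):
# while sleep 1; do cat /sys/fs/cgroup/system.slice/theservice.service/memory.current; done | track_max
```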

Setting memory limits

It’s possible to set memory limits in systemd’s unit file, but it can be more convenient to do this on the fly. In order to set the hard limit of memory use to 40 MiB, go (as root)

# echo 40M > memory.limit_in_bytes

To disable the limit, pick an unreasonably high number, e.g.

# echo 100G > memory.limit_in_bytes

Note that restarting the systemd service has no effect on these parameters (unless a memory limit is set in the unit file). The cgroup directory remains intact.

Resetting between tests

To reset the maximal value that has been recorded for RAM use (as root)

# echo 0 > memory.max_usage_in_bytes

But to really start from fresh, all disk cache needs to be cleared as well. The sledge-hammer way is going

# echo 1 > /proc/sys/vm/drop_caches

This frees the page caches system-wide, so everything running on the computer will need to re-read things from the disk. There’s a slight and temporary global impact on performance. On a GUI desktop, it gets a bit slow for a while.

A message like this will appear in the kernel log in response:

bash (43262): drop_caches: 1

This is perfectly fine, and indicates no error.

Alternatively, set a low limit for the RAM usage with memory.limit_in_bytes, as shown above. This impacts the cgroup only, forcing a reclaim of disk cache.

Two things that have no effect:

  • Reducing the soft limit (memory.soft_limit_in_bytes). This limit is relevant only when the system is in a shortage of RAM overall. Otherwise, it does nothing.
  • Restarting the service with systemd. It wouldn’t make any sense to flush a disk cache when restarting a service.

It’s of course a good idea to get rid of the disk cache before clearing memory.max_usage_in_bytes, so the max value starts without taking the disk cache into account.
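The whole reset sequence can be wrapped in a tiny helper. In this sketch, the two knob files are parameters only so that the function is easy to dry-run; in real use (as root) they stand for /proc/sys/vm/drop_caches and the cgroup’s memory.max_usage_in_bytes, in that order:

```shell
# Reset between tests: flush the page caches first, and only then clear the
# recorded maximum, so the new peak doesn't start out counting cache pages.
reset_max() {
  echo 1 > "$1"   # the drop_caches knob
  echo 0 > "$2"   # the max_usage_in_bytes file
}

# Real-life invocation (paths assumed, run as root):
# reset_max /proc/sys/vm/drop_caches \
#   /sys/fs/cgroup/memory/system.slice/theservice.service/memory.max_usage_in_bytes
```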

A function similar to Perl’s die() in bash

This is maybe a bit silly, but Perl has a die() function that is really handy for quitting a script with an error message. And I kind of miss it in Bash. So it can be defined with this simple one-liner:

function die { echo "$1" ; exit 1 ; }

And then it can be used with something like:

unzip thefile.zip || die "Unzip returned with error status"

The Perl feeling, in Bash.
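If you want to go one small step further (my own variation, not something from Perl), send the message to stderr, so it survives when stdout is redirected, and allow an optional exit status:

```shell
# Variation on the one-liner: message to stderr, optional exit status as the
# second argument (defaults to 1).
die() { echo "$1" >&2 ; exit "${2:-1}" ; }

# e.g.: cp a b || die "cp failed" 2
```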

Altering the Message-ID header in Thunderbird for non-spam detection

TL;DR

In this post, I suggest manipulating the Message IDs of outgoing mails, so that legit inbound replies to my mails are easily detected as non-spam. I also show how to do this with Thunderbird (Linux version 91.10.0, but it works with practically all versions, I believe).

Briefly about Message-ID

Each email should have a Message-ID header, which uniquely identifies this message. The value of this header should consist of a random string, followed by an ‘@’ and a string that represents the domain name (referred to as FQDN, Fully Qualified Domain Name). This is often the full domain name of the “From” header (e.g. gmail.com).

For example, an email generated by Gmail’s web client had Message-ID: <CAD8P7-R2OuJvGiuQ-0RQqgSSmDguwv1VdjHgQND4jMJxPc628w@mail.gmail.com>. A similar result (same FQDN) was obtained when sending from the phone. However, when using Thunderbird to send an email, only “gmail.com” was set as the FQDN.

Does the Message-ID matter?

Like anything related to email, there are a lot of actors, and each has its own quirks. For example, rspamd adds 0.5 to the spam score, through the MID_RHS_NOT_FQDN rule, if the Message ID’s right-hand side isn’t an FQDN. I’m not sure to what extent it checks that the FQDN matches the email’s From, but even if it does, it can’t be that picky, given the example I showed above in relation to gmail.com.

It’s quite rare that people care about this header. I’ve seen somewhere that someone sending mails from a work computer didn’t like the name of the internal domain leaking.

All in all, it’s probably a good idea to make sure that the Message-ID header looks legit. Putting the domain from the From header seems to be a good idea to keep spam filters happy.

Why manipulate the Message-ID?

In a reply, the In-Reply-To header gets the value of the Message ID of the message replied to. So if a spam filter can identify that the email is genuinely a reply to something I sent, it’s definitely not spam. It’s also a good idea to scan the References header too, in order to cover more elaborate scenarios when several people are corresponding.

The rigorous way to implement this spam filtering feature is to store the Message IDs of all sent mails in some small database, and check for a match with the content of In-Reply-To of arriving mails. Possible, however daunting.

A much easier way is to change the FQDN part, so that it’s easily identifiable. This is unnecessary if you happen to send emails with your own domain, as spam senders are very unlikely to add an In-Reply-To with a matching domain (actually, very few spam messages have an In-Reply-To header at all).

But for email sent through gmail, changing the FQDN to something unique is required to make a distinction.
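The detection rule itself then becomes trivial. Here’s a sketch of the idea; the marker domain and the function name are my own invention, and a real-world filter would need more careful header parsing:

```shell
# Return success if an In-Reply-To value, e.g. '<abc123@secretsauce.gmail.com>',
# carries our marker FQDN.
is_reply_to_us() {
  id=${1#<}; id=${id%>}                  # strip the angle brackets
  case "${id#*@}" in
    secretsauce.gmail.com) return 0 ;;   # assumed marker domain
    *) return 1 ;;
  esac
}
```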

Will this mess up things? I’m not sure any software tries to fully match the FQDN with the sender, but I suppose it’s safe to add a subdomain to the correct domain. I mean, if both “mail.gmail.com” and “gmail.com” are commonly out there, why shouldn’t “secretsauce.gmail.com” seem likewise legit to any spam filter that checks the message?

And by the way, as of August 2024, a DNS query for mail.gmail.com yields no address, neither for A nor MX. In other words, Gmail itself uses an invalid domain in its Message ID, so any other invented subdomain should do as well.

Changing the FQDN on Thunderbird

Click the hamburger icon, choose Preferences, and scroll down all the way (on the General tab) and click on Config Editor.

First, we need to find Thunderbird’s internal ID number for the mail account to manipulate.

To get a list of IDs, write “useremail” in the search text box. This lists entries like mail.identity.id1.useremail and their values. This listing allows making the connection between e.g. “id1” and the email address related to it.

For example, to change the FQDN of the mail account corresponding to “id3”, add a string property (using the Config Editor). The key of this property is “mail.identity.id3.FQDN” and the value is something like “secretsauce.gmail.com”.

There is no need to restart Thunderbird. The change is in effect on the next mail sent, and it remains in the settings across restarts.

The need for this feature has been questioned, as was discussed here. So if any Thunderbird maintainer reads this, please keep this feature up and running.

A possible alternative approach

Instead of playing around with the Message-ID, it would be possible to add an entry to the References header (or add this header if there is none). The advantage of this way is that this can also be done by the MTA further down the delivery path, and it doesn’t alter anything that is already in place.

And since it’s an added entry, it can also be crafted arbitrarily. For example, it may contain a timestamp (epoch time in hex) and the SHA1 sum of a string that is composed by this timestamp and a secret string. This way, this proof of genuine correspondence is impossible to forge and may expire with time.
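To make the idea concrete, here’s a sketch of how such a token could be generated and verified in shell. The format, the function names and the secret are all my own invention:

```shell
# Token format: <epoch-in-hex>.<sha1(epoch-hex + secret)>. The verifier
# recomputes the hash, and can additionally reject too-old timestamps.
make_token() {                        # $1 = secret, $2 = epoch seconds
  ts=$(printf '%x' "$2")
  printf '%s.%s' "$ts" "$(printf '%s%s' "$ts" "$1" | sha1sum | cut -d' ' -f1)"
}

check_token() {                       # $1 = secret, $2 = token
  ts=${2%%.*}
  [ "${2#*.}" = "$(printf '%s%s' "$ts" "$1" | sha1sum | cut -d' ' -f1)" ]
}
```

The generated string would be added as an extra entry in the References header of outgoing mails, and the receiving spam filter would run the check on the entries of inbound In-Reply-To / References headers.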

I haven’t looked into how to implement this in Thunderbird. Right now I’m good with the Message-ID solution.

Linux kernel workqueues: Is it OK for the worker function to kfree its own work item?

Freeing yourself

Working with Linux kernel’s workqueues, I incremented a kref reference count before queuing a work item, in order to make sure that the data structure that it operated on will still be in memory while it runs. Just before returning, the work item’s function decremented this reference count, and as a result, the data structure’s memory could be freed at that very moment.

The thing was that this data structure also included the work item’s own struct work_struct. In other words, the work item’s function could potentially free the entry that was pushed into the workqueue on its behalf. Could this possibly be allowed?

The short answer is yes. It’s OK to call kfree() on the memory of the struct work_struct of the currently running work item. No risk for use-after-free (UAF).

It’s also OK to requeue the work item on the same workqueue (or on a different one). All in all, the work item’s struct is just a piece of unused memory as soon as the work item’s function is called.

On the other hand, don’t think about calling destroy_workqueue() on the workqueue on which the running work item is queued: destroy_workqueue() waits for all work items to finish before destroying the queue, which will never happen if the request to destroy the queue came from one of its own work items.

From the horse’s mouth

I didn’t find any documentation on this topic, but there are a couple of comments in the source code, namely in the process_one_work() function in kernel/workqueue.c: First, this one by Tejun Heo from June 2010:

/*
 * It is permissible to free the struct work_struct from
 * inside the function that is called from it, this we need to
 * take into account for lockdep too.  To avoid bogus "held
 * lock freed" warnings as well as problems when looking into
 * work->lockdep_map, make a copy and use that here.
 */

And this comes after calling the work item’s function, worker->current_func(work). Written by Arjan van de Ven in August 2010.

/*
 * While we must be careful to not use "work" after this, the trace
 * point will only record its address.
 */
trace_workqueue_execute_end(work, worker->current_func);

The point of this comment is that the value of @work will be used by the call to trace_workqueue_execute_end(), but it won’t be used as a pointer. This emphasizes the commitment of not touching what @work points at, i.e. the memory segment may have been freed.

How it’s done

process_one_work(), which is the only function that calls the work item’s function, is clearly written in a way that ignores the work item’s struct after calling the work item’s function.

The first thing is that it copies the address of the work function into the worker struct:

worker->current_func = work->func;

It then removes the work item from the workqueue:

list_del_init(&work->entry);

And later on, it calls the function, using the copy of the pointer (even though it could also have used the original at this point).

worker->current_func(work);

After this, the @work variable isn’t used anymore as a pointer.

Installing GRUB 2 manually with rescue-like techniques

Introduction

It’s rarely necessary to make an issue of installing and maintaining the GRUB bootloader. However, for reasons explained in a separate post, I wanted to install GRUB 2.12 on an old distribution (Debian 8), so it required some acrobatics. That said, it doesn’t limit the possibility of installing new kernels in the future etc. If you’re ready to edit a simple text file, rather than running automatic tools, that is. Which may actually be a good idea anyhow.

The basics

Grub has two parts: First, there’s the initial code that is loaded by the BIOS, either from the MBR or from the EFI partition. That’s the plain GRUB executable. This executable goes directly to the ext2/3/4 root partition, and reads from /boot/grub/. That directory contains, among others, the precious grub.cfg file, which GRUB reads in order to decide which modules to load, which menu entries to display and how to act if each is selected.

grub.cfg is created by update-grub, which effectively runs “grub-mkconfig -o /boot/grub/grub.cfg”.

This file is created from /etc/grub.d/ and settings from /etc/default/grub, and based upon the kernel image and initrd files that are found in /boot.

Hence an installation of GRUB consists of two tasks, which are fairly independent:

  • Running grub-install so that the MBR or EFI partition are set to run GRUB, and that /boot/grub/ is populated with modules and other stuff. The only important thing is that this utility knows the correct disk to target and where the partition containing /boot/grub is.
  • Running update-grub in order to create (or update) the /boot/grub/grub.cfg file. This is normally done every time the content of /boot is updated (e.g. a new kernel image).

Note that grub-install populates /boot/grub with a lot of files that are used by the bootloader, so it’s necessary to run this command if /boot is wiped and started from fresh.

What made this extra tricky for me, was that Debian 8 comes with an old GRUB 1 version. Therefore, the option of chroot’ing into the filesystem for the purpose of installing GRUB was eliminated.

So there were two tasks to accomplish: Obtaining a suitable grub.cfg and running grub-install in a way that will do the job.

This is a good time to understand what this grub.cfg file is.

The grub.cfg file

grub.cfg is a script, written with a bash-like syntax, and is based upon an internal command set. This is a plain file in /boot/grub/, owned by root:root and writable by root only, for obvious reasons. But for the purpose of booting, permissions don’t make any difference.

Despite the “DO NOT EDIT THIS FILE” comment at the top of this file, and the suggestion to use grub-mkconfig, it’s perfectly OK to edit it for the purpose of updating the behavior of the boot menu. Editing it manually is unnecessary in most cases, though, even when rescuing a system from a Live ISO: There’s always the possibility to chroot into the target’s root filesystem and call grub-mkconfig from there. That’s usually all that is necessary to update which kernel image / initrd should be kicked off.

That said, it might also be easier to edit this file manually in order to add menu entries for new kernels, for example. In addition, automatic utilities tend to add a lot of specific details that are unnecessary, and that can fail the boot process, for example if the file system’s UUID changes. So maintaining a clean grub.cfg manually can pay off in the long run.

The most interesting part in this file is the menuentry section. Let’s look at a sample command:

menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-a0c2e12e-5d16-4aac-b11d-15cbec5ae98e' {
	recordfail
	load_video
	gfxmode $linux_gfx_mode
	insmod gzio
	if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
	insmod part_gpt
	insmod ext2
	search --no-floppy --fs-uuid --set=root a0c2e12e-5d16-4aac-b11d-15cbec5ae98e
	linux	/boot/vmlinuz-6.8.0-36-generic root=UUID=a0c2e12e-5d16-4aac-b11d-15cbec5ae98e ro
	initrd	/boot/initrd.img-6.8.0-36-generic
}

So these are a bunch of commands that run if the related menu entry is chosen. I’ll discuss “menuentry” and “search” below. Note the “insmod” commands, which load ELF executable modules from /boot/grub/i386-pc/. GRUB also supports lsmod, if you want to try it with GRUB’s interactive command interface.

The menuentry command

The menuentry command is documented here. Let’s break down the command in this example:

  • menuentry: Obviously, the command itself.
  • ‘Ubuntu’: The title, which is the part presented to the user.
  • --class ubuntu --class gnu-linux --class gnu --class os: The purpose of these class flags is to help GRUB group the menu options more nicely. Usually redundant.
  • $menuentry_id_option ‘gnulinux-simple-a0c2e12e-5d16-4aac-b11d-15cbec5ae98e’: “$menuentry_id_option” expands into “--id”, so this gives the menu option a unique identifier. It’s useful for submenus, otherwise not required.

Bottom line: If there are no submenus (in the original file there actually are), this header would have done the job as well:

menuentry 'Ubuntu for the lazy' {

The search command

The other interesting part is this row within the menuentry clause:

search --no-floppy --fs-uuid --set=root a0c2e12e-5d16-4aac-b11d-15cbec5ae98e

The search command is documented here. The purpose of this command is to set the $root environment variable, which is what the “--set=root” part means (this is an unnecessary flag, as $root is the target variable anyhow). This tells GRUB in which filesystem to look for the files mentioned in the “linux” and “initrd” commands.

On a system with only one Linux installed, the “search” command is unnecessary: Both $root and $prefix are initialized according to the position of /boot/grub, so there’s no reason to search for it again.

In this example, the filesystem is defined according to its UUID, which can be found with this Linux command:

# dumpe2fs /dev/vda2 | grep UUID

It’s better to remove this “search” command if there’s only one /boot directory in the whole system (and it contains the Linux kernel files, of course). The advantage is that the Linux system can be installed just by pouring all files into an ext4 filesystem (including /boot) and then just running grub-install. Something that won’t work if grub.cfg contains explicit UUIDs. Well, actually, it will work, but with an error message and a prompt to press ENTER: The “search” command fails if the UUID is incorrect, but it wasn’t necessary to begin with, so $root will retain its correct value and the system can boot properly anyhow. Given that ENTER is pressed. That hurdle can be annoying on a remote virtual machine.

A sample menuentry command

I added these lines to my grub.cfg file in order to allow my future self to try out a new kernel without being too scared about it:

menuentry 'Unused boot menu entry for future hacks' {
        recordfail
        load_video
        gfxmode $linux_gfx_mode
        insmod gzio
        if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
        insmod part_gpt
        insmod ext2
        linux   /boot/vmlinuz-6.8.12 root=/dev/vda3 ro
}

This is just an implementation of what I said above about the “menuentry” and “search” commands. In particular, that the “search” command is unnecessary. This worked well on my machine.

As for the other rows, I suggest mixing and matching with whatever appears in your own grub.cfg file in the same places.

Obtaining a grub.cfg file

So the question is: How do I get the initial grub.cfg file? Just take one from a random system? Will that be good enough?

Well, no, that may not work: The grub.cfg is formed differently, depending in particular on how the filesystems on the hard disk are laid out. For example, comparing two grub.cfg files, one had this row:

insmod lvm

and the other didn’t. Obviously, one computer utilized LVM and the other didn’t. Also, in relation to setting the $root variable, there were different variations, going from the “search” method shown above to simply this:

set root='hd0,msdos1'

My solution was to install an Ubuntu 24.04 system on the same KVM virtual machine that I intended to install Debian 8 on later. After the installation, I just copied the grub.cfg and wiped the filesystem. I then installed the required distribution and deleted everything under /boot. Instead, I added this grub.cfg into /boot/grub/ and edited it manually to load the correct kernel.

As I kept the structure of the hard disk, and the hardware environment remained unchanged, this worked perfectly fine.

Running grub-install

Truth be told, I probably didn’t need to use grub-install, since the MBR was already set up with GRUB thanks to the installation I had already carried out for Ubuntu 24.04. Also, I could have copied all other files in /boot/grub from this installation before wiping it. But I didn’t, and it’s a good thing I didn’t, because this way I found out how to do it from a Live ISO. And this might be important for rescue purposes, in the unlikely and very unfortunate event that it’s necessary.

Luckily, grub-install has an undocumented option, --root-directory, which gets the job done.

# grub-install --root-directory=/mnt/new/ /dev/vda
Installing for i386-pc platform.
Installation finished. No error reported.

Note that using --boot-directory isn’t good enough, even if it’s mounted. Only --root-directory makes GRUB detect the correct root directory as the place to fetch the information from. With --boot-directory, the system boots with no menus.

Running update-grub

If you insist on running update-grub, be sure to edit /etc/default/grub and set it this way:

GRUB_TIMEOUT=3
GRUB_RECORDFAIL_TIMEOUT=3

The previous value for GRUB_TIMEOUT is 0, which is supposed to mean to skip the menu. If GRUB deems the boot media not to be writable, it considers every previous boot as a failure (because it can’t know if it was successful or not), and sets the timeout to 30 seconds. 3 seconds are enough, thanks.

And then run update-grub.

# update-grub
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-36-generic
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done

Alternatively, edit grub.cfg and fix it directly.

A note about old GRUB 1

This is really not related to anything else above, but since I made an attempt to install Debian 8’s GRUB on the hard disk at some point, this is what happened:

# apt install grub
# grub --version
grub (GNU GRUB 0.97)

# update-grub 
Searching for GRUB installation directory ... found: /boot/grub
Probing devices to guess BIOS drives. This may take a long time.
Searching for default file ... Generating /boot/grub/default file and setting the default boot entry to 0
Searching for GRUB installation directory ... found: /boot/grub
Testing for an existing GRUB menu.lst file ... 

Generating /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz
Found kernel: /boot/vmlinuz-6.8.0-31-generic
Updating /boot/grub/menu.lst ... done

# grub-install /dev/vda
Searching for GRUB installation directory ... found: /boot/grub
The file /boot/grub/stage1 not read correctly.

The error message about /boot/grub/stage1 appears to be horribly misleading. According to this and this, among others, the problem was that the ext4 file system was created with 256 as the inode size, and GRUB 1 doesn’t support that. Which makes sense, as the installation was done on behalf of Ubuntu 24.04 and not a museum distribution.

The solution is apparently to re-create the filesystem with the smaller inode size:

# mkfs.ext4 -I 128 /dev/vda3

Actually, I don’t know if this was really the problem, because I gave up this old GRUB version quite soon.

Migrating an OpenVZ container to KVM

Introduction

My Debian 8-based web server had been running for several years as an OpenVZ container, when the web host told me that containers are phased out, and it’s time to move on to a KVM.

This is an opportunity to upgrade to a newer distribution, most of you would say, but if a machine works flawlessly for a long period of time, I’m very reluctant to change anything. Don’t touch a stable system. It just happened to have an uptime of 426 days, and the last time this server caused me trouble was way before that.

So the question is if it’s possible to convert a container into a KVM machine, just by copying the filesystem. After all, what’s the difference if /sbin/init (systemd) is kicked off as a plain process inside a container or if the kernel does the same thing?

The answer is yes-ish, this manipulation is possible, but it requires some adjustments.

These are my notes and action items while I found my way to get it done. Everything below is very specific to my own slightly bizarre case, and at times I ended up carrying out tasks in a different order than as listed here. But this can be useful for understanding what’s ahead.

By the way, the wisest thing I did throughout this process was to rehearse the whole thing first on a KVM machine that I built on my own local computer. This virtual machine functioned as a mockup of the server to be installed. Not only did it make the trial and error much easier, but it also allowed me to test all kinds of things after the real server was up and running, without messing up the real machine.

Faking Ubuntu 24.04 LTS

To make things even more interesting, I also wanted to push the next time I’ll be required to mess with the virtual machine as far as possible into the future. Put differently, I wanted to hide the fact that the machine runs on ancient software. There should not be a request to upgrade in the foreseeable future because the old system isn’t compatible with some future version of KVM.

So to the KVM hypervisor, my machine should feel like an Ubuntu 24.04, which was the latest server distribution offered at the time I did this trick. Which brings the question: What does the hypervisor see?

The KVM guest interfaces with its hypervisor in three ways:

  • With GRUB, which accesses the virtual disk.
  • Through the kernel, which interacts with the virtual hardware.
  • Through the guest’s DHCP client, which fetches the IP address, default gateway and DNS from the hypervisor’s dnsmasq.

Or so I hope. Maybe there’s some aspect I’m not aware of. It’s not like I’m such an expert in virtualization.

So the idea was that both GRUB and the kernel should be the same as in Ubuntu 24.04. This way, any KVM setting that works with this distribution will work with my machine. The Naphthalene smell from the user-space software underneath will not reach the hypervisor.

This presumption can turn out to be wrong, and the third item in the list above demonstrates that: The guest machine gets its IP address from the hypervisor through a DHCP request issued by systemd-networkd, which is part of systemd version 215. So the bluff is exposed. Will there be some kind of incompatibility between the old systemd’s DHCP client and some future hypervisor’s response?

Regarding this specific issue, I doubt there will be a problem, as DHCP is such a simple and well-established protocol. And even if that functionality broke, the IP address is fixed anyhow, so the virtual NIC can be configured statically.

But who knows, maybe there is some kind of interaction with systemd that I’m not aware of? Future will tell.

So it boils down to faking GRUB and using a recent kernel.

Solving the GRUB problem

Debian 8 comes with GRUB version 0.97. Could we call that GRUB 1? I can already imagine the answer to my support ticket saying “please upgrade your system, as our KVM hypervisor doesn’t support old versions of GRUB”.

So I need a new one.

Unfortunately, the common way to install GRUB is with a couple of hocus-pocus tools that do the work well in the usual scenario.

As it turns out, there are two parts that need to be installed: The first part consists of the GRUB binary on the boot partition (GRUB partition or EFI, pick your choice), plus several files (modules and other) in /boot/grub/. The second part is a script file, grub.cfg, which is a textual file that can be edited manually.

To make a long story short, I installed the distribution on a virtual machine with the same layout, and made a copy of the grub.cfg file that was created. I then edited this file directly to fit into the new machine. As for installing GRUB binary, I did this from a Live ISO Ubuntu 24.04, so it’s genuine and legit.

For the full and explained story, I’ve written a separate post.

Fitting a decent kernel

One way or another, a kernel and its modules must be added to the filesystem in order to convert it from a container to a KVM machine. This is the essential difference: With a container, one kernel runs all containers and gives each of them the illusion that it’s the only one. With KVM, the boot starts from the very beginning.

If there was something I didn’t worry about, it was the concept of running an ancient distribution with a very recent kernel. I have a lot of experience with compiling the hot-hot-latest-out kernel and running it on steam-engine distributions, and very rarely have I seen any issue with that. The Linux kernel is backward compatible in a remarkable way.

My original idea was to grab the kernel image and the modules from a running installation of Ubuntu 24.04. However, the module format of this distro is incompatible with old Debian 8 (ZSTD compression seems to have been the crux), and as a result, no modules were loaded.

So I took config-6.8.0-36-generic from Ubuntu 24.04 and used it as the starting point for the .config file used for compiling the vanilla stable kernel with version v6.8.12.

And then there were a few modifications to .config:

  • “make oldconfig” asked a few questions and made some minor modifications, nothing apparently related.
  • Dropped kernel module compression (CONFIG_MODULE_COMPRESS_ZSTD off) and set kernel’s own compression to gzip. This was probably the reason the distribution’s modules didn’t load.
  • Some crypto stuff was disabled: CONFIG_INTEGRITY_PLATFORM_KEYRING, CONFIG_SYSTEM_BLACKLIST_KEYRING and CONFIG_INTEGRITY_MACHINE_KEYRING were dropped, and the same goes for CONFIG_LOAD_UEFI_KEYS. Most importantly, CONFIG_SYSTEM_REVOCATION_KEYS was set to “”. Its previous value, “debian/canonical-revoked-certs.pem”, made the compilation fail.
  • Dropped CONFIG_DRM_I915, which caused some weird compilation error.
  • After making a test run with the kernel, I also dropped CONFIG_UBSAN with everything that comes with it. UBSAN spat a lot of warning messages on mainstream drivers, and it’s really annoying. It’s still unclear to me why these warnings don’t appear with the distribution kernel. Maybe because of a difference between compiler versions (the warnings stem from checks inserted by gcc).
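In terms of the resulting .config, the changes above boil down to roughly this fragment. This is a sketch reconstructed from the option names mentioned above, not a verbatim copy of my .config:

```text
# Module and kernel compression: no module compression, gzip kernel
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_ZSTD is not set
CONFIG_KERNEL_GZIP=y

# Crypto / keyring related changes
CONFIG_SYSTEM_REVOCATION_KEYS=""
# CONFIG_INTEGRITY_PLATFORM_KEYRING is not set
# CONFIG_SYSTEM_BLACKLIST_KEYRING is not set
# CONFIG_INTEGRITY_MACHINE_KEYRING is not set
# CONFIG_LOAD_UEFI_KEYS is not set

# Drivers and debug checks that were dropped
# CONFIG_DRM_I915 is not set
# CONFIG_UBSAN is not set
```

After edits like these, running “make olddefconfig” settles any dependent options.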

The compilation took 32 minutes on a machine with 12 cores (6 hyperthreaded). By far the longest and most difficult kernel compilation I can remember in a long time.

Based upon my own post, I created the Debian packages for the whole thing, using the bindeb-pkg make target.

That took an additional 20 minutes, running on all cores. I used two of these packages in the installation of the KVM machine, as shown in the cookbook below.

Methodology

So the deal with my web host was like this: They started a KVM machine (with a different IP address, of course). I prepared this KVM machine, and when that was ready, I sent a support ticket asking for swapping the IP addresses. This way, the KVM machine became the new server, and the old container machine went to the junkyard.

As this machine involved a mail server and web sites with user content (comments to my blog, for example), I decided to stop the active server, copy “all data”, and restart the server only after the IP swap. In other words, the net result should be as if the same server had been shut down for an hour, and then restarted. No discontinuities.

As it turned out, everything related to the web server and email, including all the logs, is in /var/ and /home/. I could therefore copy all files from the old server to the new one for the sake of setting it up, and verify that everything runs smoothly as a first stage.

Then I shut down the services and copied /var/ and /home/. And then came the IP swap.

These simple commands are handy for checking which files have changed during the past week. The first finds the directories, and the second the plain files.

# find / -xdev -ctime -7 -type d | sort
# find / -xdev -ctime -7 -type f | sort

The purpose of the -xdev flag is to remain on one filesystem. Otherwise, a lot of files from /proc and such are printed out. If your system has several relevant filesystems, be sure to run the command on each of them, not just on “/” as in this example.
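For example, covering several filesystems could look like this (the mount points here are placeholders, adjust them to your system):

```shell
# Repeat the search on each relevant mount point; -xdev keeps find
# from crossing into other filesystems mounted below them.
for fs in / /home; do
    find "$fs" -xdev -ctime -7 -type f 2>/dev/null
done | sort > recently-changed-files.txt
```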

The next few sections below are the cookbook I wrote for myself in order to get it done without messing around (and hence messing up).

In hindsight, I can say that except for dealing with GRUB and the kernel, most of the hassle had to do with the NIC: Its name changed from venet0 to eth0, and it got its address through DHCP relatively late in the boot process. And that required some adaptations.

Preparing the virtual machine

  • Start the installation of Ubuntu 24.04 LTS server edition (or whatever is available, it doesn’t matter much). It’s fine to stop the installation as soon as files are being copied: The only purpose of this step is to partition the disk neatly, so that /dev/vda1 is a small partition for GRUB, and /dev/vda3 is the root filesystem (/dev/vda2 is a swap partition).
  • Start the KVM machine with a rescue image (preferably graphical or with sshd running). I went for the Ubuntu 24.04 LTS server Live ISO (the best choice provided by my web host). See notes below on using Ubuntu’s server ISO as a rescue image.
  • Wipe the existing root filesystem, if such has been installed. I considered this necessary at the time, because the default inode size may be 256 bytes, and GRUB version 1 won’t play ball with that. But later on I decided on GRUB 2. Anyhow, I forced the inode size to 128 bytes, despite the warning that 128-byte inodes cannot handle dates beyond 2038 and are deprecated:
    # mkfs.ext4 -I 128 /dev/vda3
  • And while I was at it, no automatic fsck checks. Ever. It’s really annoying when you want to kick off the server quickly.
    # tune2fs -c 0 -i 0 /dev/vda3
  • Mount new system as /mnt/new:
    # mkdir /mnt/new
    # mount /dev/vda3 /mnt/new
  • Copy the filesystem. On the OpenVZ machine:
    # tar --one-file-system -cz / | nc -q 0 185.250.251.160 1234 > /dev/null

    and the other side goes (run this before the command above):

    # nc -l 1234 < /dev/null | time tar -C /mnt/new/ -xzv

    This took about 30 minutes. The purpose of the “-q 0” flag and those /dev/null redirections is merely to make nc quit when the tar finishes.
    Or, doing the same from a backup tarball:

    $ cat myserver-all-24.07.08-08.22.tar.gz | nc -q 0 -l 1234 > /dev/null

    and the other side goes

    # nc 10.1.1.3 1234 < /dev/null | time tar -C /mnt/new/ -xzv
  • Remove old /lib/modules and boot directory:
    # rm -rf /mnt/new/lib/modules/ /mnt/new/boot/
  • Create /boot/grub and copy the grub.cfg file that I’ve prepared in advance there. This separate post explains the logic behind doing it this way.
  • Install GRUB on the boot partition (this also adds a lot of files to /boot/grub/):
    # grub-install --root-directory=/mnt/new /dev/vda
  • In order to work inside the chroot, some bind and tmpfs mounts are necessary:
    # mount -o bind /dev /mnt/new/dev
    # mount -o bind /sys /mnt/new/sys
    # mount -t proc /proc /mnt/new/proc
    # mount -t tmpfs tmpfs /mnt/new/tmp
    # mount -t tmpfs tmpfs /mnt/new/run
  • Copy the two .deb files that contain the Linux kernel files to somewhere in /mnt/new/
  • Chroot into the new fs:
    # chroot /mnt/new/
  • Check that /dev, /sys, /proc, /run and /tmp are as expected (mounted correctly).
  • Disable and stop these services: bind9, sendmail, cron.
  • This wins the prize for the oddest fix: Probably in relation to the OpenVZ container, the LSB modules_dep service is active, and it deletes all module files in /lib/modules on reboot. So make sure to never see it again. Just disabling it wasn’t good enough.
    # systemctl mask modules_dep.service
  • Install the Linux kernel and its modules into /boot and /lib/modules:
    # dpkg -i linux-image-6.8.12-myserver_6.8.12-myserver-2_amd64.deb
  • Also install the headers for compilation (why not?)
    # dpkg -i linux-headers-6.8.12-myserver_6.8.12-myserver-2_amd64.deb
  • Add /etc/systemd/network/20-eth0.network
    [Match]
    Name=eth0
    
    [Network]
    DHCP=yes

    The NIC was a given in a container, but now it has to be brought up explicitly, and the IP address possibly obtained from the hypervisor via DHCP, as I’ve done here.

  • Add the two following lines to /etc/sysctl.conf, in order to turn off IPv6:
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
  • Adjust the firewall rules, so that they don’t depend on the server having a specific IP address (because a temporary IP address will be used).
  • Add support for lspci (better do it now if something goes wrong after booting):
    # apt install pciutils
  • Ban the evbug module, which is intended to generate debug messages on input devices. Unfortunately, it sometimes floods the kernel log when the mouse goes over the virtual machine’s console window. So ditch it by adding /etc/modprobe.d/evbug-blacklist.conf with this single line:
    blacklist evbug
  • Edit /etc/fstab. Remove everything, and leave only this row:
    /dev/vda3 / ext4 defaults 0 1
  • Remove persistent udev rules, if such exist, in /etc/udev/rules.d. Oddly enough, there was nothing in this directory, neither on the existing OpenVZ server nor on a regular Ubuntu 24.04 server installation.
  • Boot up the system from disk, and perform post-boot fixes as mentioned below.

Post-boot fixes

  • Verify that /tmp is indeed mounted as a tmpfs.
  • Disable (actually, mask) the automount service, which is useless and fails. This makes systemd’s status degraded, which is practically harmless, but confusing.
    # systemctl mask proc-sys-fs-binfmt_misc.automount
  • Install the dbus service:
    # apt install dbus

    Not only is it the right thing to do on a Linux system, but it also silences this warning:

    Cannot add dependency job for unit dbus.socket, ignoring: Unit dbus.socket failed to load: No such file or directory.
  • Enable login prompt on the default visible console (tty1) so that a prompt appears after all the boot messages:
    # systemctl enable getty@tty1.service

    The other ttys got a login prompt when using Ctrl-Alt-Fn, but not the visible console. So this fixed it. Otherwise, one can be misled into thinking that the boot process is stuck.

  • Optionally: Disable vzfifo service and remove /.vzfifo.

Just before the IP address swap

  • Reboot the OpenVZ server to make sure that it wakes up OK.
  • Change the OpenVZ server’s firewall, so that it works with a different IP address. Otherwise, it becomes unreachable after the IP swap.
  • Boot the target KVM machine in rescue mode. No need to set up the ssh server as all will be done through VNC.
  • On the KVM machine, mount new system as /mnt/new:
    # mkdir /mnt/new
    # mount /dev/vda3 /mnt/new
  • On the OpenVZ server, check for recently changed directories and files:
    # find / -xdev -ctime -7 -type d | sort > recently-changed-dirs.txt
    # find / -xdev -ctime -7 -type f | sort > recently-changed-files.txt
  • Verify that the changes are only in the places that are going to be updated. If not, consider if and how to update these other files.
  • Verify that the mail queue is empty, or let sendmail empty it if possible. Not a good idea to have something firing off as soon as sendmail resumes:
    # mailq
  • Disable all services except sshd on the OpenVZ server:
    # systemctl disable cron dovecot apache2 bind9 sendmail mysql xinetd
  • Run “mailq” again to verify that the mail queue is empty (unless there was a reason to leave a message there in the previous check).
  • Reboot OpenVZ server and verify that none of these is running. This is the point at which this machine is dismissed as a server, and the downtime clock begins ticking.
  • Verify that this server doesn’t listen to any ports except ssh, as an indication that all services are down:
    # netstat -n -a | less
  • Repeat the check of recently changed files.
  • On the KVM machine, remove /var and /home:
    # rm -rf /mnt/new/var /mnt/new/home
  • Copy these parts:
    On the KVM machine, using the VNC console, go 

    # nc -l 1234 < /dev/null | time tar -C /mnt/new/ -xzv

    and on myserver:

    # tar --one-file-system -cz /var /home | nc -q 0 185.250.251.160 1234 > /dev/null

    Took 28 minutes.

  • Check that /mnt/new/tmp and /mnt/new/run are empty, and remove whatever is found there. There’s no reason for anything to be there, and it would be weird if there was, given the way the filesystem was copied from the original machine. But if there are any files, it’s just confusing: /tmp and /run are tmpfs on the running machine, so any files there will be invisible anyhow.
  • Reboot the KVM machine with a reboot command. It will stop anyhow so that the CDROM can be removed.
  • Remove the KVM’s CDROM and continue the reboot normally.
  • Login to the KVM machine with ssh.
  • Check that all is OK: systemctl status as well as journalctl. Note that apache, mysql and dovecot should be running now.
  • Power down both virtual machines.
  • Request an IP address swap. Let them do whatever they want with the IPv6 addresses, as they are ignored anyhow.

After IP address swap

  • Start the KVM server normally, and login normally through ssh.
  • Try to browse into the web sites: The web server should already be working properly (even though the DNS is off, there’s a backup DNS).
  • Check journalctl and systemctl status.
  • Resume the original firewall rules and verify that the firewall works properly:
    # systemctl restart netfilter-persistent
    # iptables -vn -L
  • Start all services, and check status and journalctl again:
    # systemctl start cron dovecot apache2 bind9 sendmail mysql xinetd
  • If all is fine, enable these services:
    # systemctl enable cron dovecot apache2 bind9 sendmail mysql xinetd
  • Reboot (with reboot command), and check that all is fine.
  • In particular, send DNS queries directly to the server with dig, and also send an email to a foreign address (e.g. gmail). My web host blocked outgoing connections to port 25 on the new server, for example.
  • Delete ifcfg-venet0 and ifcfg-venet0:0 in /etc/sysconfig/network-scripts/, as they relate to the venet0 interface that exists only in the container machine. It’s just misleading to have it there.
  • Compare /etc/rc* and /etc/systemd with the situation before the transition in the git repo, to verify that everything is like it should be.
  • Check the server with nmap (run this from another machine):
    $ nmap -v -A server
    $ sudo nmap -v -sU server

And then the DNS didn’t work

I knew very well why I left plenty of spare time for after the IP swap. Something always goes wrong after a maneuver like this, and this time was no different. And for some odd reason, it was the bind9 DNS that played two different kinds of pranks.

I noticed immediately that the server didn’t answer DNS queries. As it turned out, there were two apparently independent reasons for this.

The first was that when I re-enabled the bind9 service (after disabling it for the sake of moving), systemd went for the SYSV init scripts instead of its own unit file. So I got:

# systemctl enable bind9
Synchronizing state for bind9.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d bind9 defaults
insserv: warning: current start runlevel(s) (empty) of script `bind9' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `bind9' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d bind9 enable

This could have been harmless and gone unnoticed, had it not been that I’d added a “-4” flag to bind9’s command line, without which it wouldn’t work. So by running the SYSV scripts, my change in /etc/systemd/system/bind9.service wasn’t in effect.

Solution: Delete all files related to bind9 in /etc/init.d/ and /etc/rc*.d/. Quite aggressive, but did the job.

Having fixed that, it still didn’t work. The problem now was that eth0 was configured through DHCP after bind9 had begun running. As a result, the DNS didn’t listen on eth0.

I slapped myself for thinking about adding a “sleep” command before launching bind9, and went for the right way to do this. Namely:

$ cat /etc/systemd/system/bind9.service
[Unit]
Description=BIND Domain Name Server
Documentation=man:named(8)
After=network-online.target systemd-networkd-wait-online.service
Wants=network-online.target systemd-networkd-wait-online.service

[Service]
ExecStart=/usr/sbin/named -4 -f -u bind
ExecReload=/usr/sbin/rndc reload
ExecStop=/usr/sbin/rndc stop

[Install]
WantedBy=multi-user.target

The systemd-networkd-wait-online.service is not there by coincidence. Without it, bind9 was launched before eth0 had received an address. With this, systemd consistently waited for the DHCP to finish, and then launched bind9. As it turned out, this also delayed the start of apache2 and sendmail.

If anything, network-online.target is most likely redundant.

And with this fix, the crucial row appeared in the log:

named[379]: listening on IPv4 interface eth0, 193.29.56.92#53

Another solution could have been to assign an address to eth0 statically. For some odd reason, I prefer to let DHCP do this, even though the firewall will block all traffic anyhow if the IP address changes.

Using Live Ubuntu as rescue mode

Set Ubuntu 24.04 server amd64 as the CDROM image.

After the machine has booted, send a Ctrl-Alt-F2 to switch to the second console. Don’t go on with the installation wizard, as it will of course wipe the server.

In order to establish an ssh connection:

  • Choose a password for the default user (ubuntu-server).
    $ passwd

    If you insist on a weak password, remember that you can do that only as root.

  • Use ssh to log in:
    $ ssh ubuntu-server@185.250.251.160

Root login is forbidden (by default), so don’t even try.

Note that even though sshd apparently listens only on IPv6 ports, it actually accepts IPv4 connections as well, by virtue of IPv4-mapped IPv6 addresses (an IPv4 address a.b.c.d appears as ::ffff:a.b.c.d):

# lsof -n -P -i tcp 2>/dev/null
COMMAND    PID            USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd      1            root  143u  IPv6   5323      0t0  TCP *:22 (LISTEN)
systemd-r  911 systemd-resolve   15u  IPv4   1766      0t0  TCP 127.0.0.53:53 (LISTEN)
systemd-r  911 systemd-resolve   17u  IPv4   1768      0t0  TCP 127.0.0.54:53 (LISTEN)
sshd      1687            root    3u  IPv6   5323      0t0  TCP *:22 (LISTEN)
sshd      1847            root    4u  IPv6  11147      0t0  TCP 185.250.251.160:22->85.64.140.6:57208 (ESTABLISHED)
sshd      1902   ubuntu-server    4u  IPv6  11147      0t0  TCP 185.250.251.160:22->85.64.140.6:57208 (ESTABLISHED)

So don’t get confused by e.g. netstat and other similar utilities.

To NTP or not?

I wasn’t sure if I should run an NTP client inside a KVM virtual machine. So these are the notes I took.

On a working KVM machine, timesyncd tells about its presence in the log:

Jul 11 20:52:52 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.003s/+0ppm
Jul 11 21:27:00 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.001s/+0ppm
Jul 11 22:01:08 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.002s/0.007s/0.001s/+0ppm
Jul 11 22:35:17 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.001s/+0ppm
Jul 11 23:09:25 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.007s/0.007s/0.003s/+0ppm
Jul 11 23:43:33 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.003s/0.007s/0.005s/+0ppm (ignored)
Jul 12 00:17:41 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.006s/0.007s/0.005s/-1ppm
Jul 12 00:51:50 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.005s/+0ppm
Jul 12 01:25:58 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.002s/0.007s/0.005s/+0ppm
Jul 12 02:00:06 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.002s/0.007s/0.005s/+0ppm
Jul 12 02:34:14 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.005s/+0ppm
Jul 12 03:08:23 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.005s/+0ppm
Jul 12 03:42:31 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.004s/+0ppm
Jul 12 04:17:11 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.003s/+0ppm

So a resync takes place every 2048 seconds (34 minutes and 8 seconds), like clockwork. As apparent from the values, there’s no dispute about the time between Debian’s NTP server and the web host’s hypervisor.
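For the record, the conversion of that interval is a throwaway shell one-liner (not from the post):

```shell
# 2048 seconds expressed as minutes and seconds
secs=2048
echo "$((secs / 60)) minutes and $((secs % 60)) seconds"
# prints: 34 minutes and 8 seconds
```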

Running KVM on Linux Mint 19: random jots

General

Exactly like my previous post from 14 years ago, these are random jots that I took as I set up a QEMU/KVM-based virtual machine on my Linux Mint 19 computer. This time, the purpose was to prepare myself for moving a server from an OpenVZ container to KVM.

Other version details, for the record: libvirt version 4.0.0, QEMU version 2.11.1, Virtual Machine manager 1.5.1.

Installation

Install some relevant packages:

# apt install qemu-kvm qemu-utils libvirt-daemon-system libvirt-clients virt-manager virt-viewer ebtables ovmf

This clearly installed a few services: libvirt-bin, libvirtd, libvirt-guest, virtlogd, qemu-kvm, ebtables, and a couple of sockets: virtlockd.socket and virtlogd.socket with their attached services.

My regular username on the computer was added automatically to the “libvirt” group, however that doesn’t take effect until one logs out and in again. Without belonging to this group, one gets the error message “Unable to connect to libvirt qemu:///system” when attempting to run the Virtual Machine Manager. Or in more detail: “libvirtError: Failed to connect socket to ‘/var/run/libvirt/libvirt-sock’: Permission denied”.

The lazy and temporary solution is to run the Virtual Machine Manager with “sg”. So instead of the usual command for starting the GUI tool (NOT as root):

$ virt-manager &

Use “sg” (or start a session with the “newgrp” command):

$ sg libvirt virt-manager &

This is necessary only until next time you log in to the console. I think. I didn’t get that far. Who logs out?

There’s also a command-line utility, virsh. For example, to list all running machines:

$ sudo virsh list

Or just “sudo virsh” for an interactive shell.

Note that without root permissions, the list is simply empty. This is really misleading.

General notes

  • Virtual machines are called “domains” in several contexts (within virsh in particular).
  • To get the mouse out of the graphical window, use Ctrl-Alt.
  • For networking to work, some rules related to virbr0 are automatically added to the iptables firewall. If these are absent, go “systemctl restart libvirtd” (don’t do this with virtual machines running, of course).
  • These iptables rules are important in particular for WAN connections. Apparently, these allow virbr0 to make DNS queries to the local machine (adding rules to INPUT and OUTPUT chains). In addition, the FORWARD rule allows forwarding anything to and from virbr0 (as long as the correct address mask is matched). Plus a whole lot of stuff around POSTROUTING. Quite disgusting, actually.
  • There are two Ethernet interfaces related to KVM virtualization: vnet0 and virbr0 (typically). For sniffing, virbr0 is a better choice, as it’s the virtual machine’s own bridge to the system, so there is less noise. This is also the interface that has an IP address of its own.
  • A vnetN pops up for each virtual machine that is running; virbr0 is there regardless.
  • The configuration files are kept as fairly readable XML files in /etc/libvirt/qemu.
  • The images are typically held at /var/lib/libvirt/images, owned by root with 0600 permissions.
  • The libvirtd service runs /usr/sbin/libvirtd as well as two processes of /usr/sbin/dnsmasq. When a virtual machine runs, it also runs an instance of qemu-system-x86_64 on its behalf.

Creating a new virtual machine

Start the Virtual Manager. The GUI is good enough for my purposes.

$ sg libvirt virt-manager &
  • Click on the “Create new virtual machine” and choose “Local install media”. Set the other parameters as necessary.
  • As for storage, choose “Select or create custom storage” and create a qcow2 volume in a convenient position on the disk (/var/lib/libvirt/images is hardly a good place for that, as it’s on the root partition).
  • In the last step, choose “customize configuration before install”.
  • Network selection: Virtual network ‘default’: NAT.
  • Change the NIC, Disk and Video to VirtIO as mentioned below.
  • Click “Begin Installation”.

Do it with VirtIO

That is, use Linux’ paravirtualization drivers, rather than emulation of hardware.

To set up a machine’s settings, go View > Details.

This is lspci’s response with a default virtual machine:

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 20)
00:04.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
00:05.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:05.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:05.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:05.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:06.0 Communication controller: Red Hat, Inc Virtio console
00:07.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon

Cute, but all interfaces are emulations of real hardware. In other words, this will run really slowly.

Testing link speed: On the host machine:

$ nc -l 1234 < /dev/null > /dev/null

And on the guest:

$ dd if=/dev/zero bs=128k count=4k | nc -q 0 10.1.1.3 1234
4096+0 records in
4096+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 3.74558 s, 143 MB/s

Quite impressive for hardware emulation, I must admit. But it can get better.

Things to change from the default settings:

  • NIC: Choose “virtio” as device model, keep “Virtual network ‘default’” as NAT.
  • Disk: On “Disk bus”, don’t use IDE, but rather “VirtIO” (it will appear as /dev/vda etc.).
  • Video: Don’t use QXL, but Virtio (without 3D acceleration, which wasn’t supported on my machine). Actually, I’m not so sure about this one. For example, Ubuntu’s installation live boot occasionally gave me a black screen with Virtio.

Note that it’s possible to use a VNC server instead of “Display spice”.

After making these changes:

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Red Hat, Inc Virtio GPU (rev 01)
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
00:05.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:05.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:05.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:05.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:06.0 Communication controller: Red Hat, Inc Virtio console
00:07.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
00:08.0 SCSI storage controller: Red Hat, Inc Virtio block device

Try the speed test again?

$ dd if=/dev/zero bs=128k count=4k | nc -q 0 10.1.1.3 1234
4096+0 records in
4096+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.426422 s, 1.3 GB/s

Almost ten times faster.

Preparing a live Ubuntu ISO for ssh

$ sudo su
# apt install openssh-server
# passwd ubuntu

In the installation of the openssh-server, there’s a question of which configuration files to use. Choose the package maintainer’s version.

Bracketed paste: Those ~0 and ~1 added around pasted text

Intro

This is a super-short post, but I have a feeling it will evolve with time.

Using ssh, pasting text with CTRL-V or the mouse’s middle button sometimes resulted in ~0 and ~1 around the pasted text. Super annoying.

As it turns out, this is called “bracketed paste”, and it’s a way for the terminal application (say, Gnome terminal) to tell the receiver (say, bash) that the text is pasted, and not typed manually: the terminal wraps the pasted text in the escape sequences ESC[200~ and ESC[201~.

Why is this helpful? For example, if the text goes to an editor which responds with automatic indentation as a result of a newline, that can have a negative effect. Bracketed paste gives the editor the possibility to accept the text as is, assuming that it’s already correctly indented, since it’s pasted and not typed.

The reason for the ~0 and ~1 problem is probably that the bash version on the ssh’ed computer is really old, and my Gnome terminal is relatively new. So bash doesn’t understand the magic characters, and prints them out as they are.

This problem will probably go away by itself sooner or later.

Magic solution

There are all kinds of “bind” commands, but for some reason, I thought this solution was the coolest.

Turning off bracketed paste, which adds ~0 and ~1 around the pasted text:

$ printf "\e[?2004l"

To re-enable bracketed paste:

$ printf "\e[?2004h"

These two commands were taken from this page. The effect of these commands seems to go beyond what one would expect. It seems like they don’t influence just the current terminal session, but I need to figure this out.
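If the effect is wanted permanently, readline (the line editor behind bash) has a matching inputrc variable. This is standard readline (version 7.0 and later), not something taken from that page:

```shell
# Disable bracketed paste for all future bash sessions by adding
# the readline setting to ~/.inputrc (takes effect on the next login).
echo 'set enable-bracketed-paste off' >> ~/.inputrc
```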

Add \subsubsubsection to a Hitec LaTeX document

So what if you need to divide a \subsubsection{} into even lower subsections? LaTeX classes don’t usually support that, because if you need that feature, your document’s structure is wrong. Or so they say. You should have chopped the document with \part{} or \chapter{} at a higher level, and not cut down the sections into even smaller pieces.

But with technical documentation (say, outlining an API), it can be very handy to have something below \subsubsection{}. As it turns out, LaTeX actually supports lower levels, but they aren’t numbered by default. So it goes:

  1. \section{}
  2. \subsection{}
  3. \subsubsection{}
  4. \paragraph{}
  5. \subparagraph{}

That’s neat, isn’t it? In order to make the last two numbered, add this to the LaTeX document (the \titleformat and \titlespacing commands below are provided by the titlesec package, so it must be loaded):

\setcounter{secnumdepth}{5}
\setcounter{tocdepth}{5}
\titleformat{\paragraph}
{\normalfont\normalsize\bfseries}{\theparagraph}{1em}{}
\titlespacing*{\paragraph}
{-15ex}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}
\titleformat{\subparagraph}
{\normalfont\normalsize\bfseries}{\thesubparagraph}{1em}{}
\titlespacing*{\subparagraph}
{-12ex}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}

After adding this, sub-sub-sub-section numbers appear with \paragraph{}, and even one more level down with \subparagraph{}.

\label{} works as expected (\ref{} correctly references \paragraph{} and \subparagraph{}), and the table of contents also lists these elements neatly.

This snippet works well with the Hitec class. I don’t know if it works with other classes. But even if it does, odds are that the result will look ugly, as this code defines the spacing so that it looks fairly nice with Hitec’s formatting.

So it’s not really \subsubsubsection{}, which is awkwardly long anyhow, but a more elegant solution.

Perl script for mangling SRT subtitle files

I had a set of SRT files with pretty good subtitles, but with one annoying problem: When there was a song in the background, the translation of the song would pop up and interrupt the dialogue’s subtitles, making it impossible to follow what was going on.

Luckily, those song-translating subtitles all had a “{\a6}” string, which is an ASS tag meaning that the text should be shown at the top of the picture. mplayer ignores these tags, which explains why these subtitles make sense in players that honor the tag, but mess things up for me. So the simple solution is to remove these entries.

Why don’t I use VLC instead? Mainly because I’m used to mplayer, and I’m under the impression that mplayer gives much better and easier control of low-level issues such as adjusting the subtitles’ timing. There’s also the ability to run it with a lot of command-line parameters and to jump back and forth in the displayed video, in particular with a keyboard remote control. But maybe it’s just a matter of habit.

Here’s a Perl script that reads an SRT file and removes all entries containing that string. It fixes the numbering of the entries to make up for those that have been removed. Fun fact: The entries don’t need to appear in chronological order. In fact, most of the annoying subtitles appeared at the end of the file, even though they messed up things everywhere.

This can be a boilerplate for other needs as well, of course.

#!/usr/bin/perl
use warnings;
use strict;

my $fname = shift;

die("Usage: $0 srtfile\n")
  unless (defined $fname);

my $data = readfile($fname);

my ($name, $ext) = ($fname =~ /^(.*)\.(.*)$/);

die("No extension in file name \"$fname\"\n")
  unless (defined $name);

# Regex for a newline, swallowing surrounding CR if such exist
my $nl = qr/\r*\n\r*/;

# Regex for a subtitle entry
my $tregex = qr/(?:\d+$nl.*?(?:$nl$nl|$))/s;

my ($pre, $chunk, $post) = ($data =~ /^(.*?)($tregex*)(.*)$/);

die("Input file doesn't look like an SRT file\n")
  unless (defined $chunk);

my $lpre = length($pre);
my $lpost = length($post);

print "Warning: Passing through $lpre bytes at beginning of file untouched\n"
 if ($lpre);

print "Warning: Passing through $lpost bytes at beginning of file untouched\n"
 if ($lpost);

my @items = ($chunk =~ /($tregex)/g);

#### This is the mangling part

my @outitems;
my $removed = 0;
my $counter = 1;

foreach my $i (@items) {
  if ($i =~ /\\a6/) {
    $removed++;
  } else {
    $i =~ s/\d+/$counter/; # Renumber: the entry's index is its first number
    $counter++;
    push @outitems, $i;
  }
}

print "Removed $removed subtitle entries from $fname\n";

#### Mangling part ends here

writefile("$name-clean.$ext", join("", $pre, @outitems, $post));

exit(0); # Just to have this explicit

############ Simple file I/O subroutines ############

sub writefile {
  my ($fname, $data) = @_;

  open(my $out, ">:utf8", $fname)
    or die "Can't open \"$fname\" for write: $!\n";
  print $out $data;
  close $out;
}

sub readfile {
  my ($fname) = @_;

  local $/; # Slurp mode

  open(my $in, "<:utf8", $fname)
    or die "Can't open $fname for read: $!\n";

  my $input = <$in>;
  close $in;

  return $input;
}