Migrating an OpenVZ container to KVM
Introduction
My Debian 8-based web server had been running for several years as an OpenVZ container, when the web host told me that containers are being phased out, and it’s time to move on to a KVM virtual machine.
This is an opportunity to upgrade to a newer distribution, most of you would say, but if a machine works flawlessly for a long period of time, I’m very reluctant to change anything. Don’t touch a stable system. It just happened to have an uptime of 426 days, and the last time this server caused me trouble was way before that.
So the question is if it’s possible to convert a container into a KVM machine, just by copying the filesystem. After all, what’s the difference if /sbin/init (systemd) is kicked off as a plain process inside a container or if the kernel does the same thing?
The answer is yes-ish, this manipulation is possible, but it requires some adjustments.
These are my notes and action items from finding my way to getting it done. Everything below is very specific to my own slightly bizarre case, and at times I ended up carrying out tasks in a different order than listed here. But this can be useful for understanding what’s ahead.
By the way, the wisest thing I did throughout this process was to rehearse it on a KVM machine that I built on my own local computer. This virtual machine functioned as a mockup of the server to be installed. Not only did it make the trial and error much easier, but it also allowed me to test all kinds of things after the real server was up and running, without messing up the real machine.
Faking Ubuntu 24.04 LTS
To make things even more interesting, I also wanted to push the next time I’ll be required to mess with the virtual machine as far as possible into the future. Put differently, I wanted to hide the fact that the machine runs on ancient software, so that there will be no request to upgrade in the foreseeable future because the old system isn’t compatible with some future version of KVM.
So to the KVM hypervisor, my machine should feel like an Ubuntu 24.04, which was the latest server distribution offered at the time I did this trick. Which brings the question: What does the hypervisor see?
The KVM guest interfaces with its hypervisor in three ways:
- With GRUB, which accesses the virtual disk.
- Through the kernel, which interacts with the virtual hardware.
- Through the guest’s DHCP client, which fetches the IP address, default gateway and DNS from the hypervisor’s dnsmasq.
Or so I hope. Maybe there’s some aspect I’m not aware of. It’s not like I’m such an expert in virtualization.
So the idea was that both GRUB and the kernel should be the same as in Ubuntu 24.04. This way, any KVM setting that works with this distribution will work with my machine. The Naphthalene smell from the user-space software underneath will not reach the hypervisor.
This presumption can turn out to be wrong, and the third item in the list above demonstrates that: The guest machine gets its IP address from the hypervisor through a DHCP request issued by systemd-networkd, which is part of systemd version 215. So the bluff is exposed. Will there be some kind of incompatibility between the old systemd’s DHCP client and some future hypervisor’s response?
Regarding this specific issue, I doubt there will be a problem, as DHCP is such a simple and well-established protocol. And even if that functionality broke, the IP address is fixed anyhow, so the virtual NIC can be configured statically.
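For reference, a static fallback with systemd-networkd would be a .network file along these lines (the addresses here are placeholders, not the real ones):
[Match]
Name=eth0

[Network]
Address=192.0.2.10/24
Gateway=192.0.2.1
DNS=192.0.2.53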
But who knows, maybe there is some kind of interaction with systemd that I’m not aware of? Time will tell.
So it boils down to faking GRUB and using a recent kernel.
Solving the GRUB problem
Debian 8 comes with GRUB version 0.97. Could we call that GRUB 1? I can already imagine the answer to my support ticket saying “please upgrade your system, as our KVM hypervisor doesn’t support old versions of GRUB”.
So I need a new one.
Unfortunately, the common way to install GRUB is with a couple of hocus-pocus tools that do the work well in the usual scenario.
As it turns out, there are two parts that need to be installed: The first part consists of the GRUB binary on the boot partition (GRUB partition or EFI, take your pick), plus several files (modules and others) in /boot/grub/. The second part is grub.cfg, a script in a plain text file that can be edited manually.
To make a long story short, I installed the distribution on a virtual machine with the same layout, and made a copy of the grub.cfg file that was created. I then edited this file directly to fit the new machine. As for installing the GRUB binary, I did this from an Ubuntu 24.04 Live ISO, so it’s genuine and legit.
For the full and explained story, I’ve written a separate post.
Fitting a decent kernel
One way or another, a kernel and its modules must be added to the filesystem in order to convert it from a container to a KVM machine. This is the essential difference: With a container, one kernel runs all containers and gives each of them the illusion that it’s the only one. With KVM, the boot starts from the very beginning.
If there was something I didn’t worry about, it was the concept of running an ancient distribution with a very recent kernel. I have a lot of experience with compiling the hot-hot-latest-out kernel and running it on steam-engine distributions, and very rarely have I seen any issue with that. The Linux kernel is backward compatible in a remarkable way.
My original idea was to grab the kernel image and the modules from a running installation of Ubuntu 24.04. However, the module format of this distro is incompatible with old Debian 8 (ZST compression seems to have been the crux), and as a result, no modules were loaded.
So I took config-6.8.0-36-generic from Ubuntu 24.04 and used it as the starting point for the .config file used for compiling the vanilla stable kernel with version v6.8.12.
And then there were a few modifications to .config:
- “make oldconfig” asked a few questions and made some minor modifications, nothing apparently related.
- Dropped kernel module compression (CONFIG_MODULE_COMPRESS_ZSTD off) and set kernel’s own compression to gzip. This was probably the reason the distribution’s modules didn’t load.
- Some crypto stuff was disabled: CONFIG_INTEGRITY_PLATFORM_KEYRING, CONFIG_SYSTEM_BLACKLIST_KEYRING and CONFIG_INTEGRITY_MACHINE_KEYRING were dropped, same with CONFIG_LOAD_UEFI_KEYS, and most importantly, CONFIG_SYSTEM_REVOCATION_KEYS was set to "". Its previous value, "debian/canonical-revoked-certs.pem", made the compilation fail.
- Dropped CONFIG_DRM_I915, which caused some weird compilation error.
- After making a test run with the kernel, I also dropped CONFIG_UBSAN with everything that comes with it. UBSAN spat a lot of warning messages on mainstream drivers, and it’s really annoying. It’s still unclear to me why these warnings don’t appear with the distribution kernel. Maybe because of a difference between compiler versions (the warnings stem from checks inserted by gcc).
The compilation took 32 minutes on a machine with 12 cores (6 hyperthreaded). By far the longest and most difficult kernel compilation I can remember in a long time.
Based upon my own post, I created the Debian packages for the whole thing, using the bindeb-pkg make target.
That took an additional 20 minutes, running on all cores. I used two of these packages in the installation of the KVM machine, as shown in the cookbook below.
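For reference, the overall sequence was roughly this (a sketch from memory rather than an exact transcript, with the .config edits listed above applied after oldconfig):
$ cp config-6.8.0-36-generic .config
$ make oldconfig
$ make -j12
$ make -j12 bindeb-pkg
The last target produces the linux-image and linux-headers .deb packages that are installed in the cookbook below.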
Methodology
So the deal with my web host was like this: They started a KVM machine (with a different IP address, of course). I prepared this KVM machine, and when that was ready, I sent a support ticket asking for swapping the IP addresses. This way, the KVM machine became the new server, and the old container machine went to the junkyard.
As this machine involved a mail server and web sites with user content (comments to my blog, for example), I decided to stop the active server, copy “all data”, and restart the server only after the IP swap. In other words, the net result should be as if the same server had been shut down for an hour, and then restarted. No discontinuities.
As it turned out, everything related to the web server and email, including all the logs, is in /var/ and /home/. I could therefore copy all files from the old server to the new one for the sake of setting it up, and verify as a first stage that everything runs smoothly.
Then I shut down the services and copied /var/ and /home/. And then came the IP swap.
These simple commands are handy for checking what has changed during the past week. The first finds the directories, and the second the plain files.
# find / -xdev -ctime -7 -type d | sort
# find / -xdev -ctime -7 -type f | sort
The purpose of the -xdev flag is to remain on one filesystem. Otherwise, a lot of files from /proc and such are printed out. If your system has several relevant filesystems, be sure to list them along with “/” in these commands.
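For example, if /home were on a separate filesystem (it isn’t in my case), the second command would become:
# find / /home -xdev -ctime -7 -type f | sort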
The next few sections below are the cookbook I wrote for myself in order to get it done without messing around (and hence mess up).
In hindsight, I can say that except for dealing with GRUB and the kernel, most of the hassle had to do with the NIC: Its name changed from venet0 to eth0, and it got its address through DHCP relatively late in the boot process. And that required some adaptations.
Preparing the virtual machine
- Start the installation of Ubuntu 24.04 LTS server edition (or whatever is available, it doesn’t matter much). Possibly stop the installation as soon as files are being copied: The only purpose of this step is to partition the disk neatly, so that /dev/vda1 is a small partition for GRUB, and /dev/vda3 is the root filesystem (/dev/vda2 is a swap partition).
- Start the KVM machine with a rescue image (preferably graphical or with sshd running). I went for the Ubuntu 24.04 LTS server Live ISO (the best choice provided by my web host). See the notes below on using Ubuntu’s server ISO as a rescue image.
- Wipe the existing root filesystem, if such has been installed. I considered this necessary at the time, because the default inode size may be 256, and GRUB version 1 won’t play ball with that. But later on I decided on GRUB 2. Anyhow, I forced it to be 128 bytes, despite the warning that 128-byte inodes cannot handle dates beyond 2038 and are deprecated:
# mkfs.ext4 -I 128 /dev/vda3
- And since I was at it, no automatic fsck check. Ever. It’s really annoying when you want to kick off the server quickly.
# tune2fs -c 0 -i 0 /dev/vda3
- Mount new system as /mnt/new:
# mkdir /mnt/new
# mount /dev/vda3 /mnt/new
- Copy the filesystem. On the OpenVZ machine:
# tar --one-file-system -cz / | nc -q 0 185.250.251.160 1234 > /dev/null
and the other side goes (run this before the command above):
# nc -l 1234 < /dev/null | time tar -C /mnt/new/ -xzv
This took about 30 minutes. The purpose of the "-q 0" flag and those /dev/null redirections is merely to make nc quit when the tar finishes.
Or, doing the same from a backup tarball:
$ cat myserver-all-24.07.08-08.22.tar.gz | nc -q 0 -l 1234 > /dev/null
and the other side goes
# nc 10.1.1.3 1234 < /dev/null | time tar -C /mnt/new/ -xzv
- Remove old /lib/modules and boot directory:
# rm -rf /mnt/new/lib/modules/ /mnt/new/boot/
- Create /boot/grub and copy into it the grub.cfg file that I prepared in advance, as shown just below. This separate post explains the logic behind doing it this way.
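In plain commands, assuming the prepared grub.cfg is sitting in the rescue system’s current directory:
# mkdir -p /mnt/new/boot/grub
# cp grub.cfg /mnt/new/boot/grub/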
- Install GRUB on the boot partition (this also adds a lot of files to /boot/grub/):
# grub-install --root-directory=/mnt/new /dev/vda
- In order to work inside the chroot, some bind and tmpfs mounts are necessary:
# mount -o bind /dev /mnt/new/dev
# mount -o bind /sys /mnt/new/sys
# mount -t proc /proc /mnt/new/proc
# mount -t tmpfs tmpfs /mnt/new/tmp
# mount -t tmpfs tmpfs /mnt/new/run
- Copy the two .deb files that contain the Linux kernel files to somewhere in /mnt/new/
- Chroot into the new fs:
# chroot /mnt/new/
- Check that /dev, /sys, /proc, /run and /tmp are as expected (mounted correctly).
- Disable and stop these services: bind9, sendmail, cron.
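That is, something along these lines (systemctl can toggle the enablement symlinks even inside a chroot; nothing from the copied filesystem is actually running at this point anyhow):
# systemctl disable bind9 sendmail cron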
- This wins the prize for the oddest fix: Probably in relation to the OpenVZ container, the LSB modules_dep service is active, and it deletes all module files in /lib/modules on reboot. So make sure to never see it again. Just disabling it wasn’t good enough.
# systemctl mask modules_dep.service
- Install the Linux kernel and its modules into /boot and /lib/modules:
# dpkg -i linux-image-6.8.12-myserver_6.8.12-myserver-2_amd64.deb
- Also install the headers for compilation (why not?)
# dpkg -i linux-headers-6.8.12-myserver_6.8.12-myserver-2_amd64.deb
- Add /etc/systemd/network/20-eth0.network
[Match]
Name=eth0

[Network]
DHCP=yes
The NIC was a given in the container, but now it has to be brought up explicitly, with the IP address possibly obtained from the hypervisor via DHCP, as I’ve done here.
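If systemd-networkd isn’t already enabled, this is also the time to take care of that (it’s the one issuing the DHCP request):
# systemctl enable systemd-networkd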
- Add the two following lines to /etc/sysctl.conf, in order to turn off IPv6:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
- Adjust the firewall rules, so that they don’t depend on the server having a specific IP address (because a temporary IP address will be used).
- Add support for lspci (better do it now if something goes wrong after booting):
# apt install pciutils
- Ban the evbug module, which is intended to generate debug messages for input devices. Unfortunately, it sometimes floods the kernel log when the mouse goes over the virtual machine’s console window. So ditch it by adding /etc/modprobe.d/evbug-blacklist.conf containing this single line:
blacklist evbug
- Edit /etc/fstab. Remove everything, and leave only this row:
/dev/vda3 / ext4 defaults 0 1
- Remove persistent udev rules, if any exist, in /etc/udev/rules.d. Oddly enough, there was nothing in this directory, neither on the existing OpenVZ server nor in a regular Ubuntu 24.04 server installation.
- Boot up the system from disk, and perform post-boot fixes as mentioned below.
Post-boot fixes
- Verify that /tmp is indeed mounted as a tmpfs.
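A quick way to check (the output should say tmpfs):
# findmnt /tmp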
- Disable (actually, mask) the automount service, which is useless and fails. This makes systemd’s status degraded, which is practically harmless, but confusing.
# systemctl mask proc-sys-fs-binfmt_misc.automount
- Install the dbus service:
# apt install dbus
Not only is it the right thing to do on a Linux system, but it also silences this warning:
Cannot add dependency job for unit dbus.socket, ignoring: Unit dbus.socket failed to load: No such file or directory.
- Enable login prompt on the default visible console (tty1) so that a prompt appears after all the boot messages:
# systemctl enable getty@tty1.service
The other ttys got a login prompt when using Ctrl-Alt-Fn, but not the visible console, so this fixed it. Otherwise, one can be misled into thinking that the boot process is stuck.
- Optionally: Disable vzfifo service and remove /.vzfifo.
Just before the IP address swap
- Reboot the OpenVZ server to make sure that it wakes up OK.
- Change the OpenVZ server’s firewall, so that it works with a different IP address. Otherwise, it becomes unreachable after the IP swap.
- Boot the target KVM machine in rescue mode. No need to set up the ssh server as all will be done through VNC.
- On the KVM machine, mount new system as /mnt/new:
# mkdir /mnt/new
# mount /dev/vda3 /mnt/new
- On the OpenVZ server, check for recently changed directories and files:
# find / -xdev -ctime -7 -type d | sort > recently-changed-dirs.txt
# find / -xdev -ctime -7 -type f | sort > recently-changed-files.txt
- Verify that the changes are only in the places that are going to be updated. If not, consider if and how to update these other files.
- Verify that the mail queue is empty, or let sendmail empty it if possible. Not a good idea to have something firing off as soon as sendmail resumes:
# mailq
- Disable all services except sshd on the OpenVZ server:
# systemctl disable cron dovecot apache2 bind9 sendmail mysql xinetd
- Run “mailq” again to verify that the mail queue is empty (unless there was a reason to leave a message there in the previous check).
- Reboot the OpenVZ server and verify that none of these is running. This is the point at which this machine is dismissed as a server, and the downtime clock begins ticking.
- Verify that this server doesn’t listen on any ports except ssh, as an indication that all services are down:
# netstat -n -a | less
- Repeat the check of recently changed files.
- On the KVM machine, remove /var and /home:
# rm -rf /mnt/new/var /mnt/new/home
- Copy these parts:
On the KVM machine, using the VNC console, go:
# nc -l 1234 < /dev/null | time tar -C /mnt/new/ -xzv
and on myserver:
# tar --one-file-system -cz /var /home | nc -q 0 185.250.251.160 1234 > /dev/null
Took 28 minutes.
- Check that /mnt/new/tmp and /mnt/new/run are empty, and remove whatever is found there. There’s no reason for anything to be there, and it would be weird if there was, given the way the filesystem was copied from the original machine. But any leftover files are just confusing, as /tmp and /run are tmpfs on the running machine, so any files on disk will be invisible anyhow.
- Reboot the KVM machine with a reboot command. It will stop anyhow so that the CDROM can be removed.
- Remove the KVM’s CDROM and continue the reboot normally.
- Login to the KVM machine with ssh.
- Check that all is OK: systemctl status as well as journalctl. Note that apache, mysql and dovecot should be running now.
- Power down both virtual machines.
- Request an IP address swap. Let them do whatever they want with the IPv6 addresses, as they are ignored anyhow.
After IP address swap
- Start the KVM server normally, and login normally through ssh.
- Try to browse to the web sites: The web server should already be working properly (even though the DNS is off, there’s a backup DNS).
- Check journalctl and systemctl status.
- Resume the original firewall rules and verify that the firewall works properly:
# systemctl restart netfilter-persistent
# iptables -vn -L
- Start all services, and check status and journalctl again:
# systemctl start cron dovecot apache2 bind9 sendmail mysql xinetd
- If all is fine, enable these services:
# systemctl enable cron dovecot apache2 bind9 sendmail mysql xinetd
- Reboot (with reboot command), and check that all is fine.
- In particular, send DNS queries directly to the server with dig, and also send an email to a foreign address (e.g. gmail). My web host blocked outgoing connections to port 25 on the new server, for example.
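For example, with example.com standing in for one of the hosted zones:
$ dig @server example.com A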
- Delete ifcfg-venet0 and ifcfg-venet0:0 in /etc/sysconfig/network-scripts/, as they relate to the venet0 interface that exists only in the container machine. It’s just misleading to have them there.
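That is:
# rm /etc/sysconfig/network-scripts/ifcfg-venet0 /etc/sysconfig/network-scripts/ifcfg-venet0:0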
- Compare /etc/rc* and /etc/systemd with the situation before the transition in the git repo, to verify that everything is like it should be.
- Check the server with nmap (run this from another machine):
$ nmap -v -A server
$ sudo nmap -v -sU server
And then the DNS didn’t work
I knew very well why I left plenty of time free for after the IP swap. Something will always go wrong after a maneuver like this, and this time was no different. And for some odd reason, it was the bind9 DNS that played two different kinds of pranks.
I noticed immediately that the server didn’t answer DNS queries. As it turned out, there were two apparently independent reasons for it.
The first was that when I re-enabled the bind9 service (after disabling it for the sake of the move), systemd went for the SysV scripts instead of its own unit file. So I got:
# systemctl enable bind9
Synchronizing state for bind9.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d bind9 defaults
insserv: warning: current start runlevel(s) (empty) of script `bind9' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `bind9' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d bind9 enable
This could have been harmless and gone unnoticed, had it not been that I had added a "-4" flag to bind9’s command, without which it wouldn’t work. So with the SysV scripts in charge, my change in /etc/systemd/system/bind9.service wasn’t in effect.
Solution: Delete all files related to bind9 in /etc/init.d/ and /etc/rc*.d/. Quite aggressive, but did the job.
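In shell terms, roughly this (the exact glob may need adjusting):
# rm -f /etc/init.d/bind9 /etc/rc?.d/[SK]??bind9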
Having that fixed, it still didn’t work. The problem now was that eth0 was configured through DHCP after bind9 had already started. As a result, the DNS didn’t listen on eth0.
I slapped myself for thinking about adding a “sleep” command before launching bind9, and went for the right way to do this. Namely:
$ cat /etc/systemd/system/bind9.service
[Unit]
Description=BIND Domain Name Server
Documentation=man:named(8)
After=network-online.target systemd-networkd-wait-online.service
Wants=network-online.target systemd-networkd-wait-online.service

[Service]
ExecStart=/usr/sbin/named -4 -f -u bind
ExecReload=/usr/sbin/rndc reload
ExecStop=/usr/sbin/rndc stop

[Install]
WantedBy=multi-user.target
The systemd-networkd-wait-online.service is not there by coincidence. Without it, bind9 was launched before eth0 had received an address. With this, systemd consistently waited for the DHCP to finish, and then launched bind9. As it turned out, this also delayed the start of apache2 and sendmail.
If anything, network-online.target is most likely redundant.
And with this fix, the crucial row appeared in the log:
named[379]: listening on IPv4 interface eth0, 193.29.56.92#53
Another solution could have been to assign an address to eth0 statically. For some odd reason, I prefer to let DHCP do this, even though the firewall will block all traffic anyhow if the IP address changes.
Using a Live Ubuntu as a rescue image
Set Ubuntu 24.04 server amd64 as the CDROM image.
After the machine has booted, send a Ctrl-Alt-F2 to switch to the second console. Don’t go on with the installation wizard, as it will of course wipe the server.
In order to establish an ssh connection:
- Choose a password for the default user (ubuntu-server).
$ passwd
If you insist on a weak password, remember that you can do that only as root.
- Use ssh to log in:
$ ssh ubuntu-server@185.250.251.160
Root login is forbidden (by default), so don’t even try.
Note that even though sshd apparently listens only on IPv6 ports, it actually accepts IPv4 connections by virtue of IPv4-mapped IPv6 addresses:
# lsof -n -P -i tcp 2>/dev/null
COMMAND    PID            USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd      1            root  143u  IPv6   5323      0t0  TCP *:22 (LISTEN)
systemd-r  911 systemd-resolve  15u  IPv4   1766      0t0  TCP 127.0.0.53:53 (LISTEN)
systemd-r  911 systemd-resolve  17u  IPv4   1768      0t0  TCP 127.0.0.54:53 (LISTEN)
sshd      1687            root    3u  IPv6   5323      0t0  TCP *:22 (LISTEN)
sshd      1847            root    4u  IPv6  11147      0t0  TCP 185.250.251.160:22->85.64.140.6:57208 (ESTABLISHED)
sshd      1902   ubuntu-server    4u  IPv6  11147      0t0  TCP 185.250.251.160:22->85.64.140.6:57208 (ESTABLISHED)
So don’t get confused by e.g. netstat and other similar utilities.
To NTP or not?
I wasn’t sure if I should run an NTP client inside a KVM virtual machine. So these are the notes I took.
- This is a nice tutorial to start with.
- It’s probably a good idea to run an NTP client in the guest. It would have been better to utilize the PTP protocol and get the host’s clock directly, but that is really overkill. The drawback with these daemons is that if the guest goes down and back up again, it will start with the old time, and then jump.
- It’s also a good idea to use kvm_clock in addition to NTP. This kernel feature uses the pvclock protocol to let guest virtual machines read the host physical machine’s wall clock time as well as its TSC. See this post for a nice tutorial about kvm_clock.
- In order to know which clock source the kernel uses, look in /sys/devices/system/clocksource/clocksource0/current_clocksource, as shown below. Quite expectedly, it was kvm-clock (the available sources were kvm-clock, tsc and acpi_pm).
- It so turned out that systemd-timesyncd started running without my intervention when moving from a container to KVM.
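The clock source check mentioned above boils down to this (kvm-clock being the answer in my case):
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock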
On a working KVM machine, timesyncd tells about its presence in the log:
Jul 11 20:52:52 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.003s/+0ppm
Jul 11 21:27:00 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.001s/+0ppm
Jul 11 22:01:08 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.002s/0.007s/0.001s/+0ppm
Jul 11 22:35:17 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.001s/+0ppm
Jul 11 23:09:25 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.007s/0.007s/0.003s/+0ppm
Jul 11 23:43:33 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.003s/0.007s/0.005s/+0ppm (ignored)
Jul 12 00:17:41 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.006s/0.007s/0.005s/-1ppm
Jul 12 00:51:50 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.005s/+0ppm
Jul 12 01:25:58 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.002s/0.007s/0.005s/+0ppm
Jul 12 02:00:06 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.002s/0.007s/0.005s/+0ppm
Jul 12 02:34:14 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.005s/+0ppm
Jul 12 03:08:23 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.005s/+0ppm
Jul 12 03:42:31 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.004s/+0ppm
Jul 12 04:17:11 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.003s/+0ppm
So a resync takes place every 2048 seconds (34 minutes and 8 seconds), like clockwork. As is apparent from the values, there’s no dispute about the time between Debian’s NTP server and the web host’s hypervisor.