Root over NFS: Diskless boot from network
Introduction
Since I wanted to develop a kernel module on Linux for a specific piece of hardware, I thought it would be a nice idea to do that on a computer I wouldn’t mind crashing every now and then. Wanting physical access to the target hardware, a bare motherboard with nothing else but the absolute minimum seemed like a nice direction. The idea was this motherboard wouldn’t have any disk, hence no fsck’s every time I make a mistake in kernel mode.
Using cobbler for this was overkill, but this is how I did it back in 2011.
So I went for booting with PXE and mounting the root over NFS. I also chose to do so using the kernel’s native support, rather than with an initrd image. I should mention that root over NFS has the drawback of not having a disk cache. This means that every time a file is read, there’s some network activity, as opposed to files read from a local disk, which are actually read from RAM if the file has been accessed recently. The difference is evident when running bash scripts and make builds (compiling a kernel, for example) because each invocation of any executable involves a lot of access to more or less the same set of supporting files (libraries, locale configurations, C header files, you name it). This massive file access is not noticed when the disk cache is used, and neither is it evident when running a single application. But on the diskless machine, this is what I got compiling a recent kernel:
# time { make bzImage && make modules ; }
<< ... lots of output ... >>
real 95m22.497s
user 51m46.023s
sys 6m7.257s
Hmmm… We have some 55% of the time for actual crunching. Not as horrible as I expected, and still not so impressive.
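For reference, the figure can be reproduced from the timing output above. This is just a sketch of the arithmetic; the helper names to_seconds and cpu_share are made up for this example.

```shell
#!/bin/sh
# Convert a value such as "95m22.497s" (as printed by bash's time) to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the share of wall-clock time spent on the CPU.
# Arguments: real, user, sys.
cpu_share() {
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "user: %.0f%%  user+sys: %.0f%%\n", 100 * u / r, 100 * (u + s) / r }'
}

cpu_share 95m22.497s 51m46.023s 6m7.257s
```

User time alone comes out at about 54% of the wall-clock time. Counting system time as well brings it to about 61%.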
Maybe running a virtualized machine on a disk image over the NFS mount would solve this, since the guest would maintain a cache of its own. I would also think about running on a network block device, which would most likely be an elegant solution. I did neither, because my main objective was tolerance to kernel crashes and not necessarily running heavy scripts.
The machine with a real disk (the boot server) ran Fedora 12.
Powering up a bare motherboard
Just a few things to keep in mind:
- An ATX power supply will not supply power (and neither will its fan turn) unless connected to a motherboard. A couple of pins on the 20/24 pin connector are used for the purpose of the motherboard telling the power supply when to go on and off.
- A motherboard won’t power on just like that. When external power is applied to it, it expects the power button to be pressed. Locate the header connector on your motherboard, and find the two pins that are normally connected to the power button (the motherboard’s manual will help). Use a jumper or just a piece of metal to short-circuit these two with the power supply connected, and power should go on.
- It’s possible to change the BIOS setting so that power goes on immediately when external power is applied. Look in the BIOS menus under “Power Management”.
- Once the fans start spinning, the BIOS should show something on the on-board VGA output. If it doesn’t, either the CPU or the memory is probably not seated properly, or the 12V power connector needs checking.
- Also, there may be some trouble getting the on-board gigabit NIC to talk nicely with an Edimax ethernet switch. Running a SMART LAN test and enabling the NIC’s boot ROM (both in the BIOS menus) solved this. I don’t know which of these two did the job.
The boot process
These are the steps which I eventually got:
- The diskless computer wakes up and makes a DHCP broadcast request on the network. The host computer assigns it an IP address (possibly detecting the NIC’s MAC address and giving it a constant address), and also tells it that pxelinux.0 is the file to load if a PXE boot is desired.
- Diskless loads pxelinux.0 through TFTP. The tftp daemon’s root is /var/lib/tftpboot
- pxelinux.0 then attempts to find a configuration file, ending up loading /pxelinux.cfg/default, and then /menu.c32 through TFTP.
- If all is well, a GRUB-like menu appears.
- Pxelinux then loads the appropriate kernel into memory and starts it (I didn’t run with initrd) with given parameters.
- The kernel makes its own DHCP broadcast, gets its IP address, and connects to the NFS server, according to kernel parameters.
- The NFS mount is used as read-only root.
- Boot proceeds normally like any Linux system (executing /sbin/init or alike)
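When the sequence above stalls at the TFTP stage, a quick check is whether the files PXELINUX asks for are actually under the TFTP root. The following is a minimal sketch. The function name check_tftp_tree is made up for this example, and the file list is just the three files named in the steps above.

```shell
#!/bin/sh
# Report which of the files requested during the PXE stage exist under the
# TFTP root. Returns non-zero if any of them is missing.
check_tftp_tree() {
    root=${1:-/var/lib/tftpboot}
    missing=0
    for f in pxelinux.0 pxelinux.cfg/default menu.c32; do
        if [ -f "$root/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use on the boot server:
#   check_tftp_tree /var/lib/tftpboot
```

This only tells whether the files are there, not whether the tftp daemon serves them. Wireshark or /var/log/messages is still the way to see what the client actually requested.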
Getting started: cobbler
(Once again, my year 2019 wisdom is that cobbler is overkill)
In order to get a quick start, cobbler is a nice tool to get the right files in the right places. In the case of setting up a diskless computer, cobbler is useful up to the stage in which a boot is successful. After that, I closed its daemon and made sure the service is off in all runlevels in order to ensure that it won’t mess with the configuration files.
First of all:
# yum install cobbler
# yum install dhcp
(and agree to a zillion other packages for dependencies, tftp-server in particular)
It’s written in Python (yuck!) so when things go wrong, one gets these ugly exception reports.
And I’ll say this now: You may want to do this every now and then to check up how much disk space cobbler is eating (you may be surprised):
# du -sh /var/www/cobbler/
Edited /etc/cobbler/dhcp.template (and NOT /etc/dhcp/dhcpd.conf) to set up the DHCP service properly: Set ‘subnet’ to match my internal network, ‘routers’ to the default gateway to WAN, ‘domain-name-servers’ to the DNSes available, ‘range dynamic-bootp’ to the range of addresses free for lease.
And also an entry of this form, to give a constant address (which is out of the conventional range, so it doesn’t get stolen).
host mylaptop {
    hardware ethernet 08:00:2b:4c:59:23;
    fixed-address 192.168.1.222;
}
Enable (with chkconfig or GUI alike) and start (service start or GUI alike) the services cobblerd, dhcpd and httpd. And then
# cobbler check
Which had a lot of remarks. Since I run SELinux in permissive mode, I ignored all SELinux related remarks. iptables is doing nothing right now, so I didn’t do anything special with the firewall either. I couldn’t care less about debmirror or the password for sample templates.
I did change the server and next_server entries in /etc/cobbler/settings to the computer’s IP as seen through the relevant ethernet card.
Also, I changed manage_dhcp to 1 in the same file, since I didn’t have DHCP running previously, and I’m somewhat lazy.
Finally, I went
# cobbler get-loaders
otherwise cobbler yells at me when trying to sync.
And then
# cobbler sync
which is where I got most error messages (and wrote their fixes above as if I had a clue).
When I got an OK from this command, I saw DHCP up and running, and also the first stage of PXE boot going into my laptop. But there was nothing to boot yet, so I copied the laptop’s running kernel and initrd to the boot server, and configured it as follows:
# cobbler distro add --name=justatest --kernel=/path/to/vmlinuz-2.6.27 --initrd=/path/to/initrd-2.6.27.img
# cobbler profile add --name=testingPXE --distro=justatest
# cobbler sync
At this point, PXELINUX was clearly loaded and reported IP addresses correctly, but hung after the line “Trying to load: pxelinux.cfg/default”. Checking with wireshark revealed that the specific file had indeed loaded, and so had menu.c32, but after that nothing happened. A menu should have appeared, as it did when I did the same thing on more recent hardware. Which means, as usual, that maintainers are so happy about playing with their software that they forget that some people are actually supposed to use it, including those not running top-gun hardware.
Anyhow, editing /var/lib/tftpboot/pxelinux.cfg/default so that PROMPT is 1 and not 0, I got a “boot:” prompt, on which I could enter the word “testingPXE”, and that started a successful boot of the laptop.
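The PROMPT change can also be made from the command line. This is a sketch only: the function name enable_boot_prompt is made up, and it assumes the file contains a line reading exactly PROMPT 0 in upper case, possibly indented.

```shell
#!/bin/sh
# Switch "PROMPT 0" to "PROMPT 1" in a pxelinux configuration file.
# The original file is kept next to it with a .bak suffix.
enable_boot_prompt() {
    cfg=$1
    sed -i.bak \
        's/^\([[:space:]]*\)PROMPT[[:space:]][[:space:]]*0[[:space:]]*$/\1PROMPT 1/' \
        "$cfg"
}

# Typical use on the boot server:
#   enable_boot_prompt /var/lib/tftpboot/pxelinux.cfg/default
```

Keep in mind that as long as cobbler is in charge, a cobbler sync may regenerate this file and undo the change.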
A useful command when things don’t really work (besides checking /var/log/messages):
# cobbler report
Since the target motherboard is a piece of recent hardware, I could conclude this phase as successful. At this point, cobbler should be disabled by stopping the cobblerd service and making sure with chkconfig it will never wake up again. It’s better to edit the target files directly from now on.
Make sure that tftp and dhcpd services are enabled in the relevant runlevels.
Preparing the NFS share
The next step was to install Linux on the to-be remote disk, so that the computer will boot from network and behave as if it ran on a local disk. Only the data is somewhere else. I went for CentOS-5.5-i386. I really wanted to try Slackware, but their bittorrent ISO image took way too much time to download.
First I set up an NFS share with an empty directory, so that the installation can go somewhere. I did it with Fedora’s GUI interface (shame on me, I’m sure I’ll pay for this somehow) so it was just System > Administration > Server Settings > NFS. I added a share, pointed at the directory, set hosts to “*”, allowed Read/Write, and most important checked “Treat remote root as local root”. After all, I want to install a system, so root squashing is out of the question. This is not the safest setting in the world, and is a clear opening for implanting root kits on your system. The only thing left to protect the computer at this stage is a firewall.
My /etc/exports now read:
/storage/diskless/cent55_root *(rw,sync,no_root_squash)
which I reduced to (after copying files from virtual machine, as described below)
/storage/diskless/cent55_root 10.0.0.0/255.255.255.0(rw,sync,no_root_squash)
so only computers on my little private LAN will get access. I have my limits too. After restarting the NFS service (was it necessary?) I mounted the share easily.
My original intention was to run the installation on the diskless machine right away, but I found out the hard way that installation software behaves exactly the way Microsoft thinks it should: Either you run it the way it was intended, or forget about it. After a while I realized that the simplest way was to create a virtual machine, run the full installation on that one, and after all is finished copy the files into my to-be NFS directory. Not very elegant, not quick at all, but at least I knew this would work.
After the installation was done, I was left with copying the files. I tried doing so by booting the machine in rescue mode, mounting the disk as well as the NFS share, and copying everything. But since I didn’t set up any paravirtualization on either disk or NIC, it turned out horribly slow. Instead, I did a dirty mount-by-probing on the disk image, and mounted the first and only partition with
# i=512; while ! mount -o loop,offset=$i centos.img mnt ; do i=$((i+512)); echo Now trying $i ; done
Which is such a dirty trick it should be outlawed, in particular because it’s unnecessary. After spitting out a lot of junk, it stopped after writing “Now trying 32256”, which happens to be (virtual) sector #63 (the first sector is zero). And then I changed directory to mnt, and just went
# tar -c --one-file-system -f - . | { cd /path/to/new/root && tar -x -p -v -f - ; }
which took a few minutes or so.
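As for the probing loop being unnecessary: the partition’s start sector can be read directly from the image’s partition table. Below is a minimal sketch, assuming a raw disk image with a classic MBR partition table and 512-byte sectors. The function name first_partition_offset is made up for this example.

```shell
#!/bin/sh
# Print the byte offset of the first primary partition in a raw disk image.
# Assumes an MBR partition table and 512-byte sectors.
first_partition_offset() {
    img=$1
    # The first partition entry starts at byte 446 of the MBR. Its starting
    # LBA is a 32-bit little-endian number at bytes 454-457. Reading the four
    # bytes one by one keeps this independent of the host's byte order.
    set -- $(od -An -tu1 -j454 -N4 "$img")
    echo $(( ($1 + $2 * 256 + $3 * 65536 + $4 * 16777216) * 512 ))
}

# Typical use (as root):
#   mount -o loop,offset=$(first_partition_offset centos.img) centos.img mnt
```

For an image whose first partition starts at sector 63, this prints 32256, the same number the loop eventually stumbled upon.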
Year 2019 update: Recent Linuxes might not support NFSv2 out of the box, which might prevent mounting by the kernel. Try:
# cat /proc/fs/nfsd/versions
-2 +3 +4 +4.1 +4.2
If you got that “-2”, NFSv2 isn’t supported. Edit /etc/default/nfs-kernel-server, possibly following this page.
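The same check can be scripted, for example as part of a sanity test on the server. A small sketch follows; the function name nfs_version_enabled is made up, and the versions string is passed in as an argument so the function can also be tried on a machine that isn’t an NFS server.

```shell
#!/bin/sh
# Succeed if the given NFS version is marked as enabled ("+N") in a string
# formatted like the content of /proc/fs/nfsd/versions.
nfs_version_enabled() {
    want=$1
    versions=$2
    for v in $versions; do
        if [ "$v" = "+$want" ]; then
            return 0
        fi
    done
    return 1
}

# On the NFS server itself:
#   nfs_version_enabled 2 "$(cat /proc/fs/nfsd/versions)" || echo "NFSv2 is off"
nfs_version_enabled 2 "-2 +3 +4 +4.1 +4.2" || echo "NFSv2 is off"
```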
Preparing the client for running root over NFS
It turns out that the idea is pretty unpopular. It’s one thing to boot a system from network to install something on the local disk, but judging from the availability of documentation and ready-to-go utilities, running on a remote disk seems to be an exotic idea.
For example, the tools for generating a root-over-NFS setup are in general archlinux-dependent, and are based upon working with an initrd to get things in place. Moreover, booting non-initrd from NFS requires certain kernel options that weren’t even module-enabled in a few distribution kernels I checked.
I made the decision of not having an initrd in the bootup process, simply because it doesn’t make sense. The kernel has the ability to do the DHCP discovery and mount the root from NFS, so I can’t see the point in maintaining an initrd image. The lack of initrd is why cobbler isn’t a candidate anymore: It’s not ready to import a kernel without an initrd.
So the first phase is to compile a kernel with root on NFS. Following {kernel source directory}/Documentation/filesystems/nfs/nfsroot.txt and a nice HOWTO, I enabled (as in Y, not M) the following flags on my kernel configuration: CONFIG_NFS_FS, CONFIG_ROOT_NFS, CONFIG_NET_ETHERNET, CONFIG_IP_PNP, CONFIG_IP_PNP_RARP, CONFIG_IP_PNP_BOOTP, CONFIG_IP_PNP_DHCP.
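Before starting a long compilation, it’s worth verifying that all of these options really ended up as Y in the kernel’s .config. A minimal sketch follows; the function name check_nfsroot_config is made up. The option list is taken verbatim from the paragraph above and fits the 2.6.18-era kernel used here, so newer kernels may name some of these symbols differently.

```shell
#!/bin/sh
# Check that the options needed for root over NFS without an initrd are
# built into the kernel (=y), not modules (=m) or unset.
check_nfsroot_config() {
    config=${1:-.config}
    status=0
    for opt in CONFIG_NFS_FS CONFIG_ROOT_NFS CONFIG_NET_ETHERNET CONFIG_IP_PNP \
               CONFIG_IP_PNP_RARP CONFIG_IP_PNP_BOOTP CONFIG_IP_PNP_DHCP; do
        if grep -q "^$opt=y\$" "$config"; then
            echo "$opt: built in"
        else
            echo "$opt: NOT built in"
            status=1
        fi
    done
    return "$status"
}

# Typical use, from the kernel source directory:
#   check_nfsroot_config .config
```

The option for the Ethernet card’s driver isn’t covered, since it depends on the hardware.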
I also made sure my specific on-board Ethernet card was also in the kernel itself, and not as a module. My motherboard won’t boot from an external NIC, but it’s actually possible to load the kernel from one NIC and let the kernel use an external NIC, if that makes life easier (even though it’s weird having two network cables going to a bare motherboard. Oh well).
Then I compiled the kernel as usual (some of us do that from time to time). For the heck of it, I copied bzImage and friends to /boot, but the real thing was to copy vmlinuz to /var/lib/tftpboot/images/ and edit /var/lib/tftpboot/pxelinux.cfg/default so it has an entry saying:
LABEL nfs-centos55
    kernel /images/vmlinuz-2.6.18-NFS1
    MENU LABEL nfs-centos55
    append ip=::::::dhcp text rootfstype=nfs root=/dev/nfs nfsroot=10.0.0.1:/diskless/cent55_root
    ipappend 2
Note that there is no initrd image mentioned here at all!
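The ip=::::::dhcp part looks cryptic, but it’s just the kernel’s colon-separated ip= parameter with everything left empty except for the autoconfiguration method. Per the kernel’s nfsroot documentation, the first seven fields are client IP, server IP, gateway, netmask, hostname, device and autoconf. The sketch below only splits the string to make that visible; the function name describe_ip_param is made up for this example.

```shell
#!/bin/sh
# Print the first seven fields of a kernel "ip=" parameter by name.
describe_ip_param() {
    value=${1#ip=}
    old_ifs=$IFS
    IFS=:
    set -f          # no filename expansion while splitting (re-enabled below)
    set -- $value   # split on colons; empty fields are kept
    set +f
    IFS=$old_ifs
    for name in client-ip server-ip gw-ip netmask hostname device autoconf; do
        echo "$name=${1:-<unset>}"
        if [ $# -gt 0 ]; then
            shift
        fi
    done
}

describe_ip_param "ip=::::::dhcp"
```

Everything shows up as unset except for autoconf=dhcp, meaning that the kernel gets its address and other network details from the DHCP server.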
I should also mention that sometimes there’s a problem mounting NFS in the standard way (using UDP), probably due to problems with packet fragmentation. The symptom is that the DHCP goes fine, but a message saying “server X not responding, still trying” appears and the boot process is stuck. The immediate solution is to run NFS over TCP, which is done simply by adding the “tcp” mount option, e.g. “nfsroot=10.0.0.1:/diskless/cent55_root,tcp”
Finally, under the to-be-NFS’ed directory, I changed /etc/fstab, so that the first line reads
10.0.0.1:/diskless/cent55_root / nfs defaults 0 0
Otherwise the boot fails on attempting to remount the root directory as read-write.
I booted my little motherboard, and apparently nothing happened. But looking at /var/log/messages under the NFS’ed directory, it was clear that the boot process had kicked off, but then went horribly wrong. Why? Because I skipped the initram phase. And a few things happen there.
Making up for not using initram
So I had a look at the initram image that comes with the distribution, with a
# zcat cent55_root/boot/initrd-2.6.18-194.el5.img | cpio -i
and looked at the file named init in the initrd’s root directory. It turns out, that the /dev subdirectory is mounted as a tmpfs at the initram stage, with the necessary regular elements mknod’ed every time. I have to say there is something clever about this, since it keeps the /dev clean. On the other hand, it’s likely to confuse traditional UNIXers and it makes the whole system depend on the initram stage, which are both pretty yuck.
The simple solution is to create these files for real in the /dev directory. So I copied that init file into cent55_root, and edited it to look like this:
mkdir /dev/pts
mkdir /dev/shm
mkdir /dev/mapper
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mknod /dev/urandom c 1 9
mknod /dev/systty c 4 0
mknod /dev/tty c 5 0
mknod /dev/console c 5 1
mknod /dev/ptmx c 5 2
mknod /dev/rtc c 10 135
mknod /dev/tty0 c 4 0
mknod /dev/tty1 c 4 1
mknod /dev/tty2 c 4 2
mknod /dev/tty3 c 4 3
mknod /dev/tty4 c 4 4
mknod /dev/tty5 c 4 5
mknod /dev/tty6 c 4 6
mknod /dev/tty7 c 4 7
mknod /dev/tty8 c 4 8
mknod /dev/tty9 c 4 9
mknod /dev/tty10 c 4 10
mknod /dev/tty11 c 4 11
mknod /dev/tty12 c 4 12
mknod /dev/ttyS0 c 4 64
mknod /dev/ttyS1 c 4 65
mknod /dev/ttyS2 c 4 66
mknod /dev/ttyS3 c 4 67
chmod 0755 /dev/{null,zero,urandom,systty,tty,console,ptmx} /dev/tty*
And then chrooted myself in the cent55_root directory:
# chroot .
after which I ran the little snippet above. It’s important to check that the /dev directory doesn’t contain any regular files by any chance. Mine had urandom and null, which resulted in mknod failing, and in turn I got errors saying /dev/null is on a read-only file system.
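Spotting such stray regular files before running mknod is easy with find. A small sketch, with the made-up function name list_stray_dev_files:

```shell
#!/bin/sh
# List regular files under a dev directory. Device nodes, directories and
# symbolic links are not listed, so any output means something is in the way.
list_stray_dev_files() {
    devdir=${1:-./dev}
    find "$devdir" -type f
}

# Typical use, from within the to-be-NFS'ed root directory (before chroot):
#   list_stray_dev_files ./dev
```

Anything this prints should be removed (or at least looked at) before creating the device files.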
And now the machine booted pretty OK. I disabled the microcode_ctl and yum-updatesd services, the first one because it held the boot stuck for a minute or so, and the second because I find it annoying.
And that’s it! The motherboard now booted like clockwork!
Kind-of Appendix
My failed attempt to install directly on the diskless machine
Oh, I was so naive thinking this would be quick and easy. Having the ISO images at hand, I went
# mount -o loop CentOS-5.5-i386-bin-DVD.iso mnt
# cobbler import --path=/home/eli/mnt --name=centos55 --arch=i386
# umount mnt
# cobbler sync
Note to self: Use absolute paths! Going --path=mnt made rsync go to /mnt, and copy all remote mounts I had, which is a nice backup of my system, but no thanks. Ah, and it didn’t end there! When I discovered my mistake and CTRL-C’ed the process, I saw that the cobbler subdirectory kept on growing even so. Using pstree I found out that cobblerd (cobbler’s daemon, right?) continued to rsync data into my false repository in the background. I had previously wondered why a daemon was necessary, and I also wondered when that moment of annoyance of using a do-it-all tool would come, and there I had both answers at once. And one can’t just turn off cobblerd, because everything is around that daemon. Yuck!
And even if I had got this right, I don’t think I’ll get the Nobel prize for this, in particular since this made cobbler call rsync requesting it to basically duplicate the ISO image’s content. It also scanned the files in all kinds of ways to support kickstart and virtual stuff, which I have no interest in. And there went some extra 4 GB.
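Regarding the absolute path lesson: one way to stay out of this kind of trouble is to let the shell turn the relative path into an absolute one before handing it to the tool. A minimal sketch, with the made-up function name abs_path:

```shell
#!/bin/sh
# Print the absolute, symlink-free path of an existing directory.
# The subshell keeps the caller's working directory unchanged.
abs_path() {
    ( cd "$1" && pwd -P )
}

# Typical use:
#   cobbler import --path="$(abs_path mnt)" --name=centos55 --arch=i386
```

If the directory doesn’t exist, the function fails and prints nothing to standard output, so it’s a good idea to check the result before going on.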
Reinstalling
Since cobbler started backing up the neighbouring computers (as a result of my mistake, I have to admit) and filled up my root partition, I had to take some violent measures (that is, some rm -rf) which soon ended up with cobblerd refusing to start at all (producing yucky and meaningless Python error messages, what else). The only solution was to save my /etc/cobbler/dhcp.template and /etc/cobbler/settings, then yum remove cobbler, and remove the directories /etc/cobbler, /var/www/cobbler and /var/lib/cobbler. Then yum install cobbler again, copy the two saved files back, and run cobbler get-loaders.
That’s the only thing these know-it-all tools understand: violence.
Reader Comments
Were you able to resolve the issue of the diskless machine failing if nfs gets interrupted? I have that problem now, running kali on a raspberry pi from a disk image on my main server. Whenever I have to reboot the main server, I have to do a hard reset on my kali box because it stops.
I found the solution to my problem. The Kali image I am using wasn’t pre-loaded with nfs support so when it lost the connection it couldn’t re-establish it. Once I installed nfs-common it worked nicely.