Virtualization: Notes to self

These are things I wrote down while playing with QEMU/KVM virtualization, for my own purpose of packing two existing computers into a third one. Just some random jots. I've written a similar post in 2024 (14 years after this one).

Log files

There are definitely two files one wants to peek at every now and then:

  • /var/log/libvirt/qemu/{guest-name}.log (in particular when a USB device doesn’t attach)
  • /var/log/audit/audit.log (SELinux audit, possibly piped to grep -E '^type=(AVC|SELINUX_ERR)' to reduce the amount of junk)

Start with running virt-manager as a non-root user (it will fail with a nondescriptive message if you try to run it as root). The root password will have to be supplied.

Use qcow2 disk images

If you're going to play around with the image, and then maybe want to get rid of the changes, this is sooo simple:

# qemu-img create -F raw -b clean-hda.img -f qcow2 hda.qcow2

Note that qemu-img can also create and apply snapshots of images, which is also good.
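
For the record, qemu-img's snapshot commands go something along these lines (the snapshot name is of course arbitrary):

# qemu-img snapshot -c before-experiment hda.qcow2
# qemu-img snapshot -l hda.qcow2
# qemu-img snapshot -a before-experiment hda.qcow2

which creates, lists and reverts to an internal snapshot, respectively.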

Don't try to use Virtual Machine Manager's clone utility for qcow2 images, since the tool will rewrite all data.

USB device passthrough

It looks like libvirt doesn't massage the permissions of USB character devices before adopting them, so both classic permissions and SELinux blow up when trying to run passthrough. The current workaround would be to change the USB device's classic permissions manually, and run SELinux in permissive mode. Which is indeed unhealthy, but harmless given the fact that XP gives a healthy blue screen in response to the new device. I think I saw XP complaining about something regarding USB 2.0 versus USB 1.1.

Running the same scenario with a knoppix LiveCD image, I managed to find the device (a Canon 500D camera) with lsusb. The root USB hub was shown to be a UHCI. I’m not sure whether a real PTP camera would respond to a UHCI hub, which maybe explains why Windows got confused finding such a camera under the emulated hub.

The emulator appears as /usr/bin/qemu-kvm under the SELinux domain (type) svirt_t, which is declared in /usr/share/selinux/devel/include/services/virt.if (which you don't want to read).

USB devices appear somewhere under /dev/bus/usb/ as character devices with SELinux type usb_device_t.
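
A sketch of the manual workaround mentioned above (the bus and device numbers come from lsusb, and will differ on another machine):

# lsusb
# chmod 666 /dev/bus/usb/002/005
# setenforce 0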

Command line

Create a guest, run it and pause it. Then dump its XML by something like (as root):

# virsh dumpxml try1 > ~eli/virt/try1.xml

Then play around with the XML file a bit, destroy the previous domain, and

# virsh create ~eli/virt/try2.xml
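
Destroying the old domain and checking what's currently around is something like:

# virsh destroy try1
# virsh list --all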

How can I manually manipulate the list of guests?

virsh # start demo
error: Failed to start domain demo
error: internal error unable to start guest: qemu: could not open disk image /home/eli/virt/tryxp.img: Permission denied

Reason(?): tryxp.img is of the wrong context, so SELinux prevents it from being opened…? But I ran SELinux in permissive mode. How could this happen? Or, as put in the Fedora 12 Virtualization Guide:

SELinux prevents guest images from loading if SELinux is enabled and the images are not in the correct directory. SELinux requires that all guest images are stored in /var/lib/libvirt/images.

Basically, what solved this was moving the image to the dedicated directory, and going:

# virt-install --force --name demo3 --ram 1024 --import --disk path=/var/lib/libvirt/images/tryxp.img

or even better:

# virt-install --force --name demo5 --ram 1024 --import --disk path=/var/lib/libvirt/images/hda.img
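
For the record, moving the image into place and fixing its SELinux label is presumably something like this (restorecon applies whatever label the policy assigns to that directory):

# mv ~eli/virt/tryxp.img /var/lib/libvirt/images/
# restorecon -v /var/lib/libvirt/images/tryxp.img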

For playing around:

# virsh

“Stealing” command lines from Virtual Machine Manager

After running a machine under the GUI interface, it's possible to steal its command line and run it directly, with the display on an external VNC console. Just find the command with ps aux | grep qemu-kvm. The following changes in flags apply (a sketch of the resulting command line follows the list):

  • Remove the -S flag. It says that the guest should not start until commanded to do so.
  • Remove the -monitor flag. We're running on VNC only
  • Change the -vnc flag’s address to point to an address known to the outside world, if necessary
  • Add “-usbdevice tablet” so that the mouse is followed correctly
  • Change the -net flags (two of them) to “-net nic -net user”. This is said to have a performance hit, but it’s simple and it works with an internal (fake) DHCP server
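
A sketch of what the massaged command line could end up looking like (the image path, VNC address and memory size are made up; the rest of the flags depend on what the GUI generated in the first place):

# qemu-kvm -m 1024 -hda /var/lib/libvirt/images/hda.img -usbdevice tablet \
  -net nic -net user -vnc 192.168.1.10:0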

The tip of the day

In the relevant terminology, "source" refers to information as seen by the host OS, while "target" refers to how the guest OS sees it.

Another little tip: If I try to install Windows 7, and the installation gets stuck for very long periods of time with nothing really happening, maybe it’s because the disk image is read-only? :-O

VMPlayer

Running VMplayer on a 2.6.35 kernel requires a small fix, which was published here. The thing is that some kernel symbol has changed its name, and hence the vmmon module fails to compile. How I love when people are sensitive about backward compatibility.

To make a long story short, one needs to go to where the VMplayer module sources are, and go:

$ perl -pi -e 's,_range,,' iommu.c

which is a bit of a Columbus egg, I would say. Also, VMplayer repeatedly wanted to compile the modules every time I started it, because it missed vsock (which wasn't compiled in the first place), so I followed the same page's hint and edited /etc/vmware/config to say

VSOCK_CONFED = "no"

By the way, I tried to figure out what this module does, and all Google results tell you how to tweak and patch. Don't people care what they do on their computers? Maybe this component is useful?

The following remark was true when this post was published, but no more:

To run VMPlayer under Fedora 12, there’s a need for a little hack, or the VMPlayer closes right after starting:

# cd /usr/lib/vmware/resources/
# mv mozilla-root-certs.crt old-mozilla-root-certs.crt

Have no idea why this is.

VMPlayer networking

The interesting stuff is at /etc/vmware/networking, which pretty much says which interface is NATed and which is host-only. To pick a certain device for bridging, additional lines configuring add_bridge_mapping should be added as explained on this page.
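
For illustration, the bridging directive looks something along these lines (the interface name and vmnet number are made up; the exact syntax is on that page):

add_bridge_mapping eth0 2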

Also useful to play with

# vmware-networks --status
# vmware-networks --start
# vmware-networks --stop

etc. (as root)

Moving an old Linux computer

The mission: Move an old 2.4.21-kernel based computer into a VMPlayer virtual machine. The strategy was to run a LiveCD on the virtual machine, create an empty ext3 file system on it, copy the files and boot. Caveats follow.

The most important lesson learned is that everything on the new machine has to be done with a kernel of the same or an earlier version than the one that will eventually run. In simple words, the rescue disk must run a matching kernel. In particular, with a newer rescue disk, the ext3 filesystem is generated with an inode size of 256, which old kernels don't support. Even worse, even if the file system was generated properly, newer kernels (say, 2.6.35) write things to the disk that will confuse the old kernel. This leads to "attempt to access beyond end of device" errors during boot, from 01:00 (the ramdisk) as well as 08:01 (the root filesystem's partition).

So fdisk, mkfs.ext3 and mkinitrd must be done with a matching kernel running. And copying the files too, of course. The rescue disk must match, as mentioned above.

The next thing to note is that all hda’s turn into sda’s. That needs to be adjusted in /etc/fstab as well as /etc/lilo.conf.
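
The kind of edits involved are along these lines (the partition layout here is made up for illustration). In /etc/fstab:

/dev/sda1   /    ext3    defaults    1 1    (was /dev/hda1)

and in /etc/lilo.conf:

boot=/dev/sda
root=/dev/sda1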

The most difficult thing to handle was the fact that SCSI drivers were not installed in the kernel by default, so the initrd image had to be adjusted. So after the file system is copied, mount it, chroot to it, and go

# mkinitrd --with=scsi_mod --with=sd_mod --with=BusLogic /boot/initrd-2.4.21-mykern.vmware.img 2.4.21-mykern

The insmods are attempted in the same order the flags appear, and the two latter modules depend on scsi_mod. So it's important to keep the order of the flags as above.

If these modules aren't loaded, the generation of /dev/sda and /dev/sda1 doesn't occur, resulting in a kernel panic with various complaints (pivot_mount fails, init not found and some more).

And then fix /etc/lilo.conf and /etc/fstab and run lilo. It's recommended to copy the /boot directory first, so that the kernel image falls within the lower 4 GB. Or pick the lba32 option, as lilo will tell you.

And then boot. Mount the ISO image for VMware tools (/dev/hdc) and run ./vmware-install (going with the defaults most of the time).

Converting a VMPlayer machine to VirtualBox

Make a copy of the entire directory to a new place. No point messing things up. Then create an OVF file, without converting the disks (because that takes forever, and possibly fails):

$ ovftool --noDisks CleanXP.vmx export.ovf

For some reason, ovftool doesn't fill in the correct names of the disks (I asked not to convert them, not to ignore them). The simplest way around this is to remove the disk definitions altogether, and import the disks manually from VirtualBox. For a VM with three disks, these are the lines to remove:

 <References>
 <File ovf:href="export-disk1.vmdk" ovf:id="file1" ovf:size="0"/>
 <File ovf:href="export-disk2.vmdk" ovf:id="file2" ovf:size="0"/>
 <File ovf:href="export-disk3.vmdk" ovf:id="file3" ovf:size="0"/>
 </References>
 <DiskSection>
 <Info>Virtual disk information</Info>
 <Disk ovf:capacity="10" ovf:capacityAllocationUnits="byte * 2^30" ovf:diskId="vmdisk1" ovf:fileRef="file1" ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" ovf:populatedSize="0"/>
 <Disk ovf:capacity="30" ovf:capacityAllocationUnits="byte * 2^30" ovf:diskId="vmdisk2" ovf:fileRef="file2" ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" ovf:populatedSize="0"/>
 <Disk ovf:capacity="40" ovf:capacityAllocationUnits="byte * 2^30" ovf:diskId="vmdisk3" ovf:fileRef="file3" ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" ovf:populatedSize="0"/>
 </DiskSection>

and also the other references to these vmdiskn:

<Item>
 <rasd:AddressOnParent>0</rasd:AddressOnParent>
 <rasd:ElementName>disk0</rasd:ElementName>
 <rasd:HostResource>ovf:/disk/vmdisk1</rasd:HostResource>
 <rasd:InstanceID>7</rasd:InstanceID>
 <rasd:Parent>6</rasd:Parent>
 <rasd:ResourceType>17</rasd:ResourceType>
 </Item>
 <Item>
 <rasd:AddressOnParent>1</rasd:AddressOnParent>
 <rasd:ElementName>disk1</rasd:ElementName>
 <rasd:HostResource>ovf:/disk/vmdisk2</rasd:HostResource>
 <rasd:InstanceID>8</rasd:InstanceID>
 <rasd:Parent>6</rasd:Parent>
 <rasd:ResourceType>17</rasd:ResourceType>
 </Item>
<Item>
 <rasd:AddressOnParent>0</rasd:AddressOnParent>
 <rasd:ElementName>disk2</rasd:ElementName>
 <rasd:HostResource>ovf:/disk/vmdisk3</rasd:HostResource>
 <rasd:InstanceID>10</rasd:InstanceID>
 <rasd:Parent>4</rasd:Parent>
 <rasd:ResourceType>17</rasd:ResourceType>
 </Item>

Delete the export.mf file. It contains export.ovf’s hash signature, and it’s wrong after editing the file.

Then in the VM VirtualBox Manager, pick File > Import Appliance… and choose export.ovf.

Then add the hard disks by choosing Storage > (Diskette Icon) > Add Hard Disk and pick Choose existing disk. Pick the .vmdk file with no number attached to it (not the e.g. *-s011.vmdk).

Enlarge the video memory to 18 MB (or more), or VirtualBox complains that it's too little. Enable audio.

That was nice so far. The only problem is that Windows required re-activation because too much hardware changed, and the internet activation failed, probably because the NIC wasn't detected. To install the driver, I'd need to activate first, so I was stuck, lost patience and left the whole thing for now.

Setting up an encrypted LVM over RAID 5

What I wanted

All I wanted was a software RAID-5 on three disks with whole-disk encryption on Fedora 12. For some reason, I thought the installation script would do that for me.

The relevant part in the installation procedure was kind enough to allow me to set it up in the GUI, but when I went for the installation, I got a window saying "An error was encountered while setting up device sdc1". sdc1, by the way, is just a plain unencrypted partition. But who cares. I insisted on looking at the "details", where it said "device has not been created".

Hurray! Now I get it all. Not.

A quick tour in the command line console (I wonder why I always end up doing things with my bare hands) revealed that the partition tables were intact. Simply put, nothing was done.

The setup

The catch about software RAID is that its drivers have to be loaded from somewhere, so obviously a non-RAID boot partition is needed for that. My decision was to allocate ~250 MB on all three disks, exactly the same number of cylinders, and put the boot on one of them. I don't know why, but it feels right to me that the disks will access the same geometrical points when running as RAID, even though I'm not forced to do so.

The rest of each disk (around 1000 GB) is allocated as one big software RAID partition. With three disks like this forming a RAID-5, I'll get one big (fake) ~2TB disk, which will be encrypted completely. On top of that, I'll put one big LVM physical volume, on which I'll have 4 GB of swap and then a root partition. The precise sizes don't matter anymore, since I'm under LVM.

Setting up the RAID

Since the LVM tools are not active in Fedora's rescue mode, I went for booting Ubuntu 9.10 as a LiveCD. The catch is that it supports neither LVM nor mdadm, so both had to be installed (after setting up a network connection, of course):

# apt-get install lvm2
# apt-get install mdadm

(the latter forcing me to configure postfix. Yuck!)

On /dev/sda: for the boot partition I allocated cylinders 1 to 30. For RAID autodetect (type 0xfd) I took all the rest. Then I brutally raw-copied the first 128 sectors to /dev/sdb and /dev/sdc. That was a bad idea, since the partition table contains the disk's identifier. So I cleaned up both disks with some zeros, and ran fdisk on each.
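
The raw copy was presumably something like this (don't do this, for the reason just mentioned):

# dd if=/dev/sda of=/dev/sdb bs=512 count=128

A less brutal way to replicate just the partition layout would probably be something like

# sfdisk -d /dev/sda | sfdisk /dev/sdb

but either way, the point is that each disk should end up with its own identifier.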

Following this, I created the software RAID:

# mdadm --create /dev/md0 --level=raid5 --raid-devices=3 --chunk=128 /dev/sda2 /dev/sdb2 /dev/sdc2

And the hard disks started to work. /dev/md0 was up and running pretty much immediately. To monitor the progress:

# mdadm --detail /dev/md0

(yey!)

The whole disk encrypted

# cryptsetup -v luksFormat /dev/md0

After saying “YES” to kill all data and entering my secret passphrase, cryptsetup said all was successful and a window popped up saying that “gvfs-gdu-volume-monitor closed unexpectedly”. How I love when everything is so automated and I don’t need to worry about anything technical.

But who cares? I opened my new secret candy box:

# cryptsetup luksOpen /dev/md0 candybox

and found /dev/mapper/candybox in place (yey II)

Setting up LVM

It’s worth mentioning that there’s an interactive shell-like environment for manipulating LVM volumes. Just go

# lvm

Regardless, following the same HOWTO (more or less), I went

root@ubuntu:/home/ubuntu# pvcreate /dev/mapper/candybox
  Physical volume "/dev/mapper/candybox" successfully created
root@ubuntu:/home/ubuntu# vgcreate vg_raid -s 32M /dev/mapper/candybox
  Volume group "vg_raid" successfully created

Noticed the "-s 32M"? That sets the physical extent size to 32 MB instead of the default 4 MB. Since the maximal number of extents for a volume is 65534 (more or less…?), and the whole disk is around 2 TB, that's the smallest extent size that does the job (32 MB x 65534 ≈ 2 TB).

OK, now let's put the swap and root logical volumes in place:

root@ubuntu:/home/ubuntu# lvcreate --size 4G vg_raid -n lv_swap
  Logical volume "lv_swap" created
root@ubuntu:/home/ubuntu# lvcreate --size 10G vg_raid -n lv_root
  Logical volume "lv_root" created
root@ubuntu:/home/ubuntu# ls /dev/mapper/
candybox  control  vg_raid-lv_root  vg_raid-lv_swap

Installing…

To my delight (and somewhat to my surprise), the Fedora 12 installation machinery detected both the software RAID and all that was underneath it, prompted me for my passphrase, and allowed me to allocate the mount points on the existing logical volumes. Which is the sensible thing to do, but I couldn't believe it actually happened!

All in all, the installation went smoothly, so did the bootup, and everything seems to be OK (fingers crossed).

When bad gets worse

So what happens if a disk suddenly decides to commit suicide? The answer is nothing special. Due to the redundancy, the system will keep on working as usual. Even worse, nobody will be notified (except for an email to root from mdadm). The system just runs on. In a way, that’s good and pretty bad at the same time.

Here’s a typical mail, which is sent to root:

From root@localhost.localdomain  Sat Jan 16 17:51:27 2010
Return-Path: <root@localhost.localdomain>
Date: Sat, 16 Jan 2010 17:51:27 +0200
From: mdadm monitoring <root@localhost.localdomain>
To: root@localhost.localdomain
Subject: DegradedArray event on /dev/md0:ocho.localdomain
Status: RO

This is an automatically generated mail message from mdadm
running on localhost.localdomain

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda2[0] sdc2[3] sdb2[1]
      1953037824 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
      [==================>..]  recovery = 94.0% (918267264/976518912) finish=920.7min speed=1053K/sec

unused devices: <none>

It looks like there's no dedicated software for sounding the alarm. The solution seems to be a simple cronjob script, which runs mdadm every hour or so, and checks if all is OK. The word "degraded" in the "detail" report looks like a good indicator that something isn't as it should be. My script is at the bottom of this page.

I tried unplugging the spare disk's SATA cable while the computer was up and running (which is, by all means, a violent thing to do). Nothing happened. A few lines in /var/log/messages telling a short story about a disk which doesn't respond, and the RAID going down to two disks. The log of the boot afterwards (with two disks) isn't any more dramatic about it. RAID-5, only two disks detected, too bad, let's go on. The disk is declared "removed" in the "detail" report, and that's it.

So I turned the computer off, replugged the disk, and turned it on again. The system showed no particular interest in it. To get it back to the RAID array, I did

# mdadm /dev/md0 --add /dev/sdc2

This kicked off the rebuild of this disk. Thinking about it, it's pretty clever that nothing happens without human intervention. But I'll consider having the smartd service running.
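
On Fedora, getting smartd going boils down to something like:

# chkconfig smartd on
# service smartd start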

When worse turns into a catastrophe

Since I was about to wipe my disks soon anyhow, I figured I'd take the test to the extreme. After all, there's no point in having a spare disk if it doesn't work, is there?

So I let the spare disk recover up to 25% (so I knew that the relevant disk area is indeed OK, without letting it finish). Then I pulled disk #2's SATA plug. So now we have disk #1 which is OK, disk #2 missing, and disk #3 spare but not completely recovered. Don't try this on real data.

The system lost its stability this time, but it’s not like that connector was intended for hot removal. The attempt to reboot failed with “no root device found”. This is no wonder. I couldn’t really expect the RAID array to rely on one disk and one spare which never got the time to recover, could I? Well, I tried.

So I went for Ubuntu again. Keep in mind that former /dev/sdc2 is now /dev/sdb2. The general music is “everything is clean, but forget it”:

root@ubuntu:/home/ubuntu# mdadm --assemble --scan
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.
root@ubuntu:/home/ubuntu/mnt# mdadm --run /dev/md0
mdadm: failed to run array /dev/md0: Input/output error
root@ubuntu:/home/ubuntu# cat /proc/mdstat
Personalities :
md0 : inactive sda2[0](S) sdb2[3](S)
      1953037952 blocks
root@ubuntu:/home/ubuntu# mdadm --examine /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : fb16d869:ffd27a50:e368bf24:bd0fce41 (local to host ubuntu)
  Creation Time : Fri Jan 15 11:11:40 2010
     Raid Level : raid5
  Used Dev Size : 976518912 (931.28 GiB 999.96 GB)
     Array Size : 1953037824 (1862.56 GiB 1999.91 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Jan 15 14:34:58 2010
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 15648d3e - correct
         Events : 2500

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     3       8       34        3      spare

   0     0       8        2        0      active sync   /dev/sda2
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8       34        3      spare

But I won't let this turn me off. There's a guy who had this kind of problem for real, and was kind enough to document his findings. The bottom line was to tell mdadm to create the RAID array from the start, only assuming that everything is already there, with "--assume-clean". Extremely dangerous. I would raw-copy all data to a new hard disk and try it there, if this was for real. But it wasn't. So I went:

root@ubuntu:/home/ubuntu/mnt# mdadm --create /dev/md0 --assume-clean --level=5 --verbose --chunk=128 --raid-devices=3 /dev/sda2 missing /dev/sdb2
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda2 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Jan 15 11:11:40 2010
mdadm: /dev/sdb2 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Jan 15 11:11:40 2010
mdadm: size set to 976518912K
Continue creating array? y
mdadm: array /dev/md0 started.
root@ubuntu:/home/ubuntu/mnt# cryptsetup luksOpen /dev/md0 candybox
Enter LUKS passphrase:
key slot 0 unlocked.
Command successful.

But I didn’t get the LVM devices kicked off. So I went:

root@ubuntu:/dev/mapper# dmsetup resume /dev/mapper/candybox
root@ubuntu:/dev/mapper# ls
candybox  control  vg_raid-lv_root  vg_raid-lv_swap

And of course, in real life I would fsck the disk and such. But the bottom line is clear: If the data is there, it’s there.

Summary

Let’s hope I won’t ever need this stuff. Let’s hope that all three disks will live forever. But it’s comforting to know, that if one of those suddenly dies, there is a good chance the whole story will end with the purchase of some hardware. Nothing else.

Sort-of appendix

When the RAID doesn’t come up by itself

If the RAID array is known to be fine, but doesn’t come up:

# mdadm --assemble --scan

which worked under Ubuntu, since it was nice enough to create an /etc/mdadm/mdadm.conf file. Otherwise we need to be more explicit:

# mdadm --assemble /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2
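
It's also possible to generate such a config file once, so that the --scan variant works later on (the file is /etc/mdadm.conf on Fedora, /etc/mdadm/mdadm.conf on Ubuntu):

# mdadm --detail --scan >> /etc/mdadm.conf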

Script for checking the RAID’s health

This is the script I've put as a cronjob. Note that it's completely silent when all is OK, and starts to say things when they're not. The point is that the cron daemon sends an email (usually to root) whenever a cronjob produces output. This doesn't solve the problem of the email going to root, but it's a good guard if the mails to root are forwarded to someone attentive.

I’ve also made a log in /var/log/raidlog. The purpose of this log is to allow me to verify that the script is indeed running every now and then. After all, the whole issue is that I don’t expect a hard disk failure tomorrow, but rather when I’ve forgotten about this script altogether. But I hope I’ll have the sense to peek at the log every now and then.

#!/bin/bash

# RAID health check, intended as a cronjob: silent when all is well,
# complains on stdout (which cron turns into a mail) otherwise.

device=/dev/md0
now=`date`

checkraid() {
    # Pick the "State :" line from mdadm's report and inspect it
    mdadm --detail $device | grep State | {
        read s;

        if ! echo $s | grep -i -q state ; then
            echo Problem: mdadm gave bad output for $device
            return 1;
        fi

        if echo $s | grep -i -q degraded ; then
            echo Problem: Device $device is degraded
            return 1;
        fi

        return 0;
    }
}

if ! checkraid ; then
    echo ""
    echo mdadm output follows:
    echo ""
    mdadm --detail $device

    echo "Bad RAID at $now" >> /var/log/raidlog
    exit 1;
fi
echo "RAID OK at $now" >> /var/log/raidlog
exit 0;
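
The cronjob entry itself is something like this line in /etc/crontab (running hourly; the script's path is whatever one picks, of course):

0 * * * * root /usr/local/bin/raid-health.sh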

A useless adventure

There is a basic flaw in the above: The LVM is generated on /dev/md0. If we view /dev/md0 as a hard disk, it means it has no partition table!

So somewhere in the middle of the route above, I tried this: with fdisk, I set up an LVM partition on /dev/md0, so that it would appear as /dev/md0p1.

So what I wanted to do was:

# pvcreate /dev/md0p1
# vgcreate lvm-raid -s 32M /dev/md0p1

The difference is that I went for /dev/md0p1 rather than /dev/md0, so that a decent partition table is in place.

But the first one failed with a “/dev/md0p1  not found (or ignored by filtering)” because there is some kernel issue. Or is it? Maybe it’s the whole world telling me I should stop being so fussy.

What I needed was a kernel of 2.6.24 or earlier, because whoever reported the kernel problem had things running on 2.6.24. I wanted to run an earlier Ubuntu (8.10 instead of 9.10), assuming that it was a kernel issue. I will never know, since that distro got stuck during boot.

So I went for a small rescue distro, namely SystemRescueCD version 1.0.0 (loading altker64, since the default kernel caused a kernel panic). And there I encountered a brand new problem: /dev/md0p1 never appeared in the /dev directory, even though the partition was there. Using mdadm to kick off the RAID did create /dev/md0, but not its subpartitions.

At this point I realized that even if I manage to get it my way, odds are that not many others did it my way, meaning that nobody really tests things on my setup. In other words, things are expected to go wrong in the long run. Which is why I dropped this.

Running Mustek Powermust 600 UPS with nut

Introduction

I have an old Mustek 600VA UPS with RS-232 connection. Since I change the battery every couple of years on my own, I find it pretty pointless to throw it away.

And my brand new computer (running Fedora 12) has an RS-232 port, if one insists on using a connector on the motherboard. Needless to say, I had to steal the RS-232 cable from motherboard to panel from an old computer.

A simple male/female extension RS-232 cable connects the UPS to the computer. PC to UPS communication goes through pin 3, and the other direction through pin 2. The cable should be completely transparent (i.e. not switch wires, in particular not pins 2 and 3).

In case you want to see if the UPS is alive with Putty (or some other terminal), go for 2400 baud, 8N1 (8 data bits, no parity and one stop bit). Type Q1 followed by a carriage return, and the UPS should respond with a line of status info. The protocol is described here.
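
The same check can probably be done without a terminal emulator, along these lines (cat reads the response in the background; /dev/ttyS0 is assumed to be the relevant port):

# stty -F /dev/ttyS0 raw 2400 cs8 -parenb -cstopb
# cat /dev/ttyS0 &
# echo -en 'Q1\r' > /dev/ttyS0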

Linux driver

The right driver to run with the UPS is the Network UPS Tools, or nut for short.

# yum install nut

Configure the UPS.

# vi /etc/ups/ups.conf

Basically, add the following entry:

[PowerMust]
driver = megatec
port=/dev/ttyS0
desc = "Mustek PowerMust 600VA"
mfr = "Mustek"
model = "PowerMust 600"

Now we can kick off the driver which listens to the UPS (the response takes a few seconds):

# upsdrvctl start
Network UPS Tools - UPS driver controller 2.4.1
Network UPS Tools - Megatec protocol driver 1.6 (2.4.1)
Megatec protocol UPS detected.

If you happen to have an oscilloscope on the RS-232 lines (ha!), you should see some action every few seconds from now on.

Or, alternatively, if you want to see what’s running on the computer, I suggest

# ps aux | grep nut

It so happens that during the installation we got a new user named "nut", under which most of the relevant processes are running. A process running /sbin/megatec should be found there.

Configure the monitor

# vi /etc/ups/upsmon.conf

Basically, there’s only one line to add (pretty much at the beginning):

MONITOR PowerMust@localhost 1 upsmon pass master

I've chosen to configure the monitor as master, since I plan to run some virtualization guests, which may work as slaves. As for the "upsmon" and "pass", these are the user and password for connecting to upsd. Since no user nor password were configured in upsd.users, the attempt to log in will fail, resulting in error messages when the monitor is started.
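
For the record, silencing those errors is presumably a matter of adding a matching entry to /etc/ups/upsd.users, something like:

[upsmon]
password = pass
upsmon master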

Now let’s test it. First, let’s remove the already running daemons:

# service ups stop
Stopping UPS monitor:                                      [FAILED]
Stopping upsd:                                             [  OK  ]
Shutting down upsdrvctl:                                   [  OK  ]

Stopping the UPS monitor failed, because it wasn't running. And now kick it off:

# service ups start
Starting UPS driver controller:                            [  OK  ]
Starting upsd:                                             [  OK  ]
Starting UPS monitor (master):                             [  OK  ]

# ps aux | grep nut
nut       6578  0.0  0.0   6084   468 ?        Ss   02:51   0:00 /sbin/megatec -a PowerMust
nut       6582  0.0  0.0  40552   624 ?        Ss   02:51   0:00 /usr/sbin/upsd
nut       6586  0.0  0.0  38364   856 ?        S    02:51   0:00 /usr/sbin/upsmon
root      7509  0.0  0.0 102728   780 pts/0    R+   02:54   0:00 grep nut

Now let’s watch the relevant entries in /var/log/messages:

Jan 14 02:51:48 short megatec[6578]: Startup successful
Jan 14 02:51:48 short upsd[6581]: listening on 127.0.0.1 port 3493
Jan 14 02:51:48 short upsd[6581]: listening on ::1 port 3493
Jan 14 02:51:48 short upsd[6581]: Connected to UPS [PowerMust]: megatec-PowerMust
Jan 14 02:51:48 short upsd[6582]: Startup successful
Jan 14 02:51:48 short upsmon[6585]: Startup successful
Jan 14 02:51:48 short upsd[6582]: User upsmon@::1 logged into UPS [PowerMust]
Jan 14 02:51:48 short upsmon[6586]: Master privileges unavailable on UPS [PowerMust@localhost]
Jan 14 02:51:48 short upsmon[6586]: Response: [ERR ACCESS-DENIED]

The last two lines are a result of the lack of user and password in upsd. But that's fine. The UPS is monitored even so.

The final step is to activate the service on boot, using chkconfig or some GUI tool (System->Administration->Services on my computer).
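
With chkconfig, that boils down to:

# chkconfig ups on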

Checking out the UPS

Want to grab some info about the UPS? That's what upsc is for (this was run while the UPS was on battery):

# upsc PowerMust
battery.charge: 60.0
battery.voltage: 12.10
battery.voltage.nominal: 12.0
driver.name: megatec
driver.parameter.mfr: Mustek
driver.parameter.model: PowerMust 600
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/ttyS0
driver.version: 2.4.1
driver.version.internal: 1.6
input.frequency: 50.0
input.frequency.nominal: 50.0
input.voltage: 0.0
input.voltage.fault: 0.0
input.voltage.maximum: 230.5
input.voltage.minimum: 230.0
input.voltage.nominal: 230.0
output.voltage: 230.0
ups.beeper.status: disabled
ups.delay.shutdown: 0
ups.delay.start: 2
ups.load: 14.0
ups.mfr: Mustek
ups.model: PowerMust 600
ups.serial: unknown
ups.status: OB
ups.temperature: 37.8
ups.type: standby

Various partition mounts

This is yet another bunch of things I wanted written down, in case I need them one day. No certain order here.

The root mount

Grub gives the following kernel parameter:

root=/dev/mapper/vg_short-lv_root

Meaning that the kernel has the LVM module in place when starting off (I suppose it's kicked off in the initrd stage).

Boot image

Open a boot image (note that this extracts into the current directory, much like tar -x):

zcat /boot/initramfs-2.6.31.9-174.fc12.x86_64.img | cpio -i
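
So in practice, something like this keeps the extracted tree out of the way (the directory name is arbitrary):

$ mkdir /tmp/initramfs-tree && cd /tmp/initramfs-tree
$ zcat /boot/initramfs-2.6.31.9-174.fc12.x86_64.img | cpio -i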

Opening an encrypted partition

[root@short ~]# cryptsetup luksOpen /dev/mapper/vg_short-mysecret mysecret
Enter passphrase for /dev/mapper/vg_short-mysecret:
Key slot 0 unlocked.
[root@short ~]# mount /dev/mapper/mysecret /secret

Note that the second argument, mysecret is the name of the device generated under /dev/mapper. Also note that umounting /secret doesn’t close the partition. In addition to unmounting, there’s also need for another cryptsetup command:

[root@short ~]# umount /secret/
[root@short ~]# cryptsetup luksClose /dev/mapper/mysecret

ionice. Only that made upgrading worth it.

Copying gigabytes of disk can make the system sluggish. On Linux, the solution is so simple. If process 18898 happens to take control of your disk, just go:

ionice -c 3 -p 18898

And you have your computer back. "-c 3" means class 3, which is the idle class. In other words, take the disk when nobody else asks for it.
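
ionice can also launch the offending process in the idle class to begin with, e.g. (the paths are made up):

# ionice -c 3 cp -a /home/eli /mnt/backup/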

I love it. More here.

Installing .so libraries on a 64-bit Fedora with yum

A short note about installing libraries on an Intel 64 bit machine (Fedora 12 in my case).

It all starts with a short conversation like this one:

[root@short Downloads]# rpm -i VirtualBox-3.1-3.1.2_56127_fedora12-1.x86_64.rpm
error: Failed dependencies:
    libQtGui.so.4()(64bit) is needed by VirtualBox-3.1-3.1.2_56127_fedora12-1.x86_64
    libQtOpenGL.so.4()(64bit) is needed by VirtualBox-3.1-3.1.2_56127_fedora12-1.x86_64

THE WRONG THING TO DO IS:

[root@short Downloads]# yum install libQtGui.so.4

because it will install the right library, but for 32 bit (note the i686 suffix below). There must be an elegant way to get around this. Until I find it, I'll go:

[root@short Downloads]# yum whatprovides libQtGui.so.4
Loaded plugins: presto, refresh-packagekit
1:qt-x11-4.5.3-7.fc12.i686 : Qt GUI-related libraries
Repo        : fedora
Matched from:
Other       : libQtGui.so.4

(this entry possibly duplicated for each repository)

Now we have the name of the 32-bit package, qt-x11-4.5.3-7.fc12.i686 in our case. Just replace i686 with x86_64, and off we go:

[root@short Downloads]# yum install qt-x11-4.5.3-9.fc12.x86_64

It is left as an exercise to explain why yum would load i686 files when running on an x86_64 machine. Tell me if you get the logic.
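
By the way, it might be that handing yum the exact dependency string from the error message is the elegant way, since that string carries the architecture marker (I haven't verified this):

# yum install 'libQtGui.so.4()(64bit)'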

Another thing is that one can match against file names:

[root@short eli]# yum whatprovides "*libXm.so.*"
Loaded plugins: presto, refresh-packagekit
lesstif-0.95.2-1.fc12.i686 : OSF/Motif library clone
Repo        : fedora
Matched from:
Filename    : /usr/lib/libXm.so.2
Filename    : /usr/lib/libXm.so.2.0.1
Other       : libXm.so.2

Some blurbs about tweaking mplayer’s codecs

The (non-) problem

The truth is that there never was a problem. What really happened was that I got things confused between a few versions of mplayer/mencoder, and only the latest (of those I have, 1.0rc1-3.2.2) does the job. But since I wrote down some things I might want to return to some day, here's the whole blob.

What got me on this was that using mplayer (version 1.0 of some rc's) to play my Canon 500D's movie clips, I get the sound OK, but an image which flashes the real thing every now and then (keyframes?) and shows some grey garbage otherwise. And tons of error messages on the console.

On some other version the image looks OK, but A/V sync is lost soon enough. And many error messages indicate that the decoder doesn't get it right (a lot of "Consumed only 136509 bytes instead of 1365120" and the like). That isn't very promising.

It's worth noting that mplayer/mencoder choose the native ffmpeg libavcodec by default. As ffmpeg improves, these issues get fixed.

My real goal is to convert the clip to something decent using mencoder. I don't even think about editing a video in H.264. So all I need now is to find the right video decoder.

But I prefer to use the decoder supplied by Canon (spoiler: I never managed to). Since I own the camera, and got the software legally, why not use the codec they sent me? Only one problem…

What is the DLL of the codec used?

In order to “steal” the codec from the Canon application, I needed to know which DLL Canon uses to play its own videos. In order to do that, I opened Zoombrowser, and ran the ListDLL command line utility (which can be downloaded from here). The utility spits out all DLLs of all processes running, but using the “>” redirection in a command window, all data goes to a file. Then I double-clicked a video, and ran ListDLL again, redirecting the data to another file.

The difference between the files is most probably the DLLs loaded to play a clip. This worked because I ran Zoombrowser from scratch.

With my favourite diff application, I got a long list of new DLLs. These two caught my eye:

C:\Program Files\Canon\Canon MOV Decoder\CanonH264Filter.ax
C:\Program Files\Canon\Canon MOV Decoder\CanonIPPH264DecLib.dll

Hmmm… In retrospect, I could have figured that one out without heavy tools. But at least I know which they are now.

Installing the codec

I copied both files mentioned above to /usr/local/lib/win32. Then I added the following entry to the /usr/local/etc/mplayer/codecs.conf:

videocodec canonh264
  info "Canon's H.264 decoder"
  status working
  fourcc avc1,AVC1
  fourcc h264,H264
  fourcc x264,X264
  driver dshow
  dll "CanonH264Filter.ax"
  guid 0xb7215ee3, 0xaf54, 0x433f, 0x9d, 0x2f, 0x22, 0x64, 0x91, 0x69, 0x84, 0xf6
  out YUY2
  out BGR32,BGR24,BGR15

As for the output formats, I guessed them. Odds are I got it wrong. As for the GUID, I managed to find a class with a "FriendlyName" saying "Canon H.264 Decode Filter 1.3", and it has the class ID B7215EE3-AF54-433F-9D2F-2264916984F6. So basically that's it.

Anyhow, this didn’t work at all. When I ran mencoder with -vc canonh264, it ended with

Forced video codec: canonh264
Opening video decoder: [dshow] DirectShow video codecs
Called unk_GetVersionExW
Segmentation fault

I won't even try to pretend that I understand what went wrong here, but GetVersionExW happens to be a function exported from Windows' kernel32.dll, retrieving information about the current operating system. I'm not clear on whether the function was never found, or if the decoder wasn't happy with the answer it got. One way or another, a segfault is a segfault. I figured this was the place to give up. I'll use the good old ffmpeg decoder.

A remark about H.264

Canon’s choice of H.264 as the encoding format is a bit bizarre, since it’s a version of MPEG-4. And just for general knowledge: MPEG-4 is horrible. In particular it has this annoying thing about stale regions in the frame, which look bad and basically don’t heal. But I suppose that MPEG-2 would require too fast writes to the flash or something. The result is still pretty bad.

Summary

Trying to fix video issues late at night is not necessarily the wisest thing to do.

Xilinx’ MiG memory controller’s init process reverse engineered

Introduction

I'm using Xilinx' MiG 1.7.3 for running DDR2 memories on a Virtex-4 FPGA. It didn't take me long to realize that the controller never finishes initialization. The problem is that I had no idea why, and, as far as I know, no documentation to refer to in my attempts to understand where the controller got stuck, which is an essential stage in getting it unstuck.

Since Xilinx are wise enough to release the IP core with its source, I was able to reverse engineer the initialization process to the level necessary for my own purpose. This is a memo of the details, just in case I’ll need to do this again some time. I sure hope that won’t be necessary…

In my case, the problem seems to have been overheating of the FPGA. I'm not 100% sure about this, but with 90 degrees centigrade measured on the case, and everything starting to work OK when a decent heatsink (with fan) was put in place, it looks pretty much like good old heat.

Overview

The initialization process consists of several stages. During the entire process, the controller is governed by the init_state one-hot state machine in the ddr2_controller module. The end of this process is marked by init_done_int going high, which goes out as init_done, hence marking the end for the IP core's user.

The initialization consists of roughly three stages:

  • Setting up the memory device
  • Setting up the IDELAY taps so that the DQ inputs are sampled with good timing.
  • Learning the correct latency for reading data from DQs during read cycles.

Throughout the init process, active and precharge operations take place as required by standard. These operations are not mentioned here, since they don’t add anything to understanding the principle.

Setting up the memory device

This is the normal JEDEC procedure, which includes a preknown sequence of peculiar operations, as defined in the DDR2 standard. This includes writing to the memory’s mode registers. During this phase, the controller will not care if it’s talking to a memory or not, since it never reads anything back from the memory.

Setting up the IDELAY taps

The point of this stage is to make sure that data is sampled from the DQ lines at the best possible timing. Each DQ input is calibrated separately.

This stage begins with a single write command to column zero. The write data FIFO has already had some data written to it, so that the rising edge contains all ones, and the falling edge all zeros. For example, for a memory with 16 DQ lines, the FIFO has been fed with 0xFFFF0000 twice for memories with a burst length of 4, and four times if the burst length is 8.

This can be seen in the backend_fifos module. In that module, one can see that data is written to the write data FIFO immediately after reset. Also, there is another set of words written to the FIFO, which are intended for the next stage.

All in all, this single write command drains the FIFO of the words containing all ones or all zeros, so that column zero contains this data. Next, the controller reads column zero continuously while adjusting the delay taps to achieve proper input timing for the DQs.

The logic for moving the taps is outside the ddr2_controller module. The latter merely helps by performing reads. When the tap logic finishes, it signals it's done by raising the signal known as phy_Dly_Slct_Done in the ddr2_controller module, which carries many other names such as SEL_DONE. In the tap_logic module (from which it originates) it's called tap_sel_done.

The tap calibrator increments the tap delay until the data on that line shifts, or until 55 increments have taken place. Whenever this happens, it's considered to be the data edge. The tap delay is then decremented by the number of times defined by the tby4tapvalue parameter (17 in my case).

Note that even if no edge is found at all, the tap delay calibrator will consider the calibration of that tap OK.

Here is a short list of lines I found useful to look at with a scope (using the FPGA Editor):

  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/calib_done_int
  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/tap_sel_done
  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_done
  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/dlyce_dqs[0]
  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/dlyinc_dqs[0]

CHAN_DONE is the most interesting signal, because it goes high briefly every time a data line has finished its tap calibration. Unfortunately, the synthesizer messes up the identification of this signal, so the only way to tell is by finding what causes ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/chan_sel_int to change state. In my case it was

ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/chan_sel_int_not0002

This signal should go high 8 times (or whatever number of data lines per DQS you have). If it goes high fewer times and then nothing happens, you can tell which of the data lines is problematic simply by counting these strobes.

Latency for reading data

The purpose of this stage is to tell when, in terms of semiclocks, to sample the data read from the memory. I’m not 100% clear on why this stage is necessary at all, but that won’t change the fact that it exists.

This stage starts with a single write operation again. This time the written data is slightly more sophisticated (keep in mind that it was loaded to the write data FIFO immediately after wakeup from reset). The first column will have the data 0xA written to it, duplicated to occupy all DQs. For example, on a memory with 16 DQs, the first column will be 0xAAAA. The second column is 0x5 duplicated, the third 0x9, and the fourth 0x6, all duplicated. If the burst length is 8, this four word sequence is repeated.

After writing this, the controller reads column zero continuously, until COMP_DONE goes high. This signal originates from the pattern_compare8 module, which tells the controller it has recovered the correct input data alignment. More precisely, the rd_data module sends the ddr2_controller a logical AND of all the pattern_compare8 modules' comp_done signals.

These pattern_compare8 modules simply look for a 0xAA pattern followed by a 0x99 pattern in the input during rising edges only, or a 0x55 followed by 0x66 on the rising edge. So it will catch the reads of the first and third columns, or the second and fourth, but either way this resolves the alignment ambiguity completely.

As the pattern_compare8 module tries to match the data, it increments (among others) its clk_cnt_rise register (not to be confused with the clk_count_rise wire, which contains the final result). Monitoring clk_cnt_rise[0] (using FPGA Editor, for example) can give a positive feedback that the initialization is at this phase. It should give a nice square wave at half the DDR2 controller’s clk0 frequency, and then stop when this phase is done.

Summary

The initialization process is not the simplest in the world, and it's likely to fail if you got anything wrong with your memory, in particular if you have as little as one data line miswired. This is not really good news, but understanding the process may at least help in understanding what went wrong, and hopefully in fixing it too.

DCM loses lock on Virtex-4: It’s all about auto calibration

The whole story began when I decided to be kind enough to tell the Xilinx tools (ISE 9.2 in my case) that the Virtex-4 I’m targeting is a grown-up. Stepping 2, to be precise. I added

CONFIG STEPPING = "2";

to the UCF file. It must have been one of those moments where I believed that the tools do what is best for me.

It wasn’t long before the mapper told me it’s rewarding me with some autocalibration logic for the DCM. Sounded pretty OK. Some logic that will get the DCM back on its feet if the clock stops and returns. Not that I have any such plans. As a matter of fact, I’ve made sure that the DCM will get a reset after any possible messing with the DCM’s clock input.

Both the mapping warning and the docs mention that it’s possible to disable the autocalibration feature in order to save some logic. They never mentioned that the logic can kill the DCM.

And then one of the DCMs started losing lock. I had changed several other things at the same time, so it wasn't easy to track down why. But it looked so weird: The DCM's lock flag would go high, and then go down again. The timescale was tens of milliseconds, which is way beyond the response times for a DCM.

My first thought was that it must have something to do with the clock's signal quality. Maybe some crosstalk. The clock was around 200 MHz. But then I decided to take a closer look at what this autocalibration was about.

That led me to Answer Record #21435, which was pretty explicit about the reset:

When the input clock returns, the user must manually assert the DCM reset for at least 200 ms to resume proper DCM functionality.

200 ms? So there it was. I did mess with the input clock, but then I sent a brief reset signal to the DCM to get it back to normal. It worked in the past. Not with the extra logic. So all I needed to do was to add

defparam thedcm.DCM_AUTOCALIBRATION = "FALSE";

(in the Verilog definition of the DCM) and the problem (which shouldn’t have occured in the first place) was solved.

To make things slightly more annoying, I also had to upgrade the old "DCM" primitives to "DCM_BASE", because when the "DCM" primitives are upgraded automatically to DCM_ADVs (by XST), the DCM_AUTOCALIBRATION parameter is set to the default, which is "TRUE". The same parameter simply doesn't exist for the backward-compatible "DCM" primitive.

Note to self: Remember to disable the autocalibration on all DCMs from now on.

500 days: The uptime never reached

I know it’s stupid, but there’s something cool about very long uptimes. I think it begins when the uptime reaches 100 days: You think twice before rebooting your Linux machine: Is it really necessary?

The truth is that without really paying attention to it, my Linux box approached 500 days. I noticed that, because I wanted to upgrade my kernel (it’s about time…). But first I wanted to know what I’m breaking. 493 days, uptime told me. So I decided to wait a week. Childish, but it’s 500 days after all.

Today is the day. My computer has been up for 500 days. Here comes the celebration:

[eli@localhost eli]$ uptime
  1:51pm  up 3 days, 10:01,  4 users,  load average: 0.21, 0.05, 0.01

Oops. That’s not 500 days! And I’m sure that the computer hasn’t rebooted. Conclusion: My uptime counter has wrapped to zero.

Let’s get to the root of this:

[eli@localhost eli]$ cat /proc/uptime
295330.71 41278547.60

The left number is the uptime, in seconds. To the right is the idle time, in seconds as well.

It just so happens that the kernel counts the uptime in jiffies, which is 1/100 of a second. Since my computer is a 32-bit Intel, the counter can only count 2^32 jiffies, which is 42949672.96 seconds, which happens to be 497.1 days (divide by 60, 60 again, and then 24). After that, the counter starts over from zero. What a nasty trick…

And since we’re at it, why not check up the idle time? 41278547.60 seconds is 477.76 days, which tells something about how much I use my computer. In fact, it’s more like a housekeeping server (mail, routing and stuff), so no wonder it’s idle most of the time. A quick calculation shows that it was active no more than 4.5% of its 500 days of uptime. Hmmm…

Anyhow, I'm sure that whoever wrote the uptime counter had a little laugh about idiots like me who think it's cool to see the uptime reach 500 days. Don't count the jiffies, he must have said, make the jiffies count.