The problem
Some software packages require certain environment variables to be set in order to run properly. When these variables clearly have no effect on any other application in the system, that’s fine. When they manipulate sensitive variables, which other applications may depend on, that’s a whole different story.
When it’s a single executable, the problem is fairly simple. When it’s gazillions of them, all requiring the same set of environment variables, it’s not so fun.
I solved this by writing one single wrapper for all executables, and a lot of symbolic links. This wrapper sets the environment variables for the relevant application, and then runs the desired executable. The path is set to run the wrapper, rather than the executable, so this is completely transparent. In this way, the new software sees the correct environment variables but without polluting them for the entire system.
Don’t play with my library path
I’ve just installed Xilinx ISE 9.2 on my Fedora 12 Linux machine. One of the things I was required to do was to add this snippet (more or less) to my .bashrc file:
if [ -n "$LD_LIBRARY_PATH" ]
then
LD_LIBRARY_PATH=${XILINX}/bin/${PLATFORM}:${XILINX}/X11R6/bin/lin64:/usr/X11R6/lib:${LD_LIBRARY_PATH}
else
LD_LIBRARY_PATH=${XILINX}/bin/${PLATFORM}:${XILINX}/X11R6/bin/lin64:/usr/X11R6/lib
fi
That means that from now on, every Linux application looks in Xilinx’ libraries before going for the ones Fedora supplies. But why first? Because Xilinx seems to override some standard libraries. Which is good for its own applications, but can be pretty disastrous for all others. It means, for example, that removing or upgrading ISE may cause other things in your system to break.
Why Xilinx chose this approach, I can only guess. It was most likely somewhere between “we can’t get it to work otherwise” and “you’re not using the computer for anything else, are you?”
My solution was to move these problematic lines into a wrapper script, which is used to run each of the executables. If Xilinx wants these libraries for its own executables, so be it. Don’t pollute the whole system.
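The mechanism behind this is that a variable set for one process is inherited only by that process and its children. A minimal sketch, using a made-up variable name (DEMO_LIB_PATH) instead of the real LD_LIBRARY_PATH, so that it is harmless to run:

```bash
#!/bin/bash
# DEMO_LIB_PATH is a hypothetical variable, standing in for LD_LIBRARY_PATH
unset DEMO_LIB_PATH

# The assignment prefix applies to this one child process only
seen_by_child=$(DEMO_LIB_PATH=/opt/vendor/lib bash -c 'echo "$DEMO_LIB_PATH"')

# Back in the calling shell, the variable is still unset
seen_by_parent=${DEMO_LIB_PATH:-unset}

echo "child saw:  $seen_by_child"
echo "parent saw: $seen_by_parent"
```

A wrapper script does the same thing on a larger scale: whatever it exports is visible to the program it launches, and to nothing else.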
Setting up the path
Xilinx wanted me to add a lot of mumbo-jumbo into the .bashrc. Most went to the wrapper script (shown below). The only thing I put in .bashrc was adding a directory to the path. Xilinx wanted me to put ${XILINX}/bin/${PLATFORM}, but I went for ${XILINX}/bin-wrappers/${PLATFORM}.
So this was added to .bashrc:
if [ -n "$PATH" ]
then
PATH=${XILINX}/bin-wrappers/${PLATFORM}:${PATH}
else
PATH=${XILINX}/bin-wrappers/${PLATFORM}
fi
export PATH
The wrapper
Now I created the ${XILINX}/bin-wrappers directory, with a lin64 directory underneath. In lin64, a file named xilinx-app-wrapper is executable and looks like this:
#!/bin/bash
# First setup variables
PLATFORM=lin64
if [ -n "$LD_LIBRARY_PATH" ]
then
LD_LIBRARY_PATH=${XILINX}/bin/${PLATFORM}:${XILINX}/X11R6/bin/lin64:/usr/X11R6/lib:${LD_LIBRARY_PATH}
else
LD_LIBRARY_PATH=${XILINX}/bin/${PLATFORM}:${XILINX}/X11R6/bin/lin64:/usr/X11R6/lib
fi
export LD_LIBRARY_PATH
if [ -n "$NPX_PLUGIN_PATH" ]
then
NPX_PLUGIN_PATH=${XILINX}/java/${PLATFORM}/jre/plugin/i386/ns4:${NPX_PLUGIN_PATH}
else
NPX_PLUGIN_PATH=${XILINX}/java/${PLATFORM}/jre/plugin/i386/ns4
fi
export NPX_PLUGIN_PATH
if [ -n "$LMC_HOME" ]
then
LMC_HOME=${XILINX}/smartmodel/${PLATFORM}/installed_${PLATFORM}:${LMC_HOME}
else
LMC_HOME=${XILINX}/smartmodel/${PLATFORM}/installed_${PLATFORM}
fi
export LMC_HOME
# Now call the real executable. Putting the double quotes around $@
# tells bash not to break arguments with white spaces, so this is completely
# transparent.
exec ${XILINX}/bin/lin64/${0##*/} "$@"
It’s pretty simple until we reach the bottom line: Everything above it is copied from Xilinx’ own example file, the one they requested to be put in .bashrc. So before getting to the bottom, it’s just plain environment setting.
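Each if/else block above does the same job: put a directory first, and add a colon only if the variable already held something. As a side note, bash can express this in a single expansion. The sketch below is not from Xilinx’ file, and prepend_dir is a hypothetical helper name:

```bash
#!/bin/bash
# prepend_dir is a hypothetical helper, not part of the wrapper above.
# $1 = directory to put first, $2 = current value of the variable (may be empty)
prepend_dir() {
    # ${2:+:$2} expands to ":<old value>" only when the old value is non-empty
    echo "$1${2:+:$2}"
}

with_old=$(prepend_dir /opt/vendor/lib /usr/lib:/lib)
without_old=$(prepend_dir /opt/vendor/lib "")
echo "$with_old"
echo "$without_old"
```

Either way, the point of the test is to avoid a trailing colon when the variable was empty, since an empty entry in a search path is commonly taken to mean the current directory.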
Now to the last line: I chose to run the Xilinx application with the bash-builtin exec function. This makes Xilinx’ application replace the bash script, so we have one process running (and one process to kill if necessary) and the return value issue handled neatly.
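To see that exec really replaces the shell rather than starting a child, one can compare process IDs before and after. A small sketch, with nothing Xilinx-specific in it:

```bash
#!/bin/bash
# The child shell prints its PID, then execs a second bash which prints its own.
# Since exec replaces the process image, both PIDs should be the same.
pids=$(bash -c 'echo $$; exec bash -c "echo \$\$"')

first=${pids%%$'\n'*}    # first line of the output
second=${pids##*$'\n'}   # last line of the output

echo "before exec: $first"
echo "after exec:  $second"
```

The same holds for the wrapper: once exec runs, the process that was the wrapper is the Xilinx application.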
This exec transparently runs the Xilinx application, which depends on the command used to call the wrapper. Details:
We have this ${XILINX}/bin/lin64/${0##*/} expression. The ${0##*/} thing means $0 with everything up to and including the last slash chopped off. Since $0 contains the application’s name as it was called, ${0##*/} is the application name without the path. So the path is set absolutely, and the application’s name is taken from $0. Now we have the exact path to the corresponding Xilinx application.
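The expansion can be tried on an ordinary variable. In this sketch the path is a made-up example of what $0 might contain:

```bash
#!/bin/bash
# Hypothetical value of $0 when the wrapper is called through a symlink
called_as="/opt/xilinx/bin-wrappers/lin64/xst"

# ## removes the longest prefix matching the pattern */
app_name=${called_as##*/}
echo "$app_name"

# With no slash at all, the value is left untouched
bare="xst"
bare_name=${bare##*/}
echo "$bare_name"
```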
So this wrapper is a single script which can wrap all Xilinx executables. All we have to do is to symlink to the wrapper with the same names as the Xilinx applications.
Finally, we have the “$@” thing. That means all arguments with which the wrapper was called. Without the double quotes, possible spaces in the arguments would break them up.
Note that this works with any array, so
a[0]="Hello there";
a[1]="One argument";
exec ./test "${a[@]}"
will send the ./test script only two arguments (double quotes not passed to application)
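The difference the double quotes make can be counted directly. A small sketch, where count_args is a hypothetical helper that just reports how many arguments it received:

```bash
#!/bin/bash
# count_args is a made-up helper: it prints the number of arguments it got
count_args() { echo "$#"; }

a[0]="Hello there"
a[1]="One argument"

quoted=$(count_args "${a[@]}")     # each element stays one argument
unquoted=$(count_args ${a[@]})     # elements are split on white space

echo "quoted: $quoted, unquoted: $unquoted"
```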
Symbolic links
The idea is now to create a symbolic link for each executable in the bin directory, all pointing at xilinx-app-wrapper. This makes sense, since this script detects by which command it was called, and will exec the correct Xilinx application in turn.
The only problem is that Xilinx’ bin directory also includes several library files, which shouldn’t be executable. To overcome this I wrote a small script, which I used to create the symbolic links (and removed it afterwards):
#!/bin/bash
# Run from ${XILINX}/bin/lin64: link every executable to the wrapper
TARGETDIR="${XILINX}/bin-wrappers/lin64"
for i in * ; do
  if file "$i" | grep -iq executable ; then
    ( cd "$TARGETDIR" && ln -s xilinx-app-wrapper "$i" )
  fi
done
I ran the script from ${XILINX}/bin/lin64, and its principle is simple: The “file” application is called on each file in that directory. If the word “executable” appears in the definition, it earns a symbolic link in the bin-wrappers directory (to the script, of course, and not to the Xilinx application).
So the result of this is:
$ cd ${XILINX}/bin-wrappers/lin64
$ ls -l
[...skipped a lot of lines...]
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xilgrep -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xilhelp -> xilinx-app-wrapper*
-rwxr-xr-x. 1 root root 925 2010-01-21 00:04 xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xilinxd -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xilperl -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xinfo -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 _xinfo -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xinfoenv -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xlicmgr -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xplorer.pl -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xplorer.tcl -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xpower -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xpwr -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xreport -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 XSLTProcess -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xst -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xtclsh -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xusbdfwu -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xusb_emb -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xusb_xlp -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xusb_xpr -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 xusb_xup -> xilinx-app-wrapper*
lrwxrwxrwx. 1 root root 18 2010-01-20 23:53 zip -> xilinx-app-wrapper*
Note that among all symlinks, we have xilinx-app-wrapper itself, which is the only thing that actually runs in this directory, and hence the only thing which needs changing when a Xilinx-global change in the environment is necessary.
That’s it. At this point everything ran like clockwork.
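The whole arrangement can be tried out in a scratch directory, without any Xilinx software. This is only a sketch of the idea: all names in it (realbin, wrappers, app-wrapper, hello, DEMO_VAR) are made up.

```bash
#!/bin/bash
# Self-contained demo of the wrapper-plus-symlinks idea in a scratch directory.
set -e
demo=$(mktemp -d)
mkdir "$demo/realbin" "$demo/wrappers"

# A stand-in for a vendor executable: it reports a variable and its arguments
cat > "$demo/realbin/hello" <<'EOF'
#!/bin/bash
echo "DEMO_VAR=$DEMO_VAR args=$#"
EOF
chmod +x "$demo/realbin/hello"

# The single wrapper: set the environment, then exec the program named by $0
cat > "$demo/wrappers/app-wrapper" <<EOF
#!/bin/bash
export DEMO_VAR=set-by-wrapper
exec "$demo/realbin/\${0##*/}" "\$@"
EOF
chmod +x "$demo/wrappers/app-wrapper"

# One symlink per wrapped executable, named after the real program
ln -s app-wrapper "$demo/wrappers/hello"

# Calling the symlink runs the real program with the wrapper's environment
result=$("$demo/wrappers/hello" "one argument" two)
echo "$result"
rm -rf "$demo"
```

Adding another wrapped program to this toy setup takes one more symlink, which is the whole attraction of the method.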
Barely relevant stuff
Since I didn’t reach the solution above right away, I tried a few other things. It’s a shame to throw them away just like that.
First, let’s see the “test” script mentioned above:
#!/bin/bash
# Print each argument on its own line, so broken-up arguments are visible
while [ $# -gt 0 ]
do
echo "Argument: $1"
shift
done
It’s a simple script which shows which arguments were given to it by scanning them one by one. If an argument was broken up because of spaces, this is where I could see it.
And now an alternative (and much more cumbersome) way to pass arguments transparently:
args="";
while [ $# -gt 0 ]; do
# Append the new argument within quotes, where possible existing
# double-quotes are converted to \"
args+="\"${1//\"/\\\"}\""
shift
[ $# -gt 0 ] && args+=" ";
done
bash <<END
./test $args
END
The trick about this script is to create an empty variable $args, and append each incoming argument surrounded by double quotes and a white space. If I wanted to make this simple, I would go
args+="\"$1\" "
somewhere in the middle, but hey, I don’t want a white space after the last argument. Besides, what happens if the argument itself contains a double quote? Solution: Replace each double quote (") with an escaped one (\"). That’s what the terrible expression in the curly brackets stands for. It’s basically ${variable//search-pattern/replace-with} (the double slash makes it replace every occurrence, not just the first), plus the fact that both double quotes and backslashes have to be escaped with a backslash. Looks a bit like Perl on a bad day.
And except for being horrible, it has another major disadvantage: It creates another process. I couldn’t make an exec call by this method. So I feed a bash shell with the data through standard input, which isn’t very cute. But if one argument is END, it still works, by the way. So if each argument needs some manipulation, and can’t be passed through with "$@", the latter method will do the job.
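For what it’s worth, bash’s printf builtin has a %q format, which quotes a string so that it can be fed back to the shell. The sketch below is a different technique from the hand-made escaping above, and quote_args is a hypothetical helper name:

```bash
#!/bin/bash
# quote_args is a made-up helper: it joins its arguments into one line of
# shell-quoted words
quote_args() {
    local out="" arg
    for arg in "$@"; do
        # %q escapes spaces, quotes, $ and so on; ${out:+ } adds a separator
        # only between arguments, not before the first one
        out+="${out:+ }$(printf '%q' "$arg")"
    done
    echo "$out"
}

line=$(quote_args "Hello there" 'say "hi"' "")

# Feed the quoted line back to the shell and look at what arrives
eval "set -- $line"
arg_count=$#
second_arg=$2
echo "$arg_count arguments, second is: $second_arg"
```

Note that the empty third argument survives the round trip as well.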
These are just things I wrote down while playing with QEMU/KVM virtualization, for my own purpose of packing two existing computers into a third one. Nothing but random jottings. I’ve written a similar post in 2024 (14 years after this one).
Log files
There are definitely two files one wants to peek at every now and then
- /var/log/libvirt/qemu/{guest-name}.log (in particular when a USB device doesn’t attach)
- /var/log/audit/audit.log (SELinux audit, possibly piped to grep -E '^type=(AVC|SELINUX_ERR)' to reduce the amount of junk)
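The effect of that grep filter can be seen on a few lines of input. The sample lines below are made up and heavily abbreviated; real audit records are much longer:

```bash
#!/bin/bash
# Made-up, abbreviated sample lines (not real audit records)
sample='type=AVC msg=sample-denial
type=SYSCALL msg=sample-syscall
type=SELINUX_ERR msg=sample-error
type=USER_AUTH msg=sample-login'

# Keep only the record types that usually matter for SELinux trouble
kept=$(printf '%s\n' "$sample" | grep -E '^type=(AVC|SELINUX_ERR)')
kept_count=$(printf '%s\n' "$sample" | grep -cE '^type=(AVC|SELINUX_ERR)')

echo "$kept"
echo "kept $kept_count of 4 lines"
```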
Start with running virt-manager as a non-root user (it will fail with a nondescriptive message if trying to run it as root). A root password will have to be fed.
Use qcow2 disk images
If you’re going to play around with the image, and then maybe want to get rid of the changes, this is sooo simple:
#qemu-img create -F raw -b clean-hda.img -f qcow2 hda.qcow2
Note that qemu-img can also create and apply snapshots of images, which is also good.
Don’t try to use Virtual Machine manager’s clone utility for qcow2 images, since the tool will rewrite all data.
USB device passthrough
It looks like libvirt doesn’t massage the permissions for USB character devices, before adopting them, so both classic permissions and SELinux blow when trying to run passthrough. The current workaround would be to change the USB device’s classic permission manually, and run in permissive mode. Which is indeed unhealthy, but harmless given the fact that XP gives a healthy blue screen in response to the new device. I think I saw XP complaining something about USB 2.0 versus USB 1.1.
Running the same scenario with a knoppix LiveCD image, I managed to find the device (a Canon 500D camera) with lsusb. The root USB hub was shown to be a UHCI. I’m not sure whether a real PTP camera would respond to a UHCI hub, which maybe explains why Windows got confused finding such a camera under the emulated hub.
The emulator appears as /usr/bin/qemu-kvm under the SELinux domain (type) svirt_t, which is declared in /usr/share/selinux/devel/include/services/virt.if (which you don’t want to read).
USB devices appear somewhere under /dev/bus/usb/ as character devices with SELinux type usb_device_t.
Command line
Create a guest, run it and pause it. Then dump its XML by something like (as root):
# virsh dumpxml try1 > ~eli/virt/try1.xml
Then play around with the XML file a bit, destroy the previous domain, and
# virsh create ~eli/virt/try2.xml
How can I manually manipulate the list of guests?
virsh # start demo
error: Failed to start domain demo
error: internal error unable to start guest: qemu: could not open disk image /home/eli/virt/tryxp.img: Permission denied
Reason(?): tryxp.img is of the wrong context, so SELinux prevents it from being opened…? But I ran SELinux in permissive mode. How could this happen? Or, as put in the Fedora 12 Virtualization Guide:
SELinux prevents guest images from loading if SELinux is enabled and the images are not in the correct directory. SELinux requires that all guest images are stored in /var/lib/libvirt/images.
Basically, what solved this was to move the image to the dedicated directory, and going:
# virt-install --force --name demo3 --ram 1024 --import --disk path=/var/lib/libvirt/images/tryxp.img
or even better:
# virt-install --force --name demo5 --ram 1024 --import --disk path=/var/lib/libvirt/images/hda.img
For playing around:
# virsh
“Stealing” command lines from Virtual Machine Manager
After running a machine under the GUI, it’s possible to reuse its command line for running the machine with an external VNC console. Just find the command with ps aux | grep qemu-kvm. The following changes to the flags apply:
- Remove the -S flag. It says that the guest should not start until commanded to do so.
- Remove the -monitor flag. We’re running on VNC only.
- Change the -vnc flag’s address to point to an address known to the outside world, if necessary
- Add “-usbdevice tablet” so that the mouse is followed correctly
- Change the -net flags (two of them) to “-net nic -net user”. This is said to have a performance hit, but it’s simple and it works with an internal (fake) DHCP server
The tip of the day
In the relevant terminology, “source” refers to information seen by the host OS, while “target” to the guest OS.
Another little tip: If I try to install Windows 7, and the installation gets stuck for very long periods of time with nothing really happening, maybe it’s because the disk image is read-only? :-O
VMPlayer
Running VMplayer on a 2.6.35 kernel requires a small fix, which was published here. The thing is that some kernel symbol has changed its name, and hence the vmmon module fails to compile. How I love when people are sensitive about backward compatibility.
To make a long story short, one needs to go to where the VMplayer module sources are, and go:
$ perl -pi -e 's,_range,,' iommu.c
which is a bit of a Columbus egg, I would say. Also, VMplayer wanted to compile the modules every time I started it, because it missed vsock (which wasn’t compiled in the first place), so I followed the same page’s hint and edited /etc/vmware/config to say
VSOCK_CONFED = "no"
By the way, I tried to figure out what this module does, and all google results tell you how to tweak and patch. Don’t people care what they do on their computers? Maybe this component is useful?
The following remark was true when this post was published, but no more:
To run VMPlayer under Fedora 12, there’s a need for a little hack, or the VMPlayer closes right after starting:
# cd /usr/lib/vmware/resources/
# mv mozilla-root-certs.crt old-mozilla-root-certs.crt
Have no idea why this is.
VMPlayer networking
The interesting stuff is at /etc/vmware/networking, which pretty much says which interface is NATed and which is host-only. To pick a certain device for bridging, additional lines configuring add_bridge_mapping should be added as explained on this page.
Also useful to play with
# vmware-networks --status
# vmware-networks --start
# vmware-networks --stop
etc. (as root)
Moving an old Linux computer
The mission: Move an old 2.4.21-kernel based computer into a VMPlayer virtual machine. The strategy was to run a LiveCD on the virtual machine, create an empty ext3 file system on it, copy the files and boot. Caveats follow.
The most important lesson learned is that everything on the new machine has to be done under a kernel of the same version as the one that will be used, or an earlier one. In simple words, the rescue disk must run a matching kernel. In particular, with a newer rescue disk, the ext3 file system is generated with an inode size of 256, which old kernels don’t support. Even worse, even if the file system was generated properly, newer kernels (say, 2.6.35) write things on the disk that will confuse the old kernel. This leads to “attempt to access beyond end of device” errors during boot, from 01:00 (the ramdisk) as well as 08:01 (the root filesystem’s partition).
So fdisk, mkfs.ext3 and mkinitrd must be done with a matching kernel running. And copying the files too, of course. The rescue disk must match, as mentioned above.
The next thing to note is that all hda’s turn into sda’s. That needs to be adjusted in /etc/fstab as well as /etc/lilo.conf.
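The renaming itself is a plain search and replace. A sketch on made-up fstab lines (the real files will look different, and it’s wise to work on a copy first):

```bash
#!/bin/bash
# Made-up fstab lines, for illustration only
old_fstab='/dev/hda1  /      ext3  defaults  1 1
/dev/hda2  swap   swap  defaults  0 0'

# Turn every /dev/hda reference into /dev/sda
new_fstab=$(printf '%s\n' "$old_fstab" | sed 's,/dev/hda,/dev/sda,g')
sda_lines=$(printf '%s\n' "$new_fstab" | grep -c '^/dev/sda')
hda_lines=$(printf '%s\n' "$new_fstab" | grep -c '/dev/hda')

echo "$new_fstab"
```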
The most difficult thing to handle was the fact that SCSI drivers were not installed in the kernel by default, so the initrd image had to be adjusted. So after the file system is copied, mount it, chroot to it, and go
# mkinitrd --with=scsi_mod --with=sd_mod --with=BusLogic /boot/initrd-2.4.21-mykern.vmware.img 2.4.21-mykern
The insmods are attempted in the same order the flags appear, and the two latter modules depend on the scsi_mod. So it’s important to keep the order of the flags as above.
If these modules aren’t loaded, the generation of /dev/sda and /dev/sda1 doesn’t occur, resulting in a kernel panic with various complaints (pivot_mount fails, init not found and some more).
And then fix /etc/lilo.conf and /etc/fstab and run lilo. It’s recommended to copy the /boot directory first, so that the kernel image falls within the lower 4 GB. Or pick lba32 option, as lilo will tell you.
And then boot. Mount the ISO image for VMware tools (/dev/hdc) and run ./vmware-install (going with the defaults most of the time).
Converting a VMPlayer machine to VirtualBox
Make a copy of the entire directory to a new place. No point messing up things. Then create an OVF file, without converting the disks (because it takes forever, and possibly fails)
$ ovftool --noDisks CleanXP.vmx export.ovf
For some reason, ovftool doesn’t fill in the correct names of the disks (I asked not to convert them, not to ignore them). The simplest way around this is to remove the disk definitions altogether, and import them manually from Virtualbox. For a VM with three disks, the disk entries in the following sections (marked in red in the original post) should be removed:
<References>
<File ovf:href="export-disk1.vmdk" ovf:id="file1" ovf:size="0"/>
<File ovf:href="export-disk2.vmdk" ovf:id="file2" ovf:size="0"/>
<File ovf:href="export-disk3.vmdk" ovf:id="file3" ovf:size="0"/>
</References>
<DiskSection>
<Info>Virtual disk information</Info>
<Disk ovf:capacity="10" ovf:capacityAllocationUnits="byte * 2^30" ovf:diskId="vmdisk1" ovf:fileRef="file1" ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" ovf:populatedSize="0"/>
<Disk ovf:capacity="30" ovf:capacityAllocationUnits="byte * 2^30" ovf:diskId="vmdisk2" ovf:fileRef="file2" ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" ovf:populatedSize="0"/>
<Disk ovf:capacity="40" ovf:capacityAllocationUnits="byte * 2^30" ovf:diskId="vmdisk3" ovf:fileRef="file3" ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" ovf:populatedSize="0"/>
</DiskSection>
and also the other references to these vmdiskn:
<Item>
<rasd:AddressOnParent>0</rasd:AddressOnParent>
<rasd:ElementName>disk0</rasd:ElementName>
<rasd:HostResource>ovf:/disk/vmdisk1</rasd:HostResource>
<rasd:InstanceID>7</rasd:InstanceID>
<rasd:Parent>6</rasd:Parent>
<rasd:ResourceType>17</rasd:ResourceType>
</Item>
<Item>
<rasd:AddressOnParent>1</rasd:AddressOnParent>
<rasd:ElementName>disk1</rasd:ElementName>
<rasd:HostResource>ovf:/disk/vmdisk2</rasd:HostResource>
<rasd:InstanceID>8</rasd:InstanceID>
<rasd:Parent>6</rasd:Parent>
<rasd:ResourceType>17</rasd:ResourceType>
</Item>
<Item>
<rasd:AddressOnParent>0</rasd:AddressOnParent>
<rasd:ElementName>disk2</rasd:ElementName>
<rasd:HostResource>ovf:/disk/vmdisk3</rasd:HostResource>
<rasd:InstanceID>10</rasd:InstanceID>
<rasd:Parent>4</rasd:Parent>
<rasd:ResourceType>17</rasd:ResourceType>
</Item>
Delete the export.mf file. It contains export.ovf’s hash signature, and it’s wrong after editing the file.
Then in the VM VirtualBox Manager, pick File > Import Appliance… and choose export.ovf.
Then add the hard disks by choosing Storage > (Diskette Icon) > Add Hard Disk and pick Choose existing disk. Pick the .vmdk file with no number attached to it (not the e.g. *-s011.vmdk).
Enlarge the video memory to 18 MB (or more), or Virtualbox complains that it’s too little. Enable audio.
That was nice so far. Only problem is that Windows required re-activation because too much hardware changed, and the internet activation failed, probably because the NIC wasn’t detected. To install the driver, I’ll need to activate first, and so I was stuck, lost patience and left the whole thing for now.
What I wanted
All I wanted was a software RAID-5 on three disks with a whole disk encryption on Fedora 12. For some reason, I thought the installation script would do that for me.
The relevant part in the installation procedure was kind enough to allow me to set it up in the GUI, but when I went for the installation, I got a window saying “An error was encountered while setting up device sdc1”. sdc1, by the way, is just a plain unencrypted partition. But who cares. I insisted on looking at the “details” where it said “device has not been created”.
Hurray! Now I get it all. Not.
A quick tour in the command line console (I wonder why I always end up doing things with my bare hands) revealed that the partition tables were intact. Simply put, nothing was done.
The setup
The catch about software RAID is that its drivers have to be loaded from somewhere, so obviously a non-RAID boot partition is needed for that. My decision was to allocate ~250 MB on all three disks, exactly the same number of cylinders, and put the boot on one of them. I don’t know why, but it feels right to me that the disks will access the same geometrical points when running as RAID, even though I’m not forced to do so.
The rest of the partition (around 1000 GB) is allocated as one big software RAID partition. With three disks like this forming a RAID-5 I’ll get one big (fake) ~2TB disk, which will be encrypted completely. On top of that, I’ll put one big LVM physical volume, on which I’ll have 4 GB swap and then a root partition. The precise sizes don’t matter anymore, since I’m under LVM.
Setting up the RAID
Since the LVM tools are not active in Fedora’s rescue mode, I went for booting Ubuntu 9.10 as a LiveCD. The catch is that it supports neither LVM nor mdadm out of the box, so both had to be installed (after setting up a network connection, of course):
# apt-get install lvm2
# apt-get install mdadm
(the latter forcing me to configure postfix. Yuck!)
On /dev/sda: For the boot partition I allocated cylinders 1 to 30. For Raid Autodetect (type 0xfd) I took all the rest. Then I brutally raw copied the first 128 sectors to /dev/sdb and /dev/sdc. That was a bad idea, since the partition table contains the disk’s GUID. So I cleaned up both disks with some zeros, and ran fdisk on each.
Following this I created a software RAID:
mdadm --create /dev/md0 --level=raid5 --raid-devices=3 --chunk=128 /dev/sda2 /dev/sdb2 /dev/sdc2
And the hard disks started to work. /dev/md0 was up and running pretty much immediately. To monitor the progress:
# mdadm --detail /dev/md0
(yey!)
The whole disk encrypted
# cryptsetup -v luksFormat /dev/md0
After saying “YES” to kill all data and entering my secret passphrase, cryptsetup said all was successful and a window popped up saying that “gvfs-gdu-volume-monitor closed unexpectedly”. How I love when everything is so automated and I don’t need to worry about anything technical.
But who cares? I opened my new secret candy box:
# cryptsetup luksOpen /dev/md0 candybox
and found /dev/mapper/candybox in place (yey II)
Setting up LVM
It’s worth mentioning that there’s an interactive shell-like environment for manipulating LVM volumes. Just go
# lvm
Regardless, following the same HOWTO (more or less) I went
root@ubuntu:/home/ubuntu# pvcreate /dev/mapper/candybox
Physical volume "/dev/mapper/candybox" successfully created
root@ubuntu:/home/ubuntu# vgcreate vg_raid -s 32M /dev/mapper/candybox
Volume group "vg_raid" successfully created
Noted the “-s 32M”? That sets the physical extent size to 32MB instead of the default 4MB. Since the maximal number of extents for a volume is 65534 (more or less…?), and the whole disk is around 2TB, that’s the smallest number which does the work (32 MB x 65534 ~ 2 TB).
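The arithmetic can be spelled out. This sketch assumes the 65534-extent limit mentioned above and that the extent size is a power of two (in MB); the array size is the one mdadm reports for this array, in 1 kB blocks:

```bash
#!/bin/bash
# Array size as reported by mdadm for this setup, in 1 kB blocks
array_kib=1953037824
# Assumed limit on the number of extents, as quoted in the text
max_extents=65534

array_mib=$(( array_kib / 1024 ))

# Double the extent size until the whole array fits into max_extents extents
extent_mib=1
while (( extent_mib * max_extents < array_mib )); do
    extent_mib=$(( extent_mib * 2 ))
done

echo "array: ${array_mib} MB, smallest extent size: ${extent_mib} MB"
```

With 16 MB extents the limit would cover only about 1 TB, hence the 32M.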
OK, now let’s put the swap and root in place:
root@ubuntu:/home/ubuntu# lvcreate --size 4G vg_raid -n lv_swap
Logical volume "lv_swap" created
root@ubuntu:/home/ubuntu# lvcreate --size 10G vg_raid -n lv_root
Logical volume "lv_root" created
root@ubuntu:/home/ubuntu# ls /dev/mapper/
candybox control vg_raid-lv_root vg_raid-lv_swap
Installing…
To my delight (and somewhat to my surprise), the Fedora 12 installation machinery detected both the software RAID and all that was underneath, prompted me for my passphrase, and allowed me to allocate the mounting points on the existing logical volumes. Which is the sensible thing to do, but I couldn’t believe it actually happened!
All in all, the installation went smoothly, so did the bootup, and everything seems to be OK (fingers crossed).
When bad gets worse
So what happens if a disk suddenly decides to commit suicide? The answer is nothing special. Due to the redundancy, the system will keep on working as usual. Even worse, nobody will be notified (except for an email to root from mdadm). The system just runs on. In a way, that’s good and pretty bad at the same time.
Here’s a typical mail, which is sent to root:
From root@localhost.localdomain Sat Jan 16 17:51:27 2010
Return-Path: <root@localhost.localdomain>
Date: Sat, 16 Jan 2010 17:51:27 +0200
From: mdadm monitoring <root@localhost.localdomain>
To: root@localhost.localdomain
Subject: DegradedArray event on /dev/md0:ocho.localdomain
Status: RO
This is an automatically generated mail message from mdadm
running on localhost.localdomain
A DegradedArray event had been detected on md device /dev/md0.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda2[0] sdc2[3] sdb2[1]
1953037824 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
[==================>..] recovery = 94.0% (918267264/976518912) finish=920.7min speed=1053K/sec
unused devices: <none>
It looks like there’s no dedicated software for sounding the alarm. The solution seems to be a simple cronjob script, which runs mdadm every hour or so, and checks if all is OK. The word “degraded” in the “detail” report looks like a good indicator that something isn’t like it should be. My script is at the bottom of this page.
I tried plugging out the spare disk’s SATA cable while the computer was up and running (which is, by all means a violent thing to do). Nothing happened. A few lines in /var/log/messages telling a short story about a disk which doesn’t respond, and the RAID going down to two disks. The log of the boot afterwards (with two disks) is not more dramatic about it. RAID-5, only two disks detected, too bad, let’s go on. The disk is declared “removed” in the “detail” report, and that’s it.
So I turned the computer off, replugged the disk, and turned it on again. The system showed no particular interest in it. To get it back to the RAID array, I did
# mdadm /dev/md0 --add /dev/sdc2
This kicked off the rebuild of this disk. Thinking about it, it’s pretty clever that nothing happens without human intervention. But I’ll consider having the smartd service running.
When worse turns into a catastrophe
Since I was about to wipe my disks soon anyhow, I figured to take the test to the extreme. After all, there’s no point in having a spare disk if it doesn’t work, does it?
So I let the spare disk recover up to 25% (so I know that the relevant disk area is indeed OK, but without letting it finish). Then I pulled disk #2’s SATA plug. So now we have disk #1 which is OK, disk #2 missing, and disk #3 spare but not completely recovered. Don’t try this on real data.
The system lost its stability this time, but it’s not like that connector was intended for hot removal. The attempt to reboot failed with “no root device found”. This is no wonder. I couldn’t really expect the RAID array to rely on one disk and one spare which never got the time to recover, could I? Well, I tried.
So I went for Ubuntu again. Keep in mind that former /dev/sdc2 is now /dev/sdb2. The general music is “everything is clean, but forget it”:
root@ubuntu:/home/ubuntu# mdadm --assemble --scan
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.
root@ubuntu:/home/ubuntu/mnt# mdadm --run /dev/md0
mdadm: failed to run array /dev/md0: Input/output error
root@ubuntu:/home/ubuntu# cat /proc/mdstat
Personalities :
md0 : inactive sda2[0](S) sdb2[3](S)
1953037952 blocks
root@ubuntu:/home/ubuntu# mdadm --examine /dev/sdb2
/dev/sdb2:
Magic : a92b4efc
Version : 00.90.00
UUID : fb16d869:ffd27a50:e368bf24:bd0fce41 (local to host ubuntu)
Creation Time : Fri Jan 15 11:11:40 2010
Raid Level : raid5
Used Dev Size : 976518912 (931.28 GiB 999.96 GB)
Array Size : 1953037824 (1862.56 GiB 1999.91 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Jan 15 14:34:58 2010
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 2
Spare Devices : 1
Checksum : 15648d3e - correct
Events : 2500
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 3 8 34 3 spare
0 0 8 2 0 active sync /dev/sda2
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 34 3 spare
But I won’t let this turn me off. There’s a guy who had this kind of problem for real, and was kind enough to document his findings. The bottom line was to tell mdadm to create the RAID array from the start, only assume that everything is already there with “--assume-clean”. Extremely dangerous. I would rawcopy all data to a new hard disk and try it there, if this was for real. But it wasn’t. So I went:
root@ubuntu:/home/ubuntu/mnt# mdadm --create /dev/md0 --assume-clean --level=5 --verbose --chunk=128 --raid-devices=3 /dev/sda2 missing /dev/sdb2
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda2 appears to be part of a raid array:
level=raid5 devices=3 ctime=Fri Jan 15 11:11:40 2010
mdadm: /dev/sdb2 appears to be part of a raid array:
level=raid5 devices=3 ctime=Fri Jan 15 11:11:40 2010
mdadm: size set to 976518912K
Continue creating array? y
mdadm: array /dev/md0 started.
root@ubuntu:/home/ubuntu/mnt# cryptsetup luksOpen /dev/md0 candybox
Enter LUKS passphrase:
key slot 0 unlocked.
Command successful.
But I didn’t get the LVM devices kicked off. So I went:
root@ubuntu:/dev/mapper# dmsetup resume /dev/mapper/candybox
root@ubuntu:/dev/mapper# ls
candybox control vg_raid-lv_root vg_raid-lv_swap
And of course, in real life I would fsck the disk and such. But the bottom line is clear: If the data is there, it’s there.
Summary
Let’s hope I won’t ever need this stuff. Let’s hope that all three disks will live forever. But it’s comforting to know, that if one of those suddenly dies, there is a good chance the whole story will end with the purchase of some hardware. Nothing else.
Sort-of appendix
When the RAID doesn’t come up by itself
If the RAID array is known to be fine, but doesn’t come up:
# mdadm --assemble --scan
which worked under Ubuntu, since it was nice enough to create an /etc/mdadm/mdadm.conf file. Otherwise we need to be more explicit:
# mdadm --assemble /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2
Script for checking the RAID’s health
This is the script I’ve put as a cronjob. Note that it’s completely silent when all is OK, and only starts to say things when something is wrong. The point is that the cron daemon sends an email to the crontab’s owner (usually root) whenever the cronjob produces any output. This doesn’t solve the problem of the email going to root, but it’s a good guard if mails to root are forwarded to someone attentive.
I’ve also made a log in /var/log/raidlog. The purpose of this log is to allow me to verify that the script is indeed running every now and then. After all, the whole issue is that I don’t expect a hard disk failure tomorrow, but rather when I’ve forgotten about this script altogether. But I hope I’ll have the sense to peek at the log every now and then.
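#!/bin/bash
# Silent when the array is healthy. Any output makes cron send an email.
device=/dev/md0
now=$(date)

# Returns 0 if the first "State" line of mdadm's report looks sane and
# doesn't mention "degraded", 1 otherwise.
checkraid() {
    mdadm --detail "$device" | grep State | {
        read -r s
        if ! echo "$s" | grep -i -q state ; then
            echo "Problem: mdadm gave bad output for $device"
            return 1
        fi
        if echo "$s" | grep -i -q degraded ; then
            echo "Problem: Device $device is degraded"
            return 1
        fi
        return 0
    }
}

if ! checkraid ; then
    echo ""
    echo "mdadm output follows:"
    echo ""
    mdadm --detail "$device"
    echo "Bad RAID at $now" >> /var/log/raidlog
    exit 1
fi

# Log the healthy runs too, to prove that the script is still being run
echo "RAID OK at $now" >> /var/log/raidlog
exit 0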
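As a cross-check to the mdadm-based script above, the kernel’s own /proc/mdstat can be inspected as well. This is only a sketch: it relies on the usual mdstat convention of showing one letter per member in brackets, with an underscore for a missing one (e.g. [UU_]). The function name mdstat_degraded is made up.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the mdstat text on stdin shows an array with
# a missing member, i.e. an underscore inside the [UU_] status brackets.
mdstat_degraded() {
  grep -q '\[[U_]*_[U_]*\]'
}

# Usage sketch (stays silent when all is well, like the script above):
#   if mdstat_degraded < /proc/mdstat; then
#     echo "Problem: /proc/mdstat shows a degraded array"
#   fi
```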
A useless adventure
There is a basic flaw in the above: The LVM is generated on /dev/md0. If we view /dev/md0 as a hard disk, it means it has no partition table!
So somewhere in the middle of the route above, I tried this: With fdisk, I created an LVM partition on /dev/md0, to be known as /dev/md0p1.
So what I wanted to do was:
# pvcreate /dev/md0p1
# vgcreate lvm-raid -s 32M /dev/md0p1
The difference is that I went for /dev/md0p1 rather than /dev/md0, so that a decent partition table is in place.
But the first one failed with a “/dev/md0p1 not found (or ignored by filtering)” because there is some kernel issue. Or is it? Maybe it’s the whole world telling me I should stop being so fussy.
What I needed was a kernel of 2.6.24 and down, because whoever reported the kernel problem had things running on 2.6.24. I wanted to run an earlier Ubuntu (8.10 instead of 9.10), assuming that it was a kernel issue. I will never know, since that distro got stuck during boot.
So I went for a small rescue distro, namely SystemRescueCD version 1.0.0 (loading altker64, since the default kernel caused a kernel panic). And there I encountered a brand new problem: /dev/md0p1 never appeared in the /dev directory, even though the partition was there. Using mdadm to kick the RAID off did create /dev/md0, but not its subpartitions.
At this point I realized that even if I managed to get it my way, odds are that not many others did it my way, meaning that nobody really tests things with my settings. In other words, things are expected to go wrong in the long run. Which is why I dropped this.
Introduction
I have an old Mustek 600VA UPS with RS-232 connection. Since I change the battery every couple of years on my own, I find it pretty pointless to throw it away.
And my brand new computer (running Fedora 12) has an RS-232 port, if one insists on using a connector on the motherboard. Needless to say, I had to steal the motherboard-to-panel RS-232 cable from an old computer.
A simple male/female extension RS-232 cable connects between UPS and computer. PC to UPS communication goes through pin 3, and the other direction through pin 2. The cable should be completely transparent (i.e. not switch wires, in particular not pins 2 and 3).
In case you want to see if the UPS is alive with Putty (or some other terminal), go for 2400 baud, 8N1 (8 data bits, no parity and one stop bit), with no flow control. Type Q1 followed by a carriage return, and the UPS should respond with a line of status info. The protocol is described here.
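To make sense of the status line that comes back, here is a rough sketch of a parser. The field order is my recollection of the Megatec protocol’s Q1 reply (input voltage, input fault voltage, output voltage, load in percent, input frequency, battery voltage, temperature and eight status bits, the first of which means utility failure), so take it as an assumption and check it against the protocol description. The function name parse_q1 and the sample line in the usage comment are made up.

```shell
#!/bin/sh
# Split a Q1 reply such as "(230.0 000.0 230.0 014 50.0 12.1 37.8 00001000"
# into named values. The field order is an assumption, see the text.
parse_q1() {
  # Drop the leading parenthesis and a trailing carriage return, then let
  # the shell split the rest on whitespace into $1 .. $8
  set -- $(printf '%s' "$1" | tr -d '(\r')
  echo "input.voltage=$1"
  echo "input.voltage.fault=$2"
  echo "output.voltage=$3"
  echo "ups.load=$4"
  echo "input.frequency=$5"
  echo "battery.voltage=$6"
  echo "ups.temperature=$7"
  # First status bit set: mains power is gone, the UPS runs on battery
  case "$8" in
    1*) echo "ups.status=OB" ;;
    *)  echo "ups.status=OL" ;;
  esac
}

# Usage sketch, with a made-up reply:
#   parse_q1 "(230.0 000.0 230.0 014 50.0 12.1 37.8 00001000"
```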
Linux driver
The right driver to run with the UPS is Network UPS Tools, or nut for short.
# yum install nut
Configure the UPS.
# vi /etc/ups/ups.conf
Basically, add the following entry:
[PowerMust]
driver = megatec
port=/dev/ttyS0
desc = "Mustek PowerMust 600VA"
mfr = "Mustek"
model = "PowerMust 600"
Now we can kick off the driver which listens to the UPS (the response takes a few seconds):
# upsdrvctl start
Network UPS Tools - UPS driver controller 2.4.1
Network UPS Tools - Megatec protocol driver 1.6 (2.4.1)
Megatec protocol UPS detected.
If you happen to have an oscilloscope on the RS-232 lines (ha!), you should see some action every few seconds from now on.
Or, alternatively, if you want to see what’s running on the computer, I suggest
# ps aux | grep nut
It so happens, that during the installation, we got a new user, named “nut” under which most of the relevant processes are running. A process running /sbin/megatec should be found there.
Configure the monitor
# vi /etc/ups/upsmon.conf
Basically, there’s only one line to add (pretty much at the beginning):
MONITOR PowerMust@localhost 1 upsmon pass master
I’ve chosen to configure the monitor as master, since I plan to add some virtualization guests, which may work as slaves. As for the “upsmon” and “pass”, these are the user and password used when connecting to upsd. Since neither a user nor a password was configured in upsd.users, the attempt to log in with master privileges will fail, resulting in error messages when the monitor is started.
Now let’s test it. First, let’s remove the already running daemons:
# service ups stop
Stopping UPS monitor: [FAILED]
Stopping upsd: [ OK ]
Shutting down upsdrvctl: [ OK ]
Stopping the UPS monitor failed, because it wasn’t running. And now kick it off:
# service ups start
Starting UPS driver controller: [ OK ]
Starting upsd: [ OK ]
Starting UPS monitor (master): [ OK ]
# ps aux | grep nut
nut 6578 0.0 0.0 6084 468 ? Ss 02:51 0:00 /sbin/megatec -a PowerMust
nut 6582 0.0 0.0 40552 624 ? Ss 02:51 0:00 /usr/sbin/upsd
nut 6586 0.0 0.0 38364 856 ? S 02:51 0:00 /usr/sbin/upsmon
root 7509 0.0 0.0 102728 780 pts/0 R+ 02:54 0:00 grep nut
Now let’s watch the relevant entries in /var/log/messages:
Jan 14 02:51:48 short megatec[6578]: Startup successful
Jan 14 02:51:48 short upsd[6581]: listening on 127.0.0.1 port 3493
Jan 14 02:51:48 short upsd[6581]: listening on ::1 port 3493
Jan 14 02:51:48 short upsd[6581]: Connected to UPS [PowerMust]: megatec-PowerMust
Jan 14 02:51:48 short upsd[6582]: Startup successful
Jan 14 02:51:48 short upsmon[6585]: Startup successful
Jan 14 02:51:48 short upsd[6582]: User upsmon@::1 logged into UPS [PowerMust]
Jan 14 02:51:48 short upsmon[6586]: Master privileges unavailable on UPS [PowerMust@localhost]
Jan 14 02:51:48 short upsmon[6586]: Response: [ERR ACCESS-DENIED]
The last two lines are a result of the lack of user and password in upsd. But that’s fine. The UPS is monitored even so.
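For those who want to get rid of the ACCESS-DENIED message, the missing piece is an entry in upsd.users that matches the MONITOR line. I didn’t do this myself, so the following is a sketch only, based on the upsd.users format as I know it for this generation of nut (a section named after the user, a password line and an “upsmon master” line). The function name upsd_user_entry is made up, and the path in the usage comment assumes the same /etc/ups directory as the other files.

```shell
#!/bin/sh
# Print an upsd.users entry granting master privileges to a monitor.
#   $1 = user name, $2 = password (as given on the MONITOR line)
upsd_user_entry() {
  printf '[%s]\n' "$1"
  printf '    password = %s\n' "$2"
  printf '    upsmon master\n'
}

# Usage sketch (as root), followed by a restart of the ups service:
#   upsd_user_entry upsmon pass >> /etc/ups/upsd.users
```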
The final step is to activate the service on boot, using chkconfig or some GUI tool (System->Administration->Services on my computer).
Checking out the UPS
Want to grab some info about the UPS? That’s what upsc is for (run while UPS was on battery):
# upsc PowerMust
battery.charge: 60.0
battery.voltage: 12.10
battery.voltage.nominal: 12.0
driver.name: megatec
driver.parameter.mfr: Mustek
driver.parameter.model: PowerMust 600
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/ttyS0
driver.version: 2.4.1
driver.version.internal: 1.6
input.frequency: 50.0
input.frequency.nominal: 50.0
input.voltage: 0.0
input.voltage.fault: 0.0
input.voltage.maximum: 230.5
input.voltage.minimum: 230.0
input.voltage.nominal: 230.0
output.voltage: 230.0
ups.beeper.status: disabled
ups.delay.shutdown: 0
ups.delay.start: 2
ups.load: 14.0
ups.mfr: Mustek
ups.model: PowerMust 600
ups.serial: unknown
ups.status: OB
ups.temperature: 37.8
ups.type: standby
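The ups.status line is the one to look at in scripts: OB means the UPS is on battery, OL that it runs on line power, and several flags may appear on the same line, separated by spaces. A minimal sketch of such a check follows. The function name ups_on_battery is made up.

```shell
#!/bin/sh
# Reads upsc output on stdin. Succeeds (exit status 0) if the ups.status
# line contains the OB flag, possibly among other flags.
ups_on_battery() {
  status=$(sed -n 's/^ups\.status: *//p')
  case " $status " in
    *" OB "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage sketch:
#   if upsc PowerMust | ups_on_battery; then echo "Running on battery"; fi
```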
This is yet another bunch of things I wanted written down, in case I need them one day. In no particular order.
The root mount
Grub gives the following kernel parameter:
root=/dev/mapper/vg_short-lv_root
Meaning, that the kernel has the LVM module in place when starting off (I suppose it’s kicked off in the initrd stage)
Boot image
Open a boot image (note that this works like tar -x):
zcat /boot/initramfs-2.6.31.9-174.fc12.x86_64.img | cpio -i
Opening an encrypted partition
[root@short ~]# cryptsetup luksOpen /dev/mapper/vg_short-mysecret mysecret
Enter passphrase for /dev/mapper/vg_short-mysecret:
Key slot 0 unlocked.
[root@short ~]# mount /dev/mapper/mysecret /secret
Note that the second argument, mysecret, is the name of the device generated under /dev/mapper. Also note that unmounting /secret doesn’t close the partition. In addition to unmounting, another cryptsetup command is needed:
[root@short ~]# umount /secret/
[root@short ~]# cryptsetup luksClose /dev/mapper/mysecret
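Since the order matters (unmount first, close the mapping second), the two commands can be wrapped in a small function. This is just a sketch. The run helper and its DRY_RUN switch are inventions of mine, there to print the commands instead of executing them, so the sequence can be checked without touching anything.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# Unmount, then close the mapping. Nothing is closed if the unmount fails.
#   $1 = mapped device, $2 = mount point
close_secret() {
  run umount "$2" && run cryptsetup luksClose "$1"
}

# Usage sketch:
#   DRY_RUN=1 close_secret /dev/mapper/mysecret /secret   # just show
#   close_secret /dev/mapper/mysecret /secret             # really do it
```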
Copying gigabytes of disk can get the system sluggish. On Linux, the solution is so simple. If process 18898 happens to take control of your disk, just go:
ionice -c 3 -p 18898
And you have your computer back. “-c 3” means class 3, which is the idle class. In other words, take the disk when nobody else asks for it.
I love it. More here.
A short note about installing libraries on an Intel 64 bit machine (Fedora 12 in my case).
It all starts with a short conversation like this one:
[root@short Downloads]# rpm -i VirtualBox-3.1-3.1.2_56127_fedora12-1.x86_64.rpm
error: Failed dependencies:
libQtGui.so.4()(64bit) is needed by VirtualBox-3.1-3.1.2_56127_fedora12-1.x86_64
libQtOpenGL.so.4()(64bit) is needed by VirtualBox-3.1-3.1.2_56127_fedora12-1.x86_64
THE WRONG THING TO DO IS:
[root@short Downloads]# yum install libQtGui.so.4
because it will install the right library, but for 32 bit (note the i686 suffix in the package name below). There must be an elegant way to get around this. Until I find it, I’ll go:
[root@short Downloads]# yum whatprovides libQtGui.so.4
Loaded plugins: presto, refresh-packagekit
1:qt-x11-4.5.3-7.fc12.i686 : Qt GUI-related libraries
Repo : fedora
Matched from:
Other : libQtGui.so.4
(this entry possibly duplicated for each repository)
Now we have the name of the 32-bit package, qt-x11-4.5.3-7.fc12.i686 in our case. Just replace i686 with x86_64, and off we go:
[root@short Downloads]# yum install qt-x11-4.5.3-9.fc12.x86_64
It is left as an exercise to explain why yum would load i686 files when running on an x86_64 machine. Tell me if you get the logic.
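The suffix replacement is trivial to script, if there are many libraries to chase. A minimal sketch follows, with a made-up function name. Note that it only rewrites the architecture suffix, so the version in the name is whatever yum whatprovides reported.

```shell
#!/bin/sh
# Turn the name of an i686 package into the name of its x86_64 sibling.
# Names without the .i686 suffix are passed through unchanged.
to_x86_64() {
  printf '%s\n' "$1" | sed 's/\.i686$/.x86_64/'
}

# Usage sketch:
#   yum install "$(to_x86_64 qt-x11-4.5.3-7.fc12.i686)"
```

As far as I know, yum also accepts a plain name.arch form such as qt-x11.x86_64, which avoids the need to spell out the version at all.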
Another thing is that one can match against file names:
[root@short eli]# yum whatprovides "*libXm.so.*"
Loaded plugins: presto, refresh-packagekit
lesstif-0.95.2-1.fc12.i686 : OSF/Motif library clone
Repo : fedora
Matched from:
Filename : /usr/lib/libXm.so.2
Filename : /usr/lib/libXm.so.2.0.1
Other : libXm.so.2
The (non-) problem
The truth is that there never was a problem. What really happened was that I got things confused between a few versions of mplayer/mencoder, and only the latest (of those I have, 1.0rc1-3.2.2) does the job. But since I wrote down some things I might want to return to some day, here’s the whole blob.
What got me on this, was that using mplayer (version 1.0 of some rc’s) to play my Canon 500D’s movie clips, I get the sound OK, but an image which flashes the real thing every now and then (keyframes?) and shows some grey garbage otherwise. And tons of error messages on the console.
On some other version the image looks OK, but A/V sync is lost soon enough. And many error messages indicate that the decoder doesn’t get it right (a lot of “Consumed only 136509 bytes instead of 1365120” and the like). That isn’t very promising.
It’s worth noting that mplayer/mencoder choose the native ffmpeg libavcodec by default. As ffmpeg improves, these issues are fixed.
My real goal is to convert the clip to something decent using mencoder. I don’t even think about editing a video in H.264. So all I need now is to find the right video decoder.
But I prefer to use the decoder supplied by Canon (spoiler: I never managed to). Since I own the camera, and got the software legally, why not use the codec they sent me? Only one problem…
What is the DLL of the codec used?
In order to “steal” the codec from the Canon application, I needed to know which DLL Canon uses to play its own videos. In order to do that, I opened Zoombrowser, and ran the ListDLL command line utility (which can be downloaded from here). The utility spits out all DLLs of all processes running, but using the “>” redirection in a command window, all data goes to a file. Then I double-clicked a video, and ran ListDLL again, redirecting the data to another file.
The difference between the files is most probably the DLLs loaded to play a clip. This worked because I ran Zoombrowser from scratch.
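For those without a favourite diff application, the comparison boils down to printing the lines that appear in the second listing but not in the first. A sketch with standard tools follows, assuming that the two text files were copied to a machine with a Unix shell. The function name new_dlls and the file names in the usage comment are made up.

```shell
#!/bin/sh
# Print the lines of the second listing that don't appear, as whole
# lines, anywhere in the first listing.
#   $1 = listing taken before playing a clip, $2 = listing taken after
new_dlls() {
  grep -F -x -v -f "$1" "$2"
}

# Usage sketch:
#   new_dlls before.txt after.txt
```

Lines are compared as a whole, so a DLL that was merely reloaded at another address shows up as new as well.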
With my favourite diff application, I got a long list of new DLLs. These two caught my eyes:
C:\Program Files\Canon\Canon MOV Decoder\CanonH264Filter.ax
C:\Program Files\Canon\Canon MOV Decoder\CanonIPPH264DecLib.dll
Hmmm… In retrospect, I could have figured that one out without heavy tools. But at least I know which they are now.
Installing the codec
I copied both files mentioned above to /usr/local/lib/win32. Then I added the following entry to the /usr/local/etc/mplayer/codecs.conf:
videocodec canonh264
info "Canon's H.264 decoder"
status working
fourcc avc1,AVC1
fourcc h264,H264
fourcc x264,X264
driver dshow
dll "CanonH264Filter.ax"
guid 0xb7215ee3, 0xaf54, 0x433f, 0x9d, 0x2f, 0x22, 0x64, 0x91, 0x69, 0x84, 0xf6
out YUY2
out BGR32,BGR24,BGR15
As for the output formats, I guessed them. Odds are I got it wrong. As for the GUID, I managed to find a class with a “FriendlyName” saying “Canon H.264 Decode Filter 1.3”, and it has the class ID B7215EE3-AF54-433F-9D2F-2264916984F6. So basically that’s it.
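Rewriting a class ID into the comma-separated form used on the guid line of codecs.conf is mechanical: one 32-bit word, two 16-bit words and then eight separate bytes. A small sketch of the conversion, with a made-up function name:

```shell
#!/bin/sh
# Convert a class ID like B7215EE3-AF54-433F-9D2F-2264916984F6 into the
# form used on the "guid" line of mplayer's codecs.conf.
guid_to_codecs_conf() {
  # Lower case, and strip dashes and curly brackets: 32 hex digits remain
  g=$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr -d '{}-')
  d1=$(printf '%s' "$g" | cut -c1-8)     # 32-bit word
  d2=$(printf '%s' "$g" | cut -c9-12)    # 16-bit word
  d3=$(printf '%s' "$g" | cut -c13-16)   # 16-bit word
  # The remaining 16 digits become eight bytes, each with its own prefix
  rest=$(printf '%s' "$g" | cut -c17-32 | sed 's/../, 0x&/g')
  printf '0x%s, 0x%s, 0x%s%s\n' "$d1" "$d2" "$d3" "$rest"
}

# Usage sketch:
#   guid_to_codecs_conf B7215EE3-AF54-433F-9D2F-2264916984F6
```

For the class ID above, this reproduces the guid line in the codecs.conf entry.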
Anyhow, this didn’t work at all. When I ran mencoder with -vc canonh264, it ended with
Forced video codec: canonh264
Opening video decoder: [dshow] DirectShow video codecs
Called unk_GetVersionExW
Segmentation fault
I won’t even try to pretend that I understand what went wrong here, but GetVersionExW happens to be a function exported from Windows’ kernel32.dll, retrieving information about the current operating system. I’m not clear on whether the function was never found, or if the decoder wasn’t happy with the answer it got. This way or another, a segfault is a segfault. I thought this was the place to give up. I’ll use the good old ffmpeg decoder.
A remark about H.264
Canon’s choice of H.264 as the encoding format is a bit bizarre, since it’s a version of MPEG-4. And just for general knowledge: MPEG-4 is horrible. In particular it has this annoying thing about stale regions in the frame, which look bad and basically don’t heal. But I suppose that MPEG-2 would require too fast writes to the flash or something. The result is still pretty bad.
Summary
Trying to fix video issues late at night is not necessarily the wisest thing to do.
Introduction
I’m using Xilinx’ MiG 1.7.3 for running DDR2 memories on a Virtex-4 FPGA. It didn’t take me long to realize that the controller never finishes initialization. The problem is that I had no idea of why, and as far as I know, no documentation to refer to in my attempts to understand where the controller got stuck, which is an essential stage in getting it unstuck.
Since Xilinx are wise enough to release the IP core with its source, I was able to reverse engineer the initialization process to the level necessary for my own purpose. This is a memo of the details, just in case I’ll need to do this again some time. I sure hope that won’t be necessary…
In my case, the problem seems to have been overheating of the FPGA. I’m not 100% sure about this, but with 90 degrees centigrade measured on the case, and everything starting to work OK when a decent heatsink (with fan) was put in place, it looks pretty much like good old heat.
Overview
The initialization process consists of several stages. During the entire process, the controller is governed by the init_state one-hot state machine in the ddr2_controller module. The end of this process is marked by init_done_int going high, which goes out as init_done, hence marking the end to the IP core’s user.
The initialization consists of roughly three stages:
- Setting up the memory device
- Setting up the IDELAY taps so that the DQ inputs are sampled with good timing.
- Learning the correct latency for reading data from DQs during read cycles.
Throughout the init process, active and precharge operations take place as required by the standard. These operations are not mentioned here, since they don’t add anything to understanding the principle.
Setting up the memory device
This is the normal JEDEC procedure, which includes a preknown sequence of peculiar operations, as defined in the DDR2 standard. This includes writing to the memory’s mode registers. During this phase, the controller will not care if it’s talking to a memory or not, since it never reads anything back from the memory.
Setting up the IDELAYs taps
The importance of this stage is to make sure that data is sampled from DQ lines at the best possible timing. Each DQ input is calibrated separately.
This stage begins with a single write command to column zero. The write data FIFO has already had data written to it, such that the rising edge contains all ones, and the falling edge all zeros. For example, for a memory with 16 DQ lines, the FIFO has been fed with 0xFFFF0000 twice for memories with a burst length of 4, and four times if the burst length is 8.
This can be seen in the backend_fifos module. In that module, one can see that data is written to the write data FIFO immediately after reset. Also, there is another set of words written to the FIFO, which are intended for the next stage.
All in all, this single write command drains the FIFO of the words containing all ones or all zeros, so that column zero contains this data. Next, the controller reads column zero continuously while the delay taps are adjusted to achieve proper input timing for the DQs.
The logic for moving the taps is outside the ddr2_controller module. The latter merely helps by performing reads. When the tap logic finishes, it signals that it’s done by raising the signal known as phy_Dly_Slct_Done in the ddr2_controller module, which also carries many other names, such as SEL_DONE. In the tap_logic module (where it originates) it’s called tap_sel_done.
The tap calibrator increments the tap delay until the data on that line shifts, or until 55 increments have taken place. When either happens, that point is considered to be the data edge. The tap delay is then decremented by the number of times defined by the tby4tapvalue parameter (17 in my case).
Note that even if no edge is found at all, the tap delay calibrator will consider the calibration of that tap OK.
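The calibration of a single tap, as I understand it from the source, can be modelled with a few lines of shell arithmetic. This is a toy model of the description above and not a translation of the Verilog: the function name is made up, and what the core does when an edge shows up after fewer than 17 increments is not covered.

```shell
#!/bin/sh
# Toy model of the calibration of one IDELAY tap.
#   $1 = number of increments after which the data shifts. Pass 0 (or
#        nothing) to model a line on which no edge is ever seen.
#   $2 = number of decrements (tby4tapvalue, 17 in my case)
# Prints the tap setting that the line ends up with.
calibrate_tap() {
  edge=${1:-0}
  back_off=${2:-17}
  max_inc=55           # increment limit, as described in the text
  tap=0
  while [ "$tap" -lt "$max_inc" ]; do
    tap=$((tap + 1))
    if [ "$edge" -gt 0 ] && [ "$tap" -eq "$edge" ]; then
      break            # data shifted: this is the edge
    fi
  done
  # The edge, or the limit if there was none, minus the fixed back-off
  echo $((tap - back_off))
}

# calibrate_tap 30   prints 13
# calibrate_tap 0    prints 38 (no edge found, and no complaint either)
```

The second case is the one to keep in mind when debugging: a dead data line still ends up with a perfectly valid-looking tap setting.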
Here is a short list of lines I found useful to look at with a scope (using the FPGA Editor):
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/calib_done_int
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/tap_sel_done
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_done
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/dlyce_dqs[0]
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/dlyinc_dqs[0]
CHAN_DONE is the most interesting signal, because it goes high briefly every time a data line has finished its tap calibration. Unfortunately, the synthesizer messes up the identification of this signal, so the only way to tell it, is by finding what causes ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/chan_sel_int to change state. In my case it was
ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/chan_sel_int_not0002
This signal should go high 8 times (or the number of data lines per DQ you have). If it goes high fewer times and then nothing happens, you can tell which of the data lines is problematic simply by counting these strobes.
Latency for reading data
The purpose of this stage is to tell when, in terms of semiclocks, to sample the data read from the memory. I’m not 100% clear on why this stage is necessary at all, but that won’t change the fact that it exists.
This stage starts with a single write operation again. This time the written data is slightly more sophisticated (keep in mind that it was loaded to the write data FIFO immediately after wakeup from reset). The first column will have the data 0xA written to it, duplicated to occupy all DQs. For example, on a memory with 16 DQs, the first column will be 0xAAAA. The second column is 0x5 duplicated, the third 0x9, and the fourth 0x6, all duplicated. If the burst length is 8, this four word sequence is repeated.
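To see what this means for other bus widths, the four words can be generated by repeating each nibble across the width of the DQ bus. A throwaway sketch, with a made-up function name, assuming that the number of DQs is a multiple of four:

```shell
#!/bin/sh
# Print the four training words for a memory with $1 DQ lines, by
# repeating the nibbles 0xA, 0x5, 0x9 and 0x6 across the data bus.
training_words() {
  nibbles=$(( $1 / 4 ))
  for n in A 5 9 6; do
    word=""
    i=0
    while [ "$i" -lt "$nibbles" ]; do
      word="$word$n"
      i=$((i + 1))
    done
    echo "0x$word"
  done
}

# training_words 16   prints 0xAAAA, 0x5555, 0x9999 and 0x6666, one per line
```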
After writing this, the controller reads column zero continuously, until COMP_DONE goes high. This signal originates from the pattern_compare8 module, which tells the controller it has recovered the correct input data alignment. More precisely, the rd_data module sends the ddr2_controller a logical AND of all pattern_compare8’s comp_done signals.
These pattern_compare8 modules simply look for an 0xAA pattern followed by a 0x99 pattern in the input during rising edges only, or for a 0x55 followed by a 0x66 on the rising edge. So they will catch the reads of the first and third columns, or the second and fourth, but either way this resolves the alignment ambiguity completely.
As the pattern_compare8 module tries to match the data, it increments (among others) its clk_cnt_rise register (not to be confused with the clk_count_rise wire, which contains the final result). Monitoring clk_cnt_rise[0] (using FPGA Editor, for example) can give a positive feedback that the initialization is at this phase. It should give a nice square wave at half the DDR2 controller’s clk0 frequency, and then stop when this phase is done.
Summary
The initialization process is not the simplest in the world, and it’s likely to fail if you got anything wrong with your memory, in particular if you have as little as a single data line miswired. This is not really good news, but understanding the process may at least help to understand what went wrong, and hopefully to fix it too.
The whole story began when I decided to be kind enough to tell the Xilinx tools (ISE 9.2 in my case) that the Virtex-4 I’m targeting is a grown-up. Stepping 2, to be precise. I added
CONFIG STEPPING = "2";
to the UCF file. It must have been one of those moments where I believed that the tools do what is best for me.
It wasn’t long before the mapper told me it’s rewarding me with some autocalibration logic for the DCM. Sounded pretty OK. Some logic that will get the DCM back on its feet if the clock stops and returns. Not that I have any such plans. As a matter of fact, I’ve made sure that the DCM will get a reset after any possible messing with the DCM’s clock input.
Both the mapping warning and the docs mention that it’s possible to disable the autocalibration feature in order to save some logic. They never mentioned that the logic can kill the DCM.
And then one of the DCMs started losing lock. I had changed several other things at the same time, so it wasn’t easy to track down why. But it looked so weird: The DCM’s lock flag would go high, and then go down again. The timescale was tens of milliseconds, which is way beyond the response times for a DCM.
My first thought was that it must have something to do with the clock’s signal quality. Maybe some crosstalk. The clock was around 200 MHz. But then I decided to look a bit closer on what this autocalibration was about.
That led me to Answer Record #21435, which was pretty explicit about the reset:
When the input clock returns, the user must manually assert the DCM reset for at least 200 ms to resume proper DCM functionality.
200 ms? So there it was. I did mess with the input clock, but then I sent a brief reset signal to the DCM to get it back to normal. It worked in the past. Not with the extra logic. So all I needed to do was to add
defparam thedcm.DCM_AUTOCALIBRATION = "FALSE";
(in the Verilog definition of the DCM) and the problem (which shouldn’t have occurred in the first place) was solved.
To make things slightly more annoying, I also had to upgrade the old “DCM” primitives to “DCM_BASE”, because when the “DCM” primitives are upgraded automatically to DCM_ADV’s (by XST), the DCM_AUTOCALIBRATION parameter is set to the default, which is “TRUE”. The same parameter simply doesn’t exist for the backward-compatible “DCM” primitive.
Note to self: Remember to disable the autocalibration on all DCMs from now on.