Linux 2.6.35 hanging/oopsing on large memory allocations

Short summary

Kernel 2.6.35.4 kernel on x86_64 doesn’t seem to handle large memory allocations well. In particular,

  • Running malloc() with Gigabyte chunks can cause a kernel oops
  • Quickly allocating all memory will make the system hang (but not oops)

If anyone has an idea why this is, some helpful clues or whatever, please comment below.

I should also mention that I applied the patch, which fixes the general system freeze on rapid disk I/O, mentioned in another post of mine. I didn’t feel like going on crashing my system over and over again, so I skipped the test without this patch.

Allocating a huge chunk

Running a vanilla 2.6.35.4 kernel on a x86_64 machine (as 64-bit Linux), I wanted to see how well my 16 GB of RAM worked. So I decided to allocate a lot of memory and see what happens. More precisely, I wrote the following somewhat dirty program, and ran it (calling it memeater):

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main() {
 long int size = 1024*1024*1024;
 long int i;

 size *= 4;

 char *p = malloc(size);

 printf("Size is %ld, Pointer is %08lx\n", size, (unsigned long int) p);

 if (p)
   for (i=0; i<size/1024; i++) {
     *p = 0;
   p += 1024;
 }

 getc(stdin);

 return 0;
}

A short explanation: This program merely requires 4 Gigabyte of memory in a single malloc(). The loop writes something on each memory page (a 4096 jump rather that 1024 would be OK as well, I believe). This is necessary, since memory allocation doesn’t really consume memory until used.

It then expects the user to press RETURN on console, so that the memory is held until deliberately released.

In theory, this shouldn’t be a problem. Allocating 4 Gigs of memory should either return a pointer or a NULL. Definitely not oops as follows:

May 17 18:11:14 kernel: general protection fault: 0000 [#3] SMP
May 17 18:11:14 kernel: last sysfs file: /sys/devices/virtual/sound/timer/uevent
May 17 18:11:14 kernel: CPU 4
May 17 18:11:14 kernel: Modules linked in: nfsd exportfs it87 hwmon_vid vmnet vmblock vmci vmmon cpufreq_ondemand
 acpi_cpufreq freq_table mperf ipv6 dm_multipath kvm_intel kvm uinput snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_
intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd iTCO_wdt ppdev 8139too soundcore
 8139cp tulip parport_pc r8169 iTCO_vendor_support pcspkr snd_page_alloc parport i2c_i801 mii sha256_generic
 cryptd aes_x86_64 aes
_generic dm_crypt raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx ata_generic
 pata_acpi pata_jmicron radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: microcode]
May 17 18:11:14 kernel:
May 17 18:11:14 kernel: Pid: 3056, comm: memeater Tainted: G      D     2.6.35.4-OCHO3 #1 P55-UD3R/P55-UD3R
May 17 18:11:14 kernel: RIP: 0010:[<ffffffff811024ae>]  [<ffffffff811024ae>] mem_cgroup_charge_statistics+0x9/0x50
May 17 18:11:14 kernel: RSP: 0018:ffff88040a85fa48  EFLAGS: 00010246
May 17 18:11:14 kernel: RAX: 00000000ffffff01 RBX: ffffea000a918e00 RCX: 0000000000000060
May 17 18:11:14 kernel: RDX: 0000000000000000 RSI: ffff8804142c8a00 RDI: ffbfc90001817000
May 17 18:11:14 kernel: RBP: ffff88040a85fa48 R08: ffff880409eba958 R09: 00000000ffffffc0
May 17 18:11:14 kernel: R10: 0000000000400000 R11: ffffea000a9d0df0 R12: 0000000000000001
May 17 18:11:14 kernel: R13: ffff8804142c8a00 R14: ffbfc90001817000 R15: ffff880409fadcc0
May 17 18:11:14 kernel: FS:  00007f36017db700(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
May 17 18:11:14 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 17 18:11:14 kernel: CR2: 0000000001861000 CR3: 0000000001a42000 CR4: 00000000000006e0
May 17 18:11:14 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 17 18:11:14 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 17 18:11:14 kernel: Process memeater (pid: 3056, threadinfo ffff88040a85e000, task ffff880409fadcc0)
May 17 18:11:14 kernel: Stack:
May 17 18:11:14 kernel: ffff88040a85fa98 ffffffff8110579c ffff88040a85faa8 ffffffff810ec2d1
May 17 18:11:14 kernel: <0> ffff88040a85fa88 ffffea000a918e00 00007f35e72e1000 8000000305040067
May 17 18:11:14 kernel: <0> 0000000000099000 ffff880409eba958 ffff88040a85faa8 ffffffff81105826
May 17 18:11:14 kernel: Call Trace:
May 17 18:11:14 kernel: [<ffffffff8110579c>] __mem_cgroup_uncharge_common+0x194/0x1e5
May 17 18:11:14 kernel: [<ffffffff810ec2d1>] ? free_pages_and_swap_cache+0x63/0x80
May 17 18:11:14 kernel: [<ffffffff81105826>] mem_cgroup_uncharge_page+0x27/0x29
May 17 18:11:14 kernel: [<ffffffff810e7267>] page_remove_rmap+0x28/0x50
May 17 18:11:14 kernel: [<ffffffff810dd7b3>] unmap_vmas+0x5c5/0x928
May 17 18:11:14 kernel: [<ffffffff810e2ec1>] exit_mmap+0xce/0x132
May 17 18:11:14 kernel: [<ffffffff8104ad9b>] mmput+0x5e/0xca
May 17 18:11:14 kernel: [<ffffffff8104f1cf>] exit_mm+0x114/0x121
May 17 18:11:14 kernel: [<ffffffff81050b7b>] do_exit+0x226/0x726
May 17 18:11:14 kernel: [<ffffffff8105a08e>] ? try_to_del_timer_sync+0x7b/0x89
May 17 18:11:14 kernel: [<ffffffff810510f8>] do_group_exit+0x7d/0xa5
May 17 18:11:14 kernel: [<ffffffff8105ee27>] get_signal_to_deliver+0x373/0x395
May 17 18:11:14 kernel: [<ffffffff812c4dce>] ? n_tty_read+0x6b3/0x786
May 17 18:11:14 kernel: [<ffffffff81009010>] do_signal+0x72/0x68d
May 17 18:11:14 kernel: [<ffffffff812c758d>] ? tty_ldisc_deref+0xe/0x10
May 17 18:11:14 kernel: [<ffffffff812c025f>] ? tty_read+0x8c/0xc5
May 17 18:11:14 kernel: [<ffffffff81009657>] do_notify_resume+0x2c/0x6e
May 17 18:11:14 kernel: [<ffffffff81009f00>] int_signal+0x12/0x17
May 17 18:11:14 kernel: Code: ff 4c 89 e3 4d 8b 24 24 4c 39 eb 75 de 48 c7 c7 60 29 a6 81 e8 b5 eb 39 00 5b 41 5c 41 5d 41 5e c9
 c3 55 48 89 e5 0f 1f 44 00 00 <48> 8b 87 10 11 00 00 80 fa 01 19 c9 83 c9 01 f6 06 02 48 63 c9
May 17 18:11:14 kernel: RIP  [<ffffffff811024ae>] mem_cgroup_charge_statistics+0x9/0x50
May 17 18:11:14 kernel: RSP <ffff88040a85fa48>
May 17 18:11:14 kernel: ---[ end trace 4d26f08f6051ed51 ]---
May 17 18:11:14 kernel: Fixing recursive fault but reboot is needed

And I should mention that when I tried this for 16 GB, the system just hung. But that’s explained next.

Allocating all RAM

So I said, OK, that must be because nobody is really expected to allocate those huge chunks in one go. So what happens when the memory just ends? I have some swap space, so this should work…

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main() {
 long int size = 1024*1024;
 long int i,k;

 size *= 16;

 for (k=0; k<1024; k++) {

   char *p = malloc(size);

   printf("Size is %ld, Pointer is %08lx\n", size, (unsigned long int) p);

   if (p)
     for (i=0; i<size/1024; i++) {
       *p = 0;
        p += 1024;
     }
 }
 getc(stdin);

 return 0;
}

The idea here is simple: Loop 1024 times, allocating 16MB at a time. This is sane and eventually takes 16 GB.

But no, the system just hung. No oops, no drama. Just nothing happened. Processes were stalled, typing and switching consoles with Shift-Alt-Fx worked. Ctrl-Alt-Delete is ignored. No reboot. Only a reset got me out of this.

For the record, running the program with chunks of 14 MB each, so almost all memory was allocated (14 GB out of 16 GB), worked cleanly, and the system remained stable.

Of course I have swap

And this is the proof: This is my /proc/meminfo with a system barely doing anything:

MemTotal:       16463436 kB
MemFree:        14909616 kB
Buffers:           75428 kB
Cached:           664284 kB
SwapCached:            0 kB
Active:           459044 kB
Inactive:         604272 kB
Active(anon):     324088 kB
Inactive(anon):   115208 kB
Active(file):     134956 kB
Inactive(file):   489064 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4194300 kB
Dirty:              1348 kB
Writeback:             0 kB
AnonPages:        323592 kB
Mapped:           116232 kB
Shmem:            115704 kB
Slab:              95920 kB
SReclaimable:      46340 kB
SUnreclaim:        49580 kB
KernelStack:        4008 kB
PageTables:        36940 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12426016 kB
Committed_AS:    1576024 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      123416 kB
VmallocChunk:   34359587772 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8064 kB
DirectMap2M:    16764928 kB

Sometimes I’ve sees some of the swap actually used. It’s not like it doesn’t work.

 

Gnome: An icon in the system tray (new mail)

This should have been obvious: Have a “new mail” icon in the system tray when new mail is fetched by Thunderbird on a FC12 Linux machine. I ended up writing my own script.

My starting point was the “Gnome Integration” add-on. Not that I liked that it forced me to have another notification popup on new mails (aside from Thunderbird’s native one) if I wanted the icon. But when an icon was added to the system tray for each new mail arriving (that is, a lot of icons in the tray), I knew I had to do a thing of my own.

So I wrote a small, and not so sophisticated script, which checks if an icon is there (more precisely: If a process owning the icon is running) and gives up if it already exists.

The lucky things about the “Gnome Integration” add-on is that it allows choosing the command for the mail notification. So instead of running whatever puts that unnecessary popup, it runs my script, which puts the icon in place if necessary. Popup away, single icon instead. Could I ask for more?

Here’s the script. I saved it as /usr/local/bin/new-mail-icon (you can put it anywhere you want, but make sure the executable has the same name):

#!/bin/bash

THEDIR=$HOME/.thunderbird
ICON=$THEDIR/green-mail-unread.png
LOCKFILE=$THEDIR/new-mail-icon-process

NOW=`date "+%F %T"`

if [ -e $LOCKFILE ] && grep new-mail-icon /proc/`cat $LOCKFILE`/cmdline > /dev/null 2>/dev/null ;
 then exit 0;
else
 ( echo $BASHPID > $LOCKFILE
 zenity --notification --window-icon=$ICON --text="New mail on $NOW"
 rm -f $LOCKFILE
 ) &
fi

Important notes about the script:

  • It relies on that the script’s name is “new-mail-icon”. This is how it detects another icon exists. If the script has another name, several icons will appear. Or change the argument to the grep command.
  • There is a slight race condition if two of these scripts are run in parallel (which is an unlikely situation, and still). It’s possible that more than one icon will appear, if two (or more) scripts race on the little time gap between checking for the locking file and writing to it.

The script above doesn’t just check if the lockfile exists, but also verifies that the process holding it exists and has the correct command line. This makes it behave correctly if the system shuts down or crashes, leaving the lock file in place.

Then set the Gnome Integration add-on preferences as follows. Note that the “Show icon” is not checked. The icon is created by running my script instead of the notification application. It’s a hack indeed.

Controlling desktop effects: Compiz notes for FC12

Using Fedora Core 12, I wanted some control over my wobbling windows, and in particular which corner is the hot spot for displaying all open windows.

The first thing I needed to find out, is the name of the toolkit doing this: So it’s Compiz.

First, install the Compiz Configuration Setting Manager

# yum install ccsm

It now appears in System > Preferences (as Compiz Configuration Setting Manager).

The next issue is that the settings are ignored. This is because compiz is executed with the gconf plugin by /usr/bin/compiz-gtk, which is what the Gnome environment runs.

So following one of the comments in this bug report, edit /usr/bin/compiz-gtk so that the runCompiz function in the file’s beginning says:

exec compiz --ignore-desktop-hints glib ccp $@

And to get this launched, go to System > Preferences > Desktop effects, turn off the effects and turn them on again (choose “standard” and then “compiz” again). Note that the preferences in the Desktop effect are now ignored, and the ones chosen in Compiz Configuration Setting Manager are used instead. But hey, this is what I wanted in the first place.

In particular, I moved the “hot corner” to the bottom left, which I hit by accident much less.  This is done in Compiz Configuration Setting Manager > Window Management > Application Switcher > Scale > Initiate Window Picker. Maybe I should exchange this with a keystroke instead.

 

Radeon: reserve failed for wait in /var/log/messages

Looking at my system log (kernel 2.6.35.4), this is a common thing:

Apr 24 12:04:49 kernel: radeon 0000:01:00.0: ffff8802d0aa1a00 reserve failed for wait
Apr 24 12:04:49 kernel: radeon 0000:01:00.0: ffff8802d0aa1a00 reserve failed for wait
Apr 24 12:04:51 kernel: radeon 0000:01:00.0: ffff8802d0aa1a00 reserve failed for wait
Apr 24 12:04:51 kernel: radeon 0000:01:00.0: ffff8802d0aa1a00 reserve failed for wait
Apr 24 12:04:57 kernel: radeon 0000:01:00.0: ffff8802d0aa1a00 reserve failed for wait
Apr 24 12:05:08 kernel: radeon 0000:01:00.0: ffff8802d0aa1a00 reserve failed for wait
Apr 24 12:05:14 kernel: radeon 0000:01:00.0: ffff8802d0aa1a00 reserve failed for wait

This doesn’t sound too good, but it looks like it’s harmless, according to this and this. It also looks like the decision was to remove the error message, rather than to fix something else. I suppose this issue is old news pretty soon.

It has been suggested that this error message appears only when compiz (that cool OpenGL toy making all kinds of effects on the desktop).

So just let it be.

No sound on Firefox on x86_64 Linux with Pulseaudio / ALSA

This is yet another suggestion on what to do if the Firefox doesn’t play sound (with flash videos in particular) but all other applications do (mplayer, for example). The video in Firefox runs smoothly, but no sound at all. This is a popular subject on the web, it turns out.

My platform: Fedora Core 12, 64 bit, hda sound card. I should also mention this all was fine on my computer until I upgraded to alsa-1.0.24 kernel drivers from sources. With “make install-modules” (as opposed to “make install”). Even after this installation, I don’t have an ALSA init script on my computer (and neither should I have one, as the drivers are autoloaded anyhow).

The first thing to do is to go to System > Preferences > Sound and pick the “Applications” tab. Now do something that does play sound. The application should be listed there while the sound is played. Stop playing, and the application vanishes from the list.

Now try to play a flash video. If you’re lucky, the Firefox applications appears there (as “ALSA plug-in [firefox-bin]“), only with the sound muted or volume very low. Just fix that, and you’re good.

In my case, nothing appeared on this list when playing a video on Firefox. This means that Firefox doesn’t connect to Pulseaudio at all. This has in fact nothing to do with the flash player: This happens when the video is played in HTML5 mode as well (a common situation in Youtube, for example).

Like many others who have addressed this issue, I know how I solved it, but not why things broke in the first place.

The problem, as it turned out, was that Firefox didn’t find the /usr/lib/alsa-lib/libasound_module_pcm_pulse.so shared library. Note that it’s “lib” and not “lib64″ it was looking for, because the part producing sound runs in 32-bit mode. Or something like that.

So I went

$ yum provides '/usr/lib/alsa-lib/libasound_module_pcm_pulse.so'

Which told me I needed alsa-plugins-pulseaudio-1.0.21-2.fc12.i686. I then went (as root):

# yum install alsa-plugins-pulseaudio-1.0.21-2.fc12.i686

just to be told by yum that “package alsa-plugins-pulseaudio-1.0.22-1.fc12.x86_64 (which is newer than alsa-plugins-pulseaudio-1.0.21-2.fc12.i686) is already installed”. This is when I really love this RPM magic.

Googled for the exact RPM file, downloaded it, and basically went (nothing like the –force flag)

# rpm -i --force alsa-plugins-pulseaudio-1.0.21-2.fc12.i686.rpm

And the sound is back. Hurray! Not. That problem shouldn’t have started in the first place.

Windows: Tracking process launches and ends

I’m sure there’s a saner way to do this than the Perl script below, but jotting it down was so easy, that it wasn’t worth looking for another tool to do it.

The idea is the simplest ever: Grab the list of processes 10 times a second or so, and write a line to standard output when a change is detected. If a process lives longer than ~100 ms, it’s caught.

The correct way to do this would be to use some kind of hook of the Windows operating system. Or more precisely, look for a program that already does that. And then hope that program isn’t some kind of spyware.

So here’s the script. You may need to install Win32::Process::List from somewhere, assuming you have Perl on board anyhow.

Perl code:

#!/usr/bin/perl

use strict;
use warnings;
use Win32::Process::List;

my %old = ();

while (1) { 
  my $P = Win32::Process::List->new();  # constructor

  select(undef, undef, undef, 0.1);

  my $now = scalar localtime time();

  my %list = $P->GetProcesses();

  %old = %list
    unless (%old);

  my %left = %old;

  foreach my $key ( keys %list ) {
    print "START $now: PID $key, $list{$key}\n"
      if (not defined $old{$key});

   delete $left{$key};
  }

  foreach my $key ( keys %left ) {
    print "STOP  $now: PID $key, $left{$key}\n";
  }
  %old = %list;
}

This is what the output can look like (screensaver, then FacebookUpdate and then what happens when I open a new window on Firefox).

> perl processtrace.pl
START Mon Apr 16 22:04:07 2012: PID 1380, sstext3d.scr
STOP  Mon Apr 16 22:04:50 2012: PID 1380, sstext3d.scr
START Mon Apr 16 22:27:00 2012: PID 2144, FacebookUpdate.
STOP  Mon Apr 16 22:27:03 2012: PID 2144, FacebookUpdate.
START Mon Apr 16 22:28:05 2012: PID 2268, jqsnotify.exe
STOP  Mon Apr 16 22:28:07 2012: PID 2268, jqsnotify.exe

 

Windows device drivers: Notes to self

Yet another messy post with just things I wrote down while developing a device driver for Windows. Not necessary coherent, not necessarily accurate.

General notes while playing around

  • “Read the source”, WDK-style: Read the inc\ddk\wdm.h include file, which supplies a lot of information and hints. Sometimes it’s much better than the docs. There are also sources for the implementation of basic functions such as open(), printf() etc. in Visual Studio’s installation directory.
  • A list of status codes can be found here. Possibly relevant in particular: STATUS_UNSUCCESSFUL,  STATUS_NOT_IMPLEMENTED, STATUS_INVALID_PARAMETER, STATUS_NO_MEMORY, STATUS_INVALID_DEVICE_REQUEST, STATUS_DEVICE_BUSY, STATUS_END_OF_FILE, STATUS_ACCESS_DENIED, STATUS_DATA_ERROR, STATUS_NO_SUCH_DEVICE, STATUS_NO_SUCH_FILE, STATUS_OBJECT_NAME_NOT_FOUND,  STATUS_INSUFFICIENT_RESOURCES, STATUS_IO_TIMEOUT, STATUS_FILE_FORCED_CLOSED,  STATUS_CANCELLED, STATUS_NOT_FOUND, STATUS_RETRY, STATUS_RECEIVE_PARTIAL
  • IRP’s major determine the dispatch routine launched (as set by DriverEntry). The minor is interpreted by the dispatch routine itself.
  • A list of safe string functions: In MSDN’s page
  • How to use symbolic links to create a name space: here.
  • To see the internal object structure (and symbolic links), download WinObj from somewhere.
  • Sniff IRPs in the system: Download IrpTracker.
  • MSDN’s explanation on resource rebalancing, why, how and what to do about it.
  • On a typical open for write, DesiredAccess=0x00120196 (FILE_GENERIC_WRITE | READ_CONTROL). On an open for read, DesiredAccess=0x00120089 (FILE_GENERIC_READ | READ_CONTROL). This is based upon command line > and <. Trying to “dir” the file gives DesiredAccess=0x00000080.
  • The difference between IRP_MJ_CLEANUP and IRP_MJ_CLOSE: IRP_MJ_CLEANUP is called when the file is officially closed (all owners of the file handle have closed it). IRP_MJ_CLOSE is called when the file handle can be removed, which is when all outstanding IRPs have been completed as well. Note that IRP_MJ_CLEANUP dispatch routine must walk through the queue and cancel all requests (Why doesn’t the system do this?). Note that non-I/O IRPs for the file handler may still arrive for the file handler.
  • Do implement a handler for IRP_MJ_SET_INFORMATION, so that such an IRP doesn’t fail. Even a yes-yes handler doing nothing but returning success. In particular, Cygwin sends this IRP with FileEndOfFileInformation when writing to a regular file. Go figure.
  • Using MmGetMdlByteCount(irp->MdlAddress) to get the number of bytes to read or write is bad: It causes a bugcheck if the number of bytes to read or write is zero. Which is bad behavior from the application’s side, and still.
  • Not directly relevant, but user space applications with POSIX functions such as open(), read() close() and friends should include <io.h> rather than <unistd.h> which doesn’t exist on VC++.
  • Setting the DeviceType parameter is utterly important, since the native fopen() and open() calls will fail with an “Invalid Argument” error otherwise. The parameter in the call to IoCreateDevice() is ignored since the actual type is taken from the PDO. Hence a line in the INF file for setting up a registry value is necessary.
  • When seeking a file, be either with POSIX-style _seek() and friends, or with native SetFilePointer(), this merely updates CurrentByteOffset, but no IRP is issued, so the device has no idea it happened at all. In particular, no IRP_MJ_SET_INFORMATION IRP with class FilePositionInformation is issued, despite what one could expect. At least so it works in Windows 7.
  • pnpdtest should be run with the verifier (standard Windows utility) configured for toughest tests specifically on the driver. Just type “verifier” at command prompt, reboot, and run pnpdtest. To run verifier with a good set of tests, go
    > verifier /volatile /flags 0xbfb /adddriver mydriver.sys
  • Unlike Linux, the allocation of PCI resources (BAR addresses and interrupts) is not exclusive. As a result, if a device driver doesn’t deallocate the interrupt resource at unload, and hence the pointer to the ISR remains, attempting to reload the driver will cause a jump to the unloaded ISR pointer and a bugcheck. In Linux this wouldn’t happen, because the newly loaded instance of the driver wouldn’t pass the stage of getting the resources.

Runlevels (IRQLs)

A very short summary:

  • PASSIVE_LEVEL: The level of user space applications, system threads and work items specially generated to run in this level. May block, but in most cases it’s not allowed because of arbitrary thread context.
  • APC_LEVEL: The special level for delivering data to user applications. Rarely seen, but often considered.
  • DISPATCH_LEVEL: Used in DPC (Deferred Procedure calls) queued by ISR or by a custom call. Also, when acquiring a spinlock, the runlevel is raised to DISPATCH_LEVEL.
  • Higher levels: When serving interrupts. No kind of mutex can be acquired (neither spinlocks). Some simple memory operations and queuing DPCs is more or less what’s left to do.

Notes:

  • Paged code runs below DISPATCH_LEVEL. So when PAGED_CODE appears in the beginning of a function, it’s at most at APC_LEVEL.
  • DPCs run at DISPATCH_LEVEL. When requested, a single instance is queued and executed when possible. The DPC instance is dequeued just before execution, so if another DPC request occurs immediately after launching a DPC, another run will take place, possibly simultaneously on another CPU. On the other hand, if several requests are made before execution is possible, the DPC is run only once on behalf of these requests.
  • Custom DPCs can be queued from any runlevel. If the “background routine” needs to be queued from DISPATCH_LEVEL and down, use a work item instead.
  • Confusingly enough, most (almost all) IRP dispatch calls are made in passive level. To make things more difficult, read, write and ioctl IRPs can go up to DISPATCH_LEVEL.

Bug check handling

What to do when the Blue Screen (BSOD) pays a visit. Or more precisely, how to catch exactly where your code wanted to use a zero pointer or something.

It’s important to save the entire binaries directory, including .obj files, for the driver distributed, since those files are necessary for bug check dissection.

The files are typically something like:
C:\Windows\Minidump\030312-17503-01.dmp
C:\Users\theuser\AppData\Local\Temp\WER-33571-0.sysdata.xml

Analyze a bugcheck dump (change last file, of course). Be patient. Each step takes time.

> C:\WinDDK\7600.16385.1\Debuggers\kd.exe -n -y srv*c:\symbols*http://msdl.microsoft.com/download/symbols -i C:\devel\objchk_win7_x86\i386 -z C:\copies\022512-24414-01.dmp

The symbols which are downloaded are stored in C:\symbols (as required). When those downloads are finished, go

0: kd> !analyze -v

Quit with “q”

To get a disassembly of the relevant code, go:

> dumpbin.exe /disasm \path\to\objchk_win7_x86\i386\project.obj > project.asm

Note that we disassemble the object file to which the code belongs. This is necessary get symbol names. That alone is a good reason to maintain an exact snapshot of released versions. The .sys file can be disassembled to verify that nothing has changed.

dumpbin.exe is part of the Visual C++ package. The necessary file bunlde are in my resumption directory, containing dumpbin.exe, link.exe, msdis140.dll and mspdb71.dll

List of FPGA boards and IP cores with PCIe/USB and their vendors

I collected some links for my own use (limiting myself to Virtex-5 and later Xilinx FPGAs). Maybe this can help someone else too. This is by no means a complete list, but additions and corrections are welcome in the comment section below (I may delete your comment and update the list, don’t take it personally).

So, in no particular order…

PCI Express IP Cores

PCI Express helper chipsets

USB chipsets

  • Cypress EZ-USB FX2LP (CY7C68013A and friends)
  • FDTI FT2232H UART/FIFO
  • TI’s TUSB series of transceivers. PHY only, without USB’s logical protocol, which is a lot to implement.
  • SMSC’s USB3250 (PHY only as well)

PCI Express boards

The boards listed have native PCIe connection (that is, with no PCIe bridge)

EZUSB boards

Other USB boards

The full list of boards (all types) is here.

Authoritative timestamping: Some silly games

Ever wanted to prove you knew something before everyone else? Here’s the way: You write it down in a file, and timestamp it. Then, when you want to prove you were there first, you’ll show the original file, the signed timestamp, and then everyone know you’re a hopeless geek.

But this can help a bit if something goes wrong with timestamp servers used to sign software.

Anyhow, I have to admit I know very little about this issue. In particular, I have no idea how to dissect the timestamp file (tsr). Ideas are welcome as comments below.

You may need to install a certain package (or several):

# yum install openssl-perl

And this can never hurt:

$ man ts

Now suppose that the file containing our genius knowledge is in thefile.txt.

$ openssl ts -query -data thefile.txt -cert -out request.tsq

This created the request to be sent to the timestamp server. The important part is that the hash of thefile.txt is in there, so the timestamping is specific to the file.

Now read the manual a bit again…

$ man tsget

and go timestamping!

$ /etc/pki/tls/misc/tsget -h http://tsa.starfieldtech.com -o timestamp.tsr request.tsq

Pardon my weird path. This is how yum installed it.

The server mentioned above is GoDaddy’s official timestamp server, so the reply stored in timestamp.tsr is a real proof. If you want to use this in court, expect to pay a lot to experts and hope the judge doesn’t kick you out before you’re done with all the technicalities. But hey, that’s a watertight proof you paid nothing to get (and it’s probably worth accordingly).

It would be wonderful to know what’s actually in that timestamp. I know for a fact that the file’s hash has to be there, as well as the digital signature of the issuer of the time stamp. Lacking a better tool to peek inside, I just went:

$ strings -n 10 timestamp.tsr
20120330152409Z
Starfield Technologies, Inc.1200
)Starfield Class 2 Certification Authority0
110503070000Z
160503070000Z0
Scottsdale1%0#
Starfield Technologies, Inc.1301
*http://certs.starfieldtech.com/repository/1&0$
Starfield Timestamp Authority0

http://ocsp.starfieldtech.com/08

'http://crl.starfieldtech.com/sfroot.crl0S
+https://certs.starfieldtech.com/repository/0
Starfield Technologies, Inc.1200
)Starfield Class 2 Certification Authority
120330152409Z0

So there are some clues here. I dissector to cleartext would be nice, though. Ideas, anyone?

When Cygwin gets fussy about forking a new process

Things like

     3 [main] bash 2216 child_info_fork::abort: data segment start: parent(0x26D000) !=     child(0x38D000)
-bash: fork: retry: Resource temporarily unavailable

It appears to be a recent Red Hat mishap. One of those bugs that happen because of the common belief that if you don’t play around with your software all the time, it will be considered dead.

Workaround: Close all Cygwin programs (and windows of course), find the ash.exe in the /bin directory, and double click it to open a primitive command window.

Go

$ /bin/rebaseall

And the problem is over with. I won’t bother understanding what went wrong, since the explanation is most like not very inspiring in cases like this. More about rebaseall here.