When dovecot silently stops delivering mail

After a few days of being happy with not getting spam, I started to suspect that something was completely wrong with receiving mail. As I’m using fetchmail to get mail from my own server running dovecot v2.2.13, I’m used to getting notifications when fetchmail is unhappy. But there were none.

Checking up the server’s logs, there were tons of these messages:

dovecot: master: Warning: service(pop3-login): process_limit (100) reached, client connections are being dropped

Restarting dovecot got it back running properly again, and I got a flood of the mails that were pending on the server. This was exceptionally nasty, because mails stopped arriving silently.

So what was the problem? The clue is in these log messages, which occurred about a minute after the system’s boot (it’s a VPS virtual machine):

Jul 13 11:21:46 dovecot: master: Error: service(anvil): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(log): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(ssl-params): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(log): child 1210 killed with signal 9

These three services are helper processes for dovecot, as can be seen in the output of systemctl status:

            ├─dovecot.service
             │ ├─11690 /usr/sbin/dovecot -F
             │ ├─11693 dovecot/anvil
             │ ├─11694 dovecot/log
             │ ├─26494 dovecot/config
             │ ├─26495 dovecot/auth
             │ └─26530 dovecot/auth -w

What seems to have happened is that these processes failed to launch properly within the 30 second timeout limit, and were therefore killed by dovecot. And then attempts to make pop3 connections seem to have got stuck, with the forked processes that are made for each connection remaining. Eventually, they reached the maximum of 100.

The reason this happened only now is probably that the hosting server had some technical failure and was brought down for maintenance. When it went up again, all VMs were booted at the same time, so they were all very slow in the beginning. Hence it took exceptionally long to kick off those helper processes. The 30 seconds timeout kicked in.

The solution? Restart dovecot once in 24 hours with a plain cronjob. Ugly, but works. In the worst case, mail will be delayed for 24 hours. This is a very rare event to begin with.

Critical Warnings after upgrading a PCIe block for Ultrascale+ on Vivado 2020.1

Introduction

Checking Xillybus’ bundle for Kintex Ultrascale+ on Vivado 2020.1, I got several critical warnings related to the PCIe block. As the bundle is intended to show how Xillybus’ IP core is used for simplifying communication with the host, these warnings aren’t directly related, and yet they’re unacceptable.

This bundle is designed to work with Vivado 2017.3 and later: It sets up the project by virtue of a Tcl script, which among other things calls the upgrade_ip command for updating all IPs. Unfortunately, a bug in Vivado 2020.1 (and possibly other versions) causes the upgraded PCIe block to end up misconfigured.

This bug applies to Zynq Ultrascale+ as well, but curiously enough not to Virtex Ultrascale+. At least with my setting there was no problem.

The problem

Having upgraded an UltraScale+ Integrated Block (PCIE4) for PCI Express IP block from Vivado 2017.3 (or 2018.3) to Vivado 2020.1, I got several Critical Warnings. Three during synthesis:

[Vivado 12-4739] create_clock:No valid object(s) found for '-objects [get_pins -filter REF_PIN_NAME=~TXOUTCLK -of_objects [get_cells -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GT*E4_CHANNEL_PRIM_INST}]]'. ["project/pcie_ip_block/source/ip_pcie4_uscale_plus_x0y0.xdc":127]
[Vivado 12-4739] get_clocks:No valid object(s) found for '--of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]'. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":63]
[Vivado 12-4739] get_clocks:No valid object(s) found for '--of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]'. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":64]

and another seven during implementation:

[Vivado 12-4739] create_clock:No valid object(s) found for '-objects [get_pins -filter REF_PIN_NAME=~TXOUTCLK -of_objects [get_cells -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GT*E4_CHANNEL_PRIM_INST}]]'. ["project/pcie_ip_block/source/ip_pcie4_uscale_plus_x0y0.xdc":127]
[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]]'. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":63]
[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group '. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":63]
[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]]'. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":64]
[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group '. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":64]
[Vivado 12-5201] set_clock_groups: cannot set the clock group when only one non-empty group remains. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":63]
[Vivado 12-5201] set_clock_groups: cannot set the clock group when only one non-empty group remains. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":64]

The first warning in each group points at this line in ip_pcie4_uscale_plus_x0y0.xdc, which was automatically generated by the tools:

create_clock -period 4.0 [get_pins -filter {REF_PIN_NAME=~TXOUTCLK} -of_objects [get_cells -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GT*E4_CHANNEL_PRIM_INST}]]

And the others point at these two lines in pcie_ip_block_late.xdc, also generated by the tools:

set_clock_groups -asynchronous -group [get_clocks -of_objects [get_ports sys_clk]] -group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]]
set_clock_groups -asynchronous -group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]] -group [get_clocks -of_objects [get_ports sys_clk]]

So this is clearly about a reference to a non-existent logic cell supposedly named gen_channel_container[1200], and in particular that index, 1200, looks suspicious.

I would have been relatively fine with ignoring these warnings had it been just the set_clock_groups that failed, as these create false paths. If the design implements properly without these, it’s fine. But failing a create_clock command is serious, as this can leave paths unconstrained. I’m not sure if this is indeed the case, and it doesn’t matter all that much. One shouldn’t get used to ignoring critical warnings.

Looking at the .xci file for this PCIe block, it’s apparent that several changes were made to it while upgrading to 2020.1. Among those changes, these three lines were added:

<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT">GTHE4_CHANNEL_X49Y99</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT_CONTAINER">1200</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT_QUAD_INX">3</spirit:configurableElementValue>

Also, somewhere else in the XCI file, this line was added:

<spirit:configurableElementValue spirit:referenceId="PARAM_VALUE.MASTER_GT">GTHE4_CHANNEL_X49Y99</spirit:configurableElementValue>

So there’s a bug in the upgrading mechanism, which sets some internal parameter to select a nonexistent GT site.

The manual fix (GUI)

To rectify the wrong settings manually, enter the settings of the PCIe block, and click the checkbox for “Enable GT Quad Selection” twice: Once for unchecking, and once for checking it. Make sure that the selected GT hasn’t changed.

Then it might be required to return some unrelated settings to their desired values. In particular, the PCI Device ID and similar attributes change to Xilinx’ default as a result of this. It’s therefore recommended to make a copy of the XCI file before making this change, and then use a diff tool to compare the before and after files, looking for irrelevant changes. Given that this revert to default has been going on for so many years, it seems like Xilinx considers this a feature.

But this didn’t solve my problem, as the bundle needs to set itself correctly out of the box.

Modifying the XCI file? (Not)

The immediate thing to check was whether this problem applies to PCIe blocks that are created in Vivado 2020.1 from scratch inside a project which is set to target KCU116 (which is what the said Xillybus bundle targets). As expected, it doesn’t — this occurs just on upgraded IP blocks: With the project that was set up from scratch, the related lines in the XCI file read:

<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT">GTYE4_CHANNEL_X0Y7</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT_CONTAINER">1</spirit:configurableElementValue>
<spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT_QUAD_INX">3</spirit:configurableElementValue>

and

<spirit:configurableElementValue spirit:referenceId="PARAM_VALUE.MASTER_GT">GTYE4_CHANNEL_X0Y7</spirit:configurableElementValue>

respectively. These are values that make sense.

With this information at hand, my first attempt to solve this was to add the four new lines to the old XCI file. This allowed using the XCI file with Vivado 2020.1 properly, however synthesizing the PCIe block on older Vivado versions failed: As it turns out, all MODELPARAM_VALUE attributes become instantiation parameters for pcie_uplus_pcie4_uscale_core_top inside the PCIe block. However looking at the source file (on 2020.1), these parameters are indeed defined (only in those generated in 2020.1), and yet they are unused, like many other instantiation parameters in this module. So apparently, Vivado’s machinery generates an instantiation parameter for each of these, even if they’re not used. Those unused parameters are most likely intended for scripting.

So this trick made Vivado instantiate the pcie_uplus_pcie4_uscale_core_top with instantiation parameters that it doesn’t have, and hence its synthesis failed. Dead end.

I didn’t examine the possibility to deselect “Enable GT Quad Selection” in the original block, because Vivado 2017.3 chooses the wrong GT for the board without this option.

Workaround with Tcl

Eventually, I solved the problem by adding a few lines to the Tcl script.

Assuming that $ip_name has been set to the name of the PCIe block IP, this Tcl snippet rectifies the bug:

if {![string equal "" [get_property -quiet CONFIG.MASTER_GT [get_ips $ip_name]]]} {
  set_property -dict [list CONFIG.en_gt_selection {true} CONFIG.MASTER_GT {GTYE4_CHANNEL_X0Y7}] [get_ips $ip_name]
}

This snippet should of course be inserted after updating the IP core (with e.g. upgrade_ip [get_ips]). The code first checks if the MASTER_GT is defined, and only if so, it sets it to the desired value. This ensures that nothing happens with the older Vivado versions. Note the -quiet flag of get_property, which prevents it from generating an error if the property isn’t defined. Rather, it returns an empty string if that’s the case, which is what the result is compared against.

Setting MASTER_GT this way also rectifies GT_CONTAINER correctly, and surprisingly enough, this doesn’t change anything it shouldn’t, and in particular, the Device IDs remain intact.

However the disadvantage with this solution is that the GT to select is hardcoded in the Tcl code. But that’s fine in my case, for which a specific board (KCU116) is targeted by the bundle.

Another way to go, which is less recommended, is to emulate the check and uncheck of “Enable GT Quad Selection”:

if {![string equal "" [get_property -quiet CONFIG.MASTER_GT [get_ips $ip_name]]]} {
  set_property CONFIG.en_gt_selection {false} [get_ips $ip_name]
  set_property CONFIG.en_gt_selection {true} [get_ips $ip_name]
}

However turning the en_gt_selection flag off and on again also resets the Device ID to default as with manual toggling of the checkbox. And even though it set the MASTER_GT correctly in my specific case, I’m not sure whether this can be relied upon.

Microsoft Windows: Atomic ops and memory barriers

Introduction

This post examines what Microsoft’s compiler does in response to a variety of special functions that implement atomic operations and memory barriers. If you program like a civilized human being, that is with spinlocks and mutexes, these are things you should never need to care about.

I’ve written a similar post regarding Linux, and it’s recommended to read it before this one (even though it repeats some of the stuff here).

To make a long story short:

  • The Interlocked-something functions do not just guarantee atomicity, but also function as a memory barrier to the compiler
  • Memory fences are unnecessary for the sake of synchronizing between processors
  • The conclusion is hence that those Interlocked functions also work as multiprocessor memory barriers.

Trying it out

The following code:

LONG atomic_thingy;

__declspec(noinline) LONG do_it(LONG *p) {
  LONG x = 0;
  LONG y;
  x += *p;
  y = InterlockedExchangeAdd(&atomic_thingy, 0x1234);
  x += *p;

  return x + y;
}

When compiling this as “free” (i.e. optimized), this yields:

_do_it@4:
  00000000: 8B FF              mov         edi,edi
  00000002: 55                 push        ebp
  00000003: 8B EC              mov         ebp,esp
  00000005: 8B 45 08           mov         eax,dword ptr [ebp+8]
  00000008: 8B 10              mov         edx,dword ptr [eax]
  0000000A: 56                 push        esi
  0000000B: B9 34 12 00 00     mov         ecx,1234h
  00000010: BE 00 00 00 00     mov         esi,offset _atomic_thingy
  00000015: F0 0F C1 0E        lock xadd   dword ptr [esi],ecx
  00000019: 8B 00              mov         eax,dword ptr [eax]
  0000001B: 03 C1              add         eax,ecx
  0000001D: 03 C2              add         eax,edx
  0000001F: 5E                 pop         esi
  00000020: 5D                 pop         ebp
  00000021: C2 04 00           ret         4

The first thing to note is that InterlockedExchangeAdd() has been translated into a “LOCK XADD”, which stores the updated value into memory and leaves the previous value in ECX. That previous value is @y in the C code, as InterlockedExchangeAdd() returns the previous value.

XADD by itself isn’t atomic, which is why the LOCK prefix is added. More about LOCK below.

What is crucially important to note, is that putting InterlockedExchangeAdd() between the two reads of *p prevents the optimizations of these two reads into one. @p isn’t defined as volatile, and yet it’s read from twice (ptr [eax]).

Another variant, now trying InterlockedOr():

LONG atomic;

__declspec(noinline) LONG do_it(LONG *p) {
  LONG x = 0;
  LONG y;
  x += *p;
  y = InterlockedOr(&atomic, 0x1234);
  x += *p;

  return x + y;
}

Once again, compiled as “free”, turns into this:

_do_it@4:
  00000000: 8B FF              mov         edi,edi
  00000002: 55                 push        ebp
  00000003: 8B EC              mov         ebp,esp
  00000005: 8B 4D 08           mov         ecx,dword ptr [ebp+8]
  00000008: 8B 11              mov         edx,dword ptr [ecx]
  0000000A: 53                 push        ebx
  0000000B: 56                 push        esi
  0000000C: 57                 push        edi
  0000000D: BE 34 12 00 00     mov         esi,1234h
  00000012: BF 00 00 00 00     mov         edi,offset _atomic
  00000017: 8B 07              mov         eax,dword ptr [edi]
  00000019: 8B D8              mov         ebx,eax
  0000001B: 0B DE              or          ebx,esi
  0000001D: F0 0F B1 1F        lock cmpxchg dword ptr [edi],ebx
  00000021: 75 F6              jne         00000019
  00000023: 8B F0              mov         esi,eax
  00000025: 8B 01              mov         eax,dword ptr [ecx]
  00000027: 5F                 pop         edi
  00000028: 03 C6              add         eax,esi
  0000002A: 5E                 pop         esi
  0000002B: 03 C2              add         eax,edx
  0000002D: 5B                 pop         ebx
  0000002E: 5D                 pop         ebp
  0000002F: C2 04 00           ret         4

This is quite amazing: The OR isn’t implemented as a single atomic instruction, but rather it goes like this: The previous value of @atomic is fetched into EAX and then moved to EBX. It’s ORed with the constant 0x1234, and then the cmpxchg instruction writes the result (in EBX) into @atomic only if its previous value was the same as EAX. If not, the previous value is stored in EAX instead. In the latter case, the JNE loops back to try again.

I should mention that cmpxchg compares the memory operand with EAX and stores the memory’s value in EAX if the comparison fails, even though this register isn’t mentioned explicitly in the instruction. It’s just a thing that cmpxchg works with EAX. EBX holds the new value to write if the comparison succeeds, and it therefore appears in the instruction. Confusing, yes.
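The retry loop in the disassembly above can be written out in portable C for illustration. This is a minimal sketch using C11’s <stdatomic.h> rather than the Windows Interlocked functions, and the helper name fetch_or_with_cas() is made up for this example. It is not how InterlockedOr() is defined, only the same load / OR / compare-and-exchange pattern.

```c
#include <stdatomic.h>

/* Hypothetical helper (not a Windows API): atomically ORs `mask` into
 * *target and returns the value that *target held before the OR. */
long fetch_or_with_cas(_Atomic long *target, long mask)
{
    /* Initial read, corresponding to "mov eax,dword ptr [edi]" */
    long expected = atomic_load(target);

    /* Try to replace `expected` with `expected | mask`. If *target changed
     * in the meantime, the call fails and writes the value it found into
     * `expected`, so the next attempt starts from fresh data. */
    while (!atomic_compare_exchange_weak(target, &expected, expected | mask)) {
        /* Nothing to do here: `expected` was refreshed by the failed attempt */
    }

    return expected;
}
```

Just like with cmpxchg and EAX, the `expected` variable serves both as the value to compare with and as the place where the current value lands when the comparison fails.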

Also here, *p is read twice.

It’s also worth noting that using InterlockedOr() with the value 0 as a common way to grab the current value yields basically the same code (only the constant is generated with a “xor esi,esi” instead).

So if you want to use an Interlocked function just to read from a variable, InterlockedExchangeAdd() with zero is probably better, because it doesn’t create all that loop code around it.
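For comparison, this is roughly what the “add zero to read” idiom looks like in portable C11. It’s only a sketch under the assumption that atomic_fetch_add() plays the role of InterlockedExchangeAdd(); the function name read_with_fetch_add() is invented for this example.

```c
#include <stdatomic.h>

/* Hypothetical helper: reads *counter by virtue of an atomic
 * read-modify-write that adds zero. The stored value doesn't change,
 * and the value held before the operation is returned. */
long read_with_fetch_add(_Atomic long *counter)
{
    return atomic_fetch_add(counter, 0);
}
```

As with InterlockedExchangeAdd(), atomic_fetch_add() returns the previous value, so adding zero hands back the current content without a retry loop in the source code.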

Another function I’d like to look at is InterlockedExchange(), as it’s used a lot. Spoiler: No surprises are expected.

LONG atomic_thingy;

__declspec(noinline) LONG do_it(LONG *p) {
  LONG x = 0;
  LONG y;
  x += *p;
  y = InterlockedExchange(&atomic_thingy, 0);
  x += *p;

  return x + y;
}

and this is what it compiles into:

_do_it@4:
  00000000: 8B FF              mov         edi,edi
  00000002: 55                 push        ebp
  00000003: 8B EC              mov         ebp,esp
  00000005: 8B 45 08           mov         eax,dword ptr [ebp+8]
  00000008: 8B 10              mov         edx,dword ptr [eax]
  0000000A: 56                 push        esi
  0000000B: 33 C9              xor         ecx,ecx
  0000000D: BE 00 00 00 00     mov         esi,offset _atomic_thingy
  00000012: 87 0E              xchg        ecx,dword ptr [esi]
  00000014: 8B 00              mov         eax,dword ptr [eax]
  00000016: 03 C1              add         eax,ecx
  00000018: 03 C2              add         eax,edx
  0000001A: 5E                 pop         esi
  0000001B: 5D                 pop         ebp
  0000001C: C2 04 00           ret         4

And finally, what about writing twice to the same place?

LONG atomic_thingy;

__declspec(noinline) LONG do_it(LONG *p) {
  LONG y;
  *p = 0;
  y = InterlockedExchangeAdd(&atomic_thingy, 0);
  *p = 0;

  return y;
}

Writing the same constant value twice to a non-volatile variable. This calls for an optimization. But the Interlocked function prevents it, as expected:

_do_it@4:
  00000000: 8B FF              mov         edi,edi
  00000002: 55                 push        ebp
  00000003: 8B EC              mov         ebp,esp
  00000005: 8B 4D 08           mov         ecx,dword ptr [ebp+8]
  00000008: 83 21 00           and         dword ptr [ecx],0
  0000000B: 33 C0              xor         eax,eax
  0000000D: BA 00 00 00 00     mov         edx,offset _atomic_thingy
  00000012: F0 0F C1 02        lock xadd   dword ptr [edx],eax
  00000016: 83 21 00           and         dword ptr [ecx],0
  00000019: 5D                 pop         ebp
  0000001A: C2 04 00           ret         4

Writing a zero is implemented by ANDing with zero, so it’s a bit confusing. But it’s done twice, before and after the “lock xadd”. Needless to say, these two writes are fused into one by the compiler without the Interlocked statement in the middle (I’ve verified it with disassembly, 32 and 64 bit).

Volatile?

In Microsoft’s definition of the InterlockedExchangeAdd() function (and all other similar ones), the first operand is a pointer to a LONG volatile. Why volatile? Does the variable really need to be?

The answer is no, if all accesses to the variable are made with Interlocked-something functions. The compiler won’t optimize away the call itself, and the call also works as a compiler memory barrier.

And it’s a good habit to stick to these functions, because of this implicit compiler memory barrier: That’s usually what we want and need, even if we’re not fully aware of it. Accessing a shared variable almost always has a “before” and “after” thinking around it. The volatile keyword doesn’t protect against reordering optimizations by the compiler (or otherwise).

But if the variable is accessed without these functions in some places, the volatile keyword should definitely be used when defining that variable.

More about LOCK

It’s a prefix that is added to an instruction in order to ensure that it’s performed atomically. In many cases, it’s superfluous and sometimes ignored, as the atomic operation is ensured anyhow.

From Intel’s 64 and IA-32 Architectures Software Developer’s Manual, Volume 2 (instruction set reference) vol. 2A page 3-537, on the “LOCK” prefix for instructions:

Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.

The manual elaborates further on the LOCK prefix, but says nothing about memory barriers / fences. Those are implemented with the MFENCE, SFENCE and LFENCE instructions.

The LOCK prefix is encoded with a 0xf0 coming before the instruction in the binary code, by the way.

Linux counterparts

For x86 only, of course:

  • atomic_set() turns into a plain “mov”
  • atomic_add() turns into “lock add”
  • atomic_sub() turns into “lock sub”
I’m not sure that there are any Windows functions for exactly these.

Is a memory barrier (fences) required?

Spoiler: Not in x86 kernel code, including 32 and 64 bits. Surprise. Unless you really yearn for a headache, this is the right place to stop reading this post.

The theoretical problem is that each processor core might reorder the storing or fetching of data between registers, cache and memory in any possible way to increase performance. So if one processor writes to X and then Y, and it’s crucial that the other processor sees the change in Y only when it also sees X updated, a memory barrier is often used. In the Linux kernel, smp_wmb() and smp_rmb() are used in conjunction to ensure this.

For example, say that X is some data buffer, and Y is a flag saying that the data is valid. One processor fills the buffer X and then sets Y to “valid”. The other processor reads Y first, and if it’s valid, it accesses the buffer X. But what if the other processor sees Y as valid before it sees the data in X correctly? A store memory barrier before writing to Y and a read memory barrier before reading from X prevent this.
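The buffer X and flag Y scenario can be sketched in portable C11 as follows. This is an illustration only: It uses release / acquire ordering from <stdatomic.h> in place of the kernel’s smp_wmb() and smp_rmb(), and the names buffer_x, flag_y, publish() and try_consume() are all made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 16

static char buffer_x[BUF_SIZE]; /* X: the data buffer */
static atomic_bool flag_y;      /* Y: the "data is valid" flag, initially false */

/* Writer side: fill X first, then set Y. The release ordering plays the
 * role of the store barrier before writing to Y. */
void publish(const char *data)
{
    strncpy(buffer_x, data, BUF_SIZE - 1);
    buffer_x[BUF_SIZE - 1] = '\0';
    atomic_store_explicit(&flag_y, true, memory_order_release);
}

/* Reader side: check Y first, and touch X only if Y says it's valid. The
 * acquire ordering plays the role of the read barrier before reading X.
 * `out` must have room for BUF_SIZE bytes. */
bool try_consume(char *out)
{
    if (!atomic_load_explicit(&flag_y, memory_order_acquire))
        return false;

    memcpy(out, buffer_x, BUF_SIZE);
    return true;
}
```

In a real program, publish() and try_consume() would run on different threads. The ordering arguments tell both the compiler and the processor what must not be reordered, so the same source stays correct on architectures with weaker ordering than x86.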

The thing is that the Linux kernel’s implementations of smp_wmb() and smp_rmb() for x86 produce no machine code (they only act as compiler barriers). Note that this is true only for the smp_*() versions. The non-smp fences turn into opcodes that implement fences. Assuming that the Linux guys know what they’re doing (which is a pretty safe assumption in this respect) they’re telling us that the view of ordering is kept intact across processor cores. In other words, if I can assure that X is written before Y on one processor, then Intel promises me that another processor that sees Y updated will also see the updated value of X.

Looking at how Microsoft’s examples solve certain multithreading issues, it’s quite evident that they trust the processor to retain the ordering of operations.

Memory fences are hence only necessary to ensure the ordering on a certain processor on x86. On different architectures (e.g. ARM v7) smp_wmb() and smp_rmb() actually do produce some code.

When are these fences really needed? From Intel’s 64 and IA-32 Architectures Software Developer’s Manual, Volume 2 (instruction set reference) vol. 2A page 4-22, on the “MFENCE” instruction:

Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction. The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any LFENCE and SFENCE instructions, and any serializing instructions (such as the CPUID instruction).

That didn’t answer much. I searched for fence instructions in the disassembly of a Linux kernel compiled for x86_64. A lot of fence instructions are used in the initial CPU bringup, in particular after setting CPU registers. Makes sense. Then there are several other fence instructions in drivers, which aren’t necessarily needed, but who has the guts to remove them?

Most interesting is where I didn’t find a single fence instruction: In any of the object files generated in kernel/locking/. In other words, Linux’ mutexes and spinlocks are all implemented without any fence. So this is most likely a good proof that they aren’t needed for anything else but things related to the CPU state itself. I guess. For a 64-bit x86, that is.
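To make the idea of a fence-free lock more concrete, here’s a toy spinlock in portable C11. It’s a sketch and nothing more: It isn’t how Linux implements its spinlocks, and the toy_spin_*() names are invented for this example. On x86, compilers are expected to turn the acquire test-and-set into an xchg and the release clear into a plain store, with no fence instruction, but that depends on the compiler and should be checked in the disassembly.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock for illustration; initialize with { ATOMIC_FLAG_INIT } */
typedef struct {
    atomic_flag locked;
} toy_spinlock;

/* Returns true if the lock was obtained. test_and_set returns the flag's
 * previous state, so "false" means that nobody held the lock. */
bool toy_spin_trylock(toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy-waits until the lock is obtained */
void toy_spin_lock(toy_spinlock *l)
{
    while (!toy_spin_trylock(l)) {
        /* spin */
    }
}

/* Releases the lock. Everything written while holding the lock becomes
 * visible to whoever obtains it next. */
void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```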

Going back to Microsoft, it’s interesting that their docs for userspace Interlocked functions say “This function generates a full memory barrier (or fence) to ensure that memory operations are completed in order”, but not those for kernel space. Compare, for example InterlockedOr() for applications vs. the same function for kernel. Truth is I didn’t bother to do the same disassembly test for application code.

Some barrier functions

(or: A collection of functions you probably don’t need, even if you think you do)

  • KeFlushWriteBuffer(): Undocumented and rarely mentioned, intended for internal kernel use. Probably just makes sure that the cache has been flushed (?).
  • KeMemoryBarrier(): Calls _KeMemoryBarrier(). But in wdm.h, there’s an implementation of this function, calling FastFence() and LoadFence(). But these are just macros for __faststorefence and _mm_lfence. Looked at next.
  • _mm_lfence() : Turns into an lfence opcode. Same as rmb() in Linux.
  • _mm_sfence(): Turns into an sfence opcode. Same as wmb() in Linux.
  • _mm_mfence(): Turns into an mfence opcode.

I’ve verified that the _mm_*fence() builtins generated the said opcodes when compiled for x86 and amd64 alike. See some experiments on this matter below.

The deprecated _ReadBarrier(), _WriteBarrier() and _ReadWriteBarrier() produce no code at all. MemoryBarrier() ends up as a call to _MemoryBarrier().

Experimenting with fence instructions

(or: A major waste of time)

This is the code compiled:

__declspec(noinline) LONG do_it(LONG *p) {
  LONG x = 0;
  x += *p;
  _mm_lfence();
  x += *p;

  return x;
}

With a “checked compilation” this turns into:

_do_it@4:
  00000000: 8B FF              mov         edi,edi
  00000002: 55                 push        ebp
  00000003: 8B EC              mov         ebp,esp
  00000005: 51                 push        ecx
  00000006: C7 45 FC 00 00 00  mov         dword ptr [ebp-4],0
            00
  0000000D: 8B 45 08           mov         eax,dword ptr [ebp+8]
  00000010: 8B 4D FC           mov         ecx,dword ptr [ebp-4]
  00000013: 03 08              add         ecx,dword ptr [eax]
  00000015: 89 4D FC           mov         dword ptr [ebp-4],ecx
  00000018: 0F AE E8           lfence
  0000001B: 8B 55 08           mov         edx,dword ptr [ebp+8]
  0000001E: 8B 45 FC           mov         eax,dword ptr [ebp-4]
  00000021: 03 02              add         eax,dword ptr [edx]
  00000023: 89 45 FC           mov         dword ptr [ebp-4],eax
  00000026: 8B 45 FC           mov         eax,dword ptr [ebp-4]
  00000029: 8B E5              mov         esp,ebp
  0000002B: 5D                 pop         ebp
  0000002C: C2 04 00           ret         4

OK, this is too much. There is no optimization at all. So let’s look at the “free” compilation instead:

_do_it@4:
  00000000: 8B FF              mov         edi,edi
  00000002: 55                 push        ebp
  00000003: 8B EC              mov         ebp,esp
  00000005: 8B 45 08           mov         eax,dword ptr [ebp+8]
  00000008: 8B 08              mov         ecx,dword ptr [eax]
  0000000A: 0F AE E8           lfence
  0000000D: 8B 00              mov         eax,dword ptr [eax]
  0000000F: 03 C1              add         eax,ecx
  00000011: 5D                 pop         ebp
  00000012: C2 04 00           ret         4

So clearly, the fence command made the compiler read the value from memory twice, as opposed to optimizing the second read away. Note that there’s no volatile keyword involved. So except for the fence, there’s no reason to read from *p twice.

The exact same result is obtained with _mm_mfence().

Trying with _mm_sfence() yields an interesting result however:

_do_it@4:
  00000000: 8B FF              mov         edi,edi
  00000002: 55                 push        ebp
  00000003: 8B EC              mov         ebp,esp
  00000005: 8B 45 08           mov         eax,dword ptr [ebp+8]
  00000008: 8B 00              mov         eax,dword ptr [eax]
  0000000A: 0F AE F8           sfence
  0000000D: 03 C0              add         eax,eax
  0000000F: 5D                 pop         ebp
  00000010: C2 04 00           ret         4

*p is read into eax once, then comes the fence, and then eax is added to itself. As opposed to above, where *p was read into ecx before the fence, then read again into eax after it, and then ecx was added to eax.

So the compiler felt free to optimize the two reads into one, because the store fence deals only with writes into memory, not reads. Given that there’s no volatile keyword used, it’s fine to optimize reads, which is exactly what it did.

The same optimization occurs if the fence command is removed completely, of course.

For the record, I’ve verified the equivalent behavior on the amd64 target (I’ll spare you the assembly code).

Windows trusting many more Root Authorities than certmgr shows

This baffled me for a while: I used certmgr to see if a Windows 10 machine had a root certificate that was needed to certify a certain digital signature, and it wasn’t listed. But then the signature was validated. And not only that, the root certificate was suddenly present in certmgr. Huh?

Here’s a quick demonstration. This is the “before” screenshot of the Certificate Manager window:

Windows Certificate Manager before examining .cab file

Looking at the registry, I found 11 certificates in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SystemCertificates\AuthRoot\Certificates\ and 12 certificates in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SystemCertificates\ROOT\Certificates\, so it matches exactly certmgr’s view of 23 root certificates.

And so I had a .cab file with a certificate that requires Certum’s root certificate for validation. Clear from the screenshot above, it’s not installed.

Then I right-clicked that .cab file, selected Properties, then the “Digital Signature Tab”, selected the certificate and clicked Details, and boom! A new root certificate was automatically installed:

Windows Certificate Manager after examining .cab file

And suddenly there are 12 certificates in the AuthRoot part of the registry instead of 11. Magic.

And here’s the behind the scenes of that trick.

Microsoft publishes a Certificate Trust List (CTL), which every computer downloads automatically every now and then (once a week, typically). It contains the list of root authorities that the computer should trust. Apparently, however, they are imported into the registry only as needed. This page describes the concept of CTL in general.

I don’t know where this is stored on the disk, however one can download the list and create an .sst file, which opens certmgr when double-clicked. That lists all certificates of the downloaded CTL. 425 of them, as of May 2021, including Certum of course:

> certutil -generateSSTFromWU auth.sst

So it seems like Windows installs certificates from the CTL as necessary to validate certificate chains. This includes the GUI for examining certificates, verifying with signtool, as well as requiring the certificate for something actually useful.

There’s also a utility called CTLInfo out there, which I haven’t tried. It apparently displays the CTL currently loaded in the system.

There’s another post in Stackexchange on this matter.

Besides, I’ve written a general post on certificates, if all this sounded like Chinese.

Attestation signing of Windows device driver: An unofficial guide

Introduction

This is my best effort to summarize the steps to attestation signing for Windows drivers (see Microsoft’s main page on this). I’m mostly a Linux guy with no connections inside Microsoft, so everything written below is based upon public sources, trial and (a lot of) error, some reverse engineering, and speculations. This couldn’t be further away from the horse’s mouth, and I may definitely be wrong occasionally (that is, more than usual).

Also, the whole topic of attestation signature seems to be changing all the time, so it’s important to keep in mind that this reflects the situation of May 10th 2021. Please comment below as things change or whenever I got things wrong to begin with.

Attestation signing replaces the method that was available until April 2021, which was signing the driver locally by its author, just like any code signing. With attestation signing, Microsoft’s own digital signature is used to sign the driver. To achieve that, the driver’s .inf and .sys files are packed in a .cab file, signed by the author, and submitted to Microsoft. Typically 10 minutes later, the driver is signed by Microsoft, and can be downloaded back by the author.

Unfortunately, the signature obtained this way is recognized by Windows 10 only. In order to obtain a signature that works with Windows 7 and 8, the driver needs to get through an HLK test.

Signing up to the Hardware Program

This seemingly simple first step can be quite confusing and daunting, so let’s begin with the most important point: The only piece of information that I found present in Microsoft’s output (i.e. their signature add-ons), which wasn’t about Microsoft itself, was the company’s name, as I stated during the enrollment. In other words, what happens during the sign-up process doesn’t matter so much, as long as it’s completed.

This is Microsoft’s how-to page for attestation signing in general, and this one is about joining the hardware program. It wasn’t clear to me from these what I was supposed to do, so I’ll try to offer some hints.

The subscription to the Hardware Program can begin when two conditions are met:

  • You have the capability to sign a file with an Extended Validation (EV) code signing certificate.
  • You have an Azure Active Directory Domain Services managed domain (“Azure AD”).

Obtaining an EV certificate is a bureaucratic process, and it’s not cheap. But at least the other side tells you what to do, once you’ve paid. I went for ssl.com as their price was lowest, and working with them I got the impression that the company has hired people who actually know what they do. In short, recommended.

So what’s the Domain Services thing? Well, this is the explanation from inside the Partner web interface (once it has already been set up): “Partner Center uses Azure Active Directory for identity and access management”. That’s the best I managed to find on why this is necessary.

For a single user scenario, this boils down to obtaining a domain name like something.onmicrosoft.com from Microsoft. It doesn’t matter if the name turns out long and daunting: It doesn’t appear anywhere else, and you’re not going to type it manually.

So here’s what to do: First things first, create a fresh Microsoft account. Not really necessary if you already have one, but there’s going to be quite some mail going its way (some of which is promotional, unless you’re good at opting out).

Being logged into that account, start off on Azure’s main page. Join the 12-month free trial. It’s free, and yet you’ll need to supply a valid credit card number in the process. As of writing this, I don’t know what happens after 12 months (but see “Emails from Azure calling for an upgrade” below on developments).

The next step is to create that domain service. I believe this is Microsoft’s page on the topic, and this is the moment one wonders why there’s talk about DNSes and virtual networks. Remember that the only goal is to obtain the domain name, not to actually use it.

And here comes the fuzzy part, where I’m not sure I didn’t waste time with useless operations. So you may try following this, as it worked for me. But I can’t say I understand why these seemingly pointless actions did any good. I suspect that the bullets marked below as possibly skippable can indeed be skipped: Maybe it’s just about creating an Azure account, and not necessarily allocating resources?

So here are the steps that got me going:

  • Log in to your (new?) Azure account.
  • Go to Azure’s Portal (or click the “Portal” link at the top bar on Azure’s main page)
Maybe skip these steps (?):
  • Click “Create a resource” (at the top left) and pick Azure AD Domain Services.
  • For Resource Group I created a new one, “the_resource_group”. I guess the name doesn’t matter.
  • The DNS name doesn’t matter, apparently. yourcompany.onmicrosoft.com or something. It’s not going to appear anywhere.
  • I set SKU to Standard, as it appeared to be the least expensive one.
  • After finishing the setup, it took about an hour for Azure to finish the configuration. Time for a long and well deserved coffee break.
  • But then it complained that I need to set up DNSes or something. So I went along with the automatic fix.

(end of possibly useless steps)

  • There’s this thing on the Register for the Hardware Program page saying that one should log in with the Global administrator account. This page defines Azure AD Global administrator as “This administrator role is automatically assigned to whomever created the Azure AD tenant”. So apparently for a fresh Azure account, it’s OK as is.
  • At this point, you’re hopefully set to register to the Hardware Developer Program. After clicking “Next” on the landing page, you’ll be given the choice of “Sign in to Azure AD” or “Create a new directory for free”. The Azure AD is already set up, so log in with the account just created.
  • A word about that “Create a new directory for free” option. To make things even more confusing, this appears to be a quick and painless shortcut, however in my case I got “This domain name is not available” for any domain name I tried with. Maybe I missed something, but this was a dead end for me. This is the page I didn’t manage to get through. Maybe your luck is better than mine. So this is why I created the Azure AD first, and then went for registration.
  • Going on with the registration, you’re given a file to sign with your EV certificate. I got a .bin file, but in fact it had .exe or .sys format. So it can be renamed to .exe and used with cloud signature services (I used eSigner). BUT do this only if you’re going to sign the .cab files with the same machinery, or you’ll waste a few hours wondering what’s going on. Or read below (“When the signature isn’t validated”) why it was wrong in my case.
  • And this is the really odd thing: Inside the Microsoft Partner Center, clicking the “your account” button (at the top right) it shows the default directory in use. At some point during the enrollment procedure, the link with the Azure AD I created was made (?), but for some reason, the default directory shown was something like microsoftazuremycompany.onmicrosoft.com instead of mycompany.onmicrosoft.com, which is the domain I created before. This didn’t stop me from signing a driver. So if another directory was used, why did I create one earlier?

After all this, I was set to submit drivers for signature: From this moment on, the entry point for signing drivers is the Microsoft Partner Center dashboard’s main page.

Emails from Azure calling for an upgrade

To make a long story short, quite a few emails arrived on behalf of Microsoft Azure, urging me to “upgrade” my account, i.e. to allow charging my credit card for Azure services. I ignored them all, and had no issues continuing to sign drivers.

And now to the details.

A day after signing up to Azure, I discovered that $20 had been deducted from my promotional free trial credit. Apparently, I had enabled stuff that would have cost real money. So I deleted the resources I had allocated in Azure. This includes deleting the mycompany.onmicrosoft.com domain, which was obviously ignored by the Partner Center. It was about deleting the resource group (which contained 7 elements, with the domain included): Just clicking on the resource group in the main portal page, and then Delete Resource Group at the top. It took several minutes for Azure to digest that.

About a month later, I got a notification from Azure urging me to upgrade my account. It went:

You’re receiving this email because your free credit has expired. Because of this, your Azure subscription and services have been disabled. To restore your services, upgrade to pay-as-you-go pricing.

Scary, heh? Does “services have been disabled” mean that I’m about to lose the ability to sign drivers?

Once again, “upgrade” is a cute expression for giving permission to charge the credit card that I had to give during submission. The details of which can’t be deleted from the account, unless I submit another, valid one, instead.

As a side note, it turned out that I had a Network Watcher group activated. Maybe I missed it earlier, and maybe it was somehow added. So I deleted it as well. But it’s not clear if this was related to the fact that the credits expired, whatever that means.

A few days on, another mail from Azure, basically the same, urging me to upgrade. One day after that, came an invoice. How come? I haven’t approved any payment. So it was an invoice on 0.00 USD. Zero. Why it was sent to me, is still unclear.

And finally, roughly two months after the initial subscription, I got a “We’re sorry to see you go” email from Azure, saying “Your free credit expired on (this and this date), and because of this we’ve deleted your subscription and any associated data and services”. Uhhm. What about driver signing? Well, I’ve spoiled the suspense above.

Two weeks after getting this last email, I deleted all cookies on my browser that were related to Microsoft, logged into my account at the Partner Center and submitted a driver for signature. The process went through smoothly.

Checking my Azure cloud account, it seemed to have been reset to its starting state, even with a suggestion to start another $200 free trial credit round. Detaching my credit card was however still impossible.

So apparently, there’s no problem just ignoring these emails, and continuing to sign forever. Emphasis on “apparently”.

Overview of the signature process

To make a long story short, you prepare a .cab file with the driver’s files, sign it with your EV Certificate, upload it to the Hardware Dashboard, and get it back after 10 minutes with Microsoft’s digital signatures all over the place.

So instead of signing the driver yourself, you pack the whole thing neatly, and send it to Microsoft for adding the .cat file and signing the drivers. And yet, you must sign the .cab file to prove that you’re the one taking responsibility for it. It’s Microsoft’s signature on the driver in the end, but they know who to bash if something goes wrong.

.cab files are exactly like .zip files, in the sense that they contain a directory tree, not just a bunch of files. Unfortunately, when looking at .cab files with Windows’ built-in utilities, the directory structure isn’t presented, and it looks like a heap of files. This holds true both when double clicking a .cab file and when using expand -D, from Windows XP all the way to Windows 10. Ironically enough, double-clicking a .cab file with Linux desktop GUI opens it correctly as a directory tree.

It’s important to consider .cab files like .zip, with hierarchy, because the way the driver is submitted is by organizing the files in directories exactly as they appear in the driver package for release, minus the .cat file. So what really happens is that Microsoft uncompresses the .cab file like a .zip, adds the .cat file and then performs the digital signing. It then compresses it all back into a .zip file and returns it back to you. The files remain in the same positions all through.

I guess the only reason .zip files aren’t uploaded instead of .cab, is that signtool doesn’t sign zips.

Some people out there, who missed this point, got the impression that the signing is done for each architecture separately. That’s possible, but there’s no reason to go that way. It’s just a matter of preparing the file hierarchy properly.

Preparing the .cab file

For reference, this is Microsoft’s short page on makecab and very long page on .cab files (which begins with cabarc, but goes on with makecab).

First, set up a .ddf file, looking something like this:

.Set CabinetFileCountThreshold=0
.Set FolderFileCountThreshold=0
.Set FolderSizeThreshold=0
.Set MaxCabinetSize=0
.Set MaxDiskFileCount=0
.Set MaxDiskSize=0
.Set CompressionType=MSZIP
.Set Cabinet=on
.Set Compress=on

;Specify file name for new cab file
.Set CabinetNameTemplate=thedriver.cab
.Set DiskDirectoryTemplate= ; Output .cab files into current directory

.Define pathtodriver=thedriver-dir

.Set DestinationDir=thedriver
%pathtodriver%\thedriver.inf
.Set DestinationDir=thedriver\i386
%pathtodriver%\i386\thedriver.sys
.Set DestinationDir=thedriver\amd64
%pathtodriver%\amd64\thedriver.sys

The .cab file is then generated with something like

> makecab /f thedriver.ddf

“makecab” is in Windows’ execution path by default.

In my case of transitioning from self-signed drivers to attestation signature, there was already a script that generated the directory ready for releasing the driver. So the change I made was not to copy the .cat file into that directory, and instead of signing the .cat file, create a .cab.

The .ddf file above relates to a driver released for Intel architecture, 32 and 64 bits. The subdirectories in the driver package are i386 and amd64, respectively, as defined in the .inf file.

Changes you should make to the .ddf file:

  • Replace all “thedriver” with the name of your driver (i.e. the name of the .inf and .sys files).
  • Set “pathtodriver” to where the driver package is. Note that makecab’s /d flag allows setting variables, so the Define directive can be removed, and instead go something like
    > makecab /d pathtodriver=..\driverpackage /f thedriver.ddf
  • Then possibly modify the files to be included. Each DestinationDir assignment tells makecab the directory position to place the file(s) that appear after it. This should match the structure of your release package’s directory structure.
  • If the line doesn’t start with a dot, it’s the path to a file to copy into the .cab file. The path can be absolute (yuck) or relative to the current directory.

All in all, the important thing is to form a directory tree of a driver for release in the .cab file.
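Since the .ddf file merely mirrors the release directory, it can be generated rather than written by hand. The following is a minimal sketch in Python, not something I've used for the actual submission: The make_ddf function, its arguments and the choice to skip .cat files are my own assumptions, and the header lines are simply copied from the .ddf file above.

```python
import os
from pathlib import Path

# Header copied from the hand-written .ddf file shown above.
DDF_HEADER = """\
.Set CabinetFileCountThreshold=0
.Set FolderFileCountThreshold=0
.Set FolderSizeThreshold=0
.Set MaxCabinetSize=0
.Set MaxDiskFileCount=0
.Set MaxDiskSize=0
.Set CompressionType=MSZIP
.Set Cabinet=on
.Set Compress=on
.Set CabinetNameTemplate={name}.cab
.Set DiskDirectoryTemplate= ; Output .cab files into current directory
"""

def make_ddf(package_dir, name, skip_suffixes=(".cat",)):
    """Return .ddf text mirroring package_dir under a top directory `name`.

    Hypothetical helper: files ending with skip_suffixes (the .cat file
    by default) are left out, as Microsoft generates its own.
    """
    package_dir = Path(package_dir)
    lines = [DDF_HEADER.format(name=name)]
    for root, dirs, files in os.walk(package_dir):
        dirs.sort()  # deterministic traversal order
        wanted = sorted(f for f in files
                        if not f.lower().endswith(skip_suffixes))
        if not wanted:
            continue
        rel = Path(root).relative_to(package_dir)
        # makecab expects backslashes in the destination directory
        dest = "\\".join((name,) + rel.parts)
        lines.append(f".Set DestinationDir={dest}")
        for f in wanted:
            lines.append(str(Path(root) / f))
    return "\n".join(lines) + "\n"
```

Writing the returned string into thedriver.ddf and running makecab /f on it should give the same kind of .cab file as the manual version, provided that the script runs on the Windows machine that holds the driver package (so that the file paths come out with backslashes).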

The .ddf file shown above is a working example, and it includes only the .inf and .sys files. Including a .cat file is pointless, as Microsoft’s signature machinery generates one of its own.

As for .pdb files, it’s a bit more confusing: Microsoft’s main page includes .pdb files in the list of “typical CAB file submissions” (actually, .cat is listed there too), and then these files don’t appear in the .ddf file example on the same page. The graphics showing a tree for multiple package submissions is inconsistent with both.

A .pdb file contains the symbol map of the related .sys file, allowing the kernel debugger to display meaningful stack traces and disassemblies, in particular when analyzing a bugcheck. These files are not included in a driver release, not mentioned in the .inf file, not referenced in the .cat file and are therefore unrelated to the signature of the driver. Technically, Microsoft doesn’t need these files to complete an attestation signature.

Microsoft nevertheless encourages submitters of drivers to include .pdb files. When these files are missing in a driver submission, a popup shows up in the web interface saying “This submission does not include symbols. It is recommended to upload symbols within each driver folder”. This however doesn’t stop the process, nor even delay it, in case you’re slow on confirming it. So it’s up to you if you want to include .pdb’s.

Submitting the .cab file

The command for signing the .cab file is:

> signtool.exe sign /fd sha256 thedriver.cab

Note that timestamping is not required, but won’t hurt. The whole idea with timestamping is to make the signature valid after the certificates expire, but the .cab file is examined soon after the signature is made, and after that it has no more importance.

Note that ssl.com also offers an eSigner tool for signing the .cab file with a simple web interface. Just be sure to have registered with a signature made in eSigner as well, or things will go bad, see “When the signature isn’t validated” below. Or add eSigner’s certificate to the existing subscription.

Then the submission itself:

  • In Microsoft Partner Center’s dashboard, click “Drivers” on the left menubar. It might be necessary to click “Hardware” first to make this item appear.
  • Click the “Submit new hardware” button at the top left to get started.
  • Give the submission a name — it’s used just for your own reference, and surely won’t appear in the signed driver package.
  • Drag the signed cab file to where it says to.
  • The web interface requires selecting Windows releases in a lot of checkboxes. More on this just below.
  • Click “Submit” to start the machinery. Once it finishes, successfully or not, it sends a notification mail (actually, three identical mails or so. Not clear why not only one).
  • If and when the entire process is completed successfully, the driver can be downloaded: Under “Packages and signing properties”, there’s a “More” link. Click it, and a button saying “Download signed files” appears. So click it, obviously.

Now to the part about selecting Windows versions. It’s an array of checkboxes. This is a screenshot of this stage:

Selecting OS targets for Attestation Signing

First, the easy part: Don’t check the two at the top saying “Perform test-signing for X”. It says “Leave all checkboxes blank for Attestation Signing” in fine print above these.

Now we’re left with a whole lot of Windows 10 release numbers and architectures. From a pure technical point of view, there’s no need for this information to perform the signature, since the .inf file contains the information of which architectures are targeted.

Rather, this is the “attestation” part: Just above the “Submit” button, it says “You have completed quality testing of your driver for each of the Operating Systems selected above”. So this is where you testify which platforms you’ve tested the driver with. The deal is that instead of going through the via dolorosa of HLK tests, Microsoft signs the driver for you in exchange for this testimony. Or should I say, attestation.

Just to spell it out: The signature can’t and doesn’t limit itself to specific operating system builds, and it would be insane doing so, as it wouldn’t cover future Windows releases.

I have to admit that in the beginning I misunderstood this part, and tried to select as much as possible. And because my driver wasn’t compiled for arm64, and I had clicked versions saying “ARM64”, the submission was rejected with “thedriver.inf does not have NTARM64 decorated model sections” (in UniversalLog.txt). It was a bit of a computer game to check the right boxes and avoid the wrong ones.

So no need to be greedy. Common sense is to test the driver on one operating system release for each architecture. In the example above, it’s for a driver released for Intel architecture, 32 and 64 bits. The checkbox selection reflects testing it with Windows 10 release 1607, 32- and 64-bit architecture. This is the proper way to go.

And yet, for the heck of it I tried submitting the same driver package with a single OS checked (1607 x64). To my surprise, the package was accepted and signed, even though I hadn’t declared that the driver was tested on the 32-bit version, and a .sys file for that architecture was part of the package.

All in all, there must be a match between the architectures targeted by the driver (as listed in the .inf file) and those inferred by the selection of OSes. Nevertheless, it seems like Microsoft lets you get away with not testing all of them. In short, checking just one checkbox may be enough, even if the driver supports multiple architectures.
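The rejection message above refers to the decorations (NTx86, NTamd64, NTARM64 and so on) that follow the models section name in the .inf file’s [Manufacturer] section. Here is a rough Python sketch for listing them before picking checkboxes. The function names are made up, the parsing is deliberately naive (it ignores quoting and line continuations), and the mapping between the web interface’s checkbox labels and the decorations is my own guess, not something Microsoft documents as far as I know.

```python
import re

def inf_architectures(inf_text):
    """Return the architectures named by [Manufacturer] decorations."""
    archs = set()
    in_manufacturer = False
    for raw in inf_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments (naively)
        if not line:
            continue
        if line.startswith("["):
            in_manufacturer = line.lower() == "[manufacturer]"
            continue
        if in_manufacturer and "=" in line:
            _, value = line.split("=", 1)
            # First item is the models section name, the rest are
            # decorations such as NTamd64 or NTamd64.10.0
            for decoration in value.split(",")[1:]:
                m = re.match(r"nt([a-z0-9]*)", decoration.strip().lower())
                if m and m.group(1):
                    archs.add(m.group(1))
    return archs

# Assumed mapping from checkbox labels to decoration architectures
CHECKBOX_TO_DECORATION = {"x86": "x86", "x64": "amd64", "ARM64": "arm64"}

def unmatched_checkboxes(inf_text, checked):
    """Return the checked architectures that the .inf file doesn't target."""
    archs = inf_architectures(inf_text)
    return sorted(c for c in checked
                  if CHECKBOX_TO_DECORATION[c] not in archs)
```

For a driver like the one discussed here, unmatched_checkboxes(text, ["x64", "ARM64"]) would return ["ARM64"], which is the selection that got my submission rejected.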

Looking at the signed zip

After receiving back the signed driver, I examined the files. My findings were:

  • The .inf file is left completely intact (bytewise identical to the one in the .cab file).
  • A signed .cat file was added.
  • All .sys files were signed as well (contrary to what most of us do when releasing drivers). This makes the driver eligible for inclusion during boot.
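These findings can be checked mechanically by comparing the directory that went into the .cab file with the zip that came back. This is a sketch only: compare_with_signed_zip is a name I made up, and since I can’t vouch for the exact directory layout inside the returned zip, files are matched by the tail of their path.

```python
import zipfile
from pathlib import Path

def compare_with_signed_zip(package_dir, signed_zip):
    """Classify the zip's files as identical, changed or added,
    relative to the local driver package directory."""
    package_dir = Path(package_dir)
    local = {p.relative_to(package_dir).as_posix(): p.read_bytes()
             for p in package_dir.rglob("*") if p.is_file()}
    report = {"identical": [], "changed": [], "added": []}
    with zipfile.ZipFile(signed_zip) as z:
        for info in z.infolist():
            if info.is_dir():
                continue
            name = info.filename
            # The zip may have extra leading directories, so match
            # on the end of the path.
            match = next((rel for rel in local
                          if name == rel or name.endswith("/" + rel)), None)
            if match is None:
                report["added"].append(name)
            elif z.read(info) == local[match]:
                report["identical"].append(name)
            else:
                report["changed"].append(name)
    for names in report.values():
        names.sort()
    return report
```

With the package described in this post, I would expect the .inf file under "identical", the .sys files under "changed" (because of the added signatures) and the .cat file under "added".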

Looking at the digital signatures with an ASN.1 dump utility, it appears like the only place where there’s something not pointing at Microsoft is a non-standard spcSpOpusInfo entry in the crypto blob, where the company’s name appears in wide char format in the programName field (no, I’m not mistaken). This appears to be taken from the “Publisher display name” as it appears in the Account Settings in the Microsoft Partner Center dashboard.

So all in all, there are almost no traces to the fact that the driver’s origin isn’t Microsoft. Except for that entry in the crypto blob, which is most likely invisible unless the signature is analyzed as an ASN.1 file or string searched (with a tool that detects wide char strings). So it appears like all information, except for that “Publisher display name” remains between you and Microsoft.

When the signature isn’t validated

Sometimes, the process fails at the “Preparation” stage. As always on failures, the web interface suggests downloading a “full error report”. That report is a file named UniversalLog.txt. If it says just “SignatureValidationFailed”, something went wrong with the signature validation.

The solution for this is to make sure that the certificate that was used for signing the .cab file is registered: Within Microsoft Partner Center, click the gear icon at the top right, select “Account Settings” and pick “Manage Certificates” at the left menu bar. That’s where the relevant certificate should be listed. The first time I got to this page, I saw the same certificate twice, and deleted one of those.

In my case the problem was that during the registration, I had made the signature with the cloud app (eSigner), but signed the driver with a local USB key dongle. As it turned out, these have different certificates.

So the solution was to delete the registered certificate from the account, and register the new one by signing a file with the local USB dongle. Doing this is a good idea in any case, because if something is wrong with the signature produced by signtool, it will fail the registration as well. So whether this renewed registration succeeds or fails, it brings you closer to the solution.
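A quick way to tell whether two signing setups really use the same certificate is to compare thumbprints. As far as I know, a certificate’s thumbprint is the SHA-1 digest of its DER encoding, and that’s also what signtool prints as “SHA1 hash”. The sketch below assumes that both certificates have been exported to PEM files somehow (which is not part of the procedure described in this post), and the thumbprints function is just a name I picked.

```python
import hashlib
import re
import ssl

_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.S)

def thumbprints(pem_text):
    """Return the SHA-1 thumbprint of each certificate in a PEM text.

    Anything outside the BEGIN/END CERTIFICATE markers (private keys,
    "Bag Attributes" text and so on) is ignored.
    """
    return [hashlib.sha1(ssl.PEM_cert_to_DER_cert(block)).hexdigest().upper()
            for block in _CERT_RE.findall(pem_text)]
```

If thumbprints() of the certificate used for registration and the one used for signing the .cab file have nothing in common, a SignatureValidationFailed is what I would expect.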

Sample certificate chains

For reference, these are examples of certificate chains: One for a properly signed .cab file and one for the .cat file that has been attestation signed by Microsoft.

Note the /pa flag, which means that the Default Authenticode Verification Policy is used. Without it, verification may fail. Also note that the file isn’t timestamped, which is OK for a submission for attestation signing.

> signtool verify /pa /v thefile.cab

Verifying: thefile.cab

Signature Index: 0 (Primary Signature)
Hash of file (sha256): 388D7AFB058FEAE3AEA48A2E712BCEFEB8F749F107C62ED7A41A131507891BD9

Signing Certificate Chain:
    Issued to: Certum Trusted Network CA
    Issued by: Certum Trusted Network CA
    Expires:   Mon Dec 31 05:07:37 2029
    SHA1 hash: 07E032E020B72C3F192F0628A2593A19A70F069E

        Issued to: SSL.com EV Root Certification Authority RSA R2
        Issued by: Certum Trusted Network CA
        Expires:   Mon Sep 11 02:28:20 2023
        SHA1 hash: 893E994B9C43100155AE310F34D8CC962096AE12

            Issued to: SSL.com EV Code Signing Intermediate CA RSA R3
            Issued by: SSL.com EV Root Certification Authority RSA R2
            Expires:   Wed Mar 22 10:44:23 2034
            SHA1 hash: D2953DBA95086FEB5805BEFC41283CA64C397DF5

                Issued to: THE COMPANY LTD
                Issued by: SSL.com EV Code Signing Intermediate CA RSA R3
                Expires:   Fri May 03 13:09:33 2024
                SHA1 hash: C15A6A7986AE67F1AE4B996C99F3A43F98029A54

File is not timestamped.

Successfully verified: thefile.cab

Number of files successfully Verified: 1
Number of warnings: 0
Number of errors: 0

One possibly confusing situation is to check if the root certificate exists before ever running this verification on a fresh Windows installation. It may not be there, but then the verification is successful, and the root certificate appears from nowhere. That rare situation is explained in this post.

Next up is the attestation signed .cat file:

> signtool.exe verify /kp /v thedriver.cat

Verifying: thedriver.cat

Signature Index: 0 (Primary Signature)
Hash of file (sha256): ED5231781724DEA1C8DE2B1C97AC55922F4F85736132B36660FE375B44C42370

Signing Certificate Chain:
    Issued to: Microsoft Root Certificate Authority 2010
    Issued by: Microsoft Root Certificate Authority 2010
    Expires:   Sat Jun 23 15:04:01 2035
    SHA1 hash: 3B1EFD3A66EA28B16697394703A72CA340A05BD5

        Issued to: Microsoft Windows Third Party Component CA 2014
        Issued by: Microsoft Root Certificate Authority 2010
        Expires:   Mon Oct 15 13:41:27 2029
        SHA1 hash: 1906DCF62629B563252C826FDD874EFCEB6856C6

            Issued to: Microsoft Windows Hardware Compatibility Publisher
            Issued by: Microsoft Windows Third Party Component CA 2014
            Expires:   Thu Dec 02 15:25:28 2021
            SHA1 hash: 984E03B613E8C2AE9C692F0DB2C031BF3EE3A0FA

The signature is timestamped: Mon May 10 03:10:15 2021
Timestamp Verified by:
    Issued to: Microsoft Root Certificate Authority 2010
    Issued by: Microsoft Root Certificate Authority 2010
    Expires:   Sat Jun 23 15:04:01 2035
    SHA1 hash: 3B1EFD3A66EA28B16697394703A72CA340A05BD5

        Issued to: Microsoft Time-Stamp PCA 2010
        Issued by: Microsoft Root Certificate Authority 2010
        Expires:   Tue Jul 01 14:46:55 2025
        SHA1 hash: 2AA752FE64C49ABE82913C463529CF10FF2F04EE

            Issued to: Microsoft Time-Stamp Service
            Issued by: Microsoft Time-Stamp PCA 2010
            Expires:   Wed Jan 12 10:28:27 2022
            SHA1 hash: AAE5BF29B50AAB88A1072BCE770BBE40F55A9503

Cross Certificate Chain:
    Issued to: Microsoft Root Certificate Authority 2010
    Issued by: Microsoft Root Certificate Authority 2010
    Expires:   Sat Jun 23 15:04:01 2035
    SHA1 hash: 3B1EFD3A66EA28B16697394703A72CA340A05BD5

        Issued to: Microsoft Windows Third Party Component CA 2014
        Issued by: Microsoft Root Certificate Authority 2010
        Expires:   Mon Oct 15 13:41:27 2029
        SHA1 hash: 1906DCF62629B563252C826FDD874EFCEB6856C6

            Issued to: Microsoft Windows Hardware Compatibility Publisher
            Issued by: Microsoft Windows Third Party Component CA 2014
            Expires:   Thu Dec 02 15:25:28 2021
            SHA1 hash: 984E03B613E8C2AE9C692F0DB2C031BF3EE3A0FA

Successfully verified: thedriver.cat

Number of files successfully Verified: 1
Number of warnings: 0
Number of errors: 0

Doing the same with the .sys file yields exactly the same result, with slight and meaningless differences in the timestamp.

Clearly, the certificate chain ends with “Microsoft Root Certificate Authority 2010” rather than the well-known “Microsoft Code Verification Root”, which is the reason the attestation signature isn’t recognized by Windows 7 and 8.
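For scripting this kind of check, the verbose output of signtool verify can be picked apart with a few lines of Python. This sketch relies only on the output format as it appears in the two dumps above (I don’t know if it’s guaranteed to stay that way), and parse_signing_chain is a name of my own.

```python
import re

def parse_signing_chain(signtool_output):
    """Return the certificates under "Signing Certificate Chain:" as a
    list of dicts, root first and the signer's certificate last."""
    chain, current, in_chain = [], None, False
    for raw in signtool_output.splitlines():
        line = raw.strip()
        if line == "Signing Certificate Chain:":
            in_chain = True
            continue
        if not in_chain:
            continue
        m = re.match(r"(Issued to|Issued by|Expires|SHA1 hash):\s*(.*)", line)
        if m:
            key, value = m.groups()
            if key == "Issued to":  # each certificate starts with this line
                current = {}
                chain.append(current)
            if current is not None:
                current[key] = value
        elif line:
            break  # first non-blank line of another kind ends the chain
    return chain
```

The first element’s “Issued to” is the root authority, so comparing it with “Microsoft Code Verification Root” tells right away which kind of signature this is. The last element’s “SHA1 hash” identifies the signer’s certificate.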

Microsoft as a Certificate Authority, approving itself all through the chain. It’s quite odd this happened only now.

Generation of a certificate request from an existing P12 certificate

The goal

The envisioned work flow for certificate generation is that the end user requests a certificate from a CA by first generating a public / private key pair, and then sending a request for having the public key certified by the CA. This way, the CA is never exposed to the private key.

This is contrary to the common procedure today, where the end user gets the private key from the CA, mostly because the requirement is often that the private key must be on an external hardware device, out of reach even to the end user itself.

Because of the original vision of the flow, openssl’s way of generating a certificate is in two steps: First, create a request file, which contains the public key and the Subject information. The second step takes the request file as input, and generates a certificate, using the secret key of the CA, plus the related CA certificate, so that its data is copied into the generated certificate’s information about the Issuer.

But what if I already have a certificate, and I want another one, for the exact same public key and the same Subject? This post is about exactly that, when the previous certificate is in .p12 format.

For a general tutorial on certificates, there’s this post.

Steps

Extract information from existing certificate:

$ openssl pkcs12 -in my-certificate.p12 -nodes -out oldcert.pem

This command prompts for the password of the secret key in the .p12 file, and then creates a PEM file with two sections: One for the certificate, and one for the secret key. Note the -nodes argument, which outputs the secret key without password protection. Makes the process easier, but obviously riskier as well.

To watch the certificate part that was extracted in textual format:

$ openssl x509 -in oldcert.pem -text

Inspired by this page, generate a CSR with:

$ openssl x509 -x509toreq -in oldcert.pem -out CSR.csr -signkey oldcert.pem

Note that oldcert.pem is used twice: Once as the reference for creating a CSR, and once for grabbing the key. I’m prompted for the password again, because the private key is opened. (I used the “key to happiness” one).

The CSR.csr contains some textual information as well as a PEM formatted part, which is the one to submit. So I copied the file into clean.csr, and manually deleted everything but the PEM segment. And checked it:

$ openssl req -text -in clean.csr -noout -verify

The output should make sense (correct requested name etc.).
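The manual cleanup of CSR.csr can also be scripted. This is only a sketch in Python, assuming that the file contains one block between the usual BEGIN CERTIFICATE REQUEST and END CERTIFICATE REQUEST lines. The function name is my own invention.

```python
import re

# Matches one PEM block, header and footer lines included.
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\r?\n.*?\r?\n-----END \1-----",
    re.DOTALL,
)

def extract_pem_block(text, label="CERTIFICATE REQUEST"):
    """Return the first PEM block with the given label, or None."""
    for match in PEM_BLOCK.finditer(text):
        if match.group(1) == label:
            return match.group(0) + "\n"
    return None
```

Feeding it with the content of CSR.csr and writing the returned string to clean.csr should give the same result as the manual editing. Checking the result with openssl req, as shown above, is still a good idea.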

Now delete oldcert.pem, as it contains the secret key in cleartext!

Crypto jots on ASN.1 and Microsoft’s .cat files

Intro

Crypto is not my expertise. This is a pile of jots I wrote down as I tried to figure out what the Microsoft catalogue file is all about. Not-so-surprising spoiler: It appears to be organized and elegant at first glance, but the more you look into it, the more of a mess it turns out to be. Of the kind that’s caused by someone making a quick hack to solve that little problem very urgently. And repeat.

Sources

  • My own introduction to certificates.
  • C code for parsing ASN.1 (dumpasn1.c) can be found on this page. Be sure to download dumpasn1.cfg as well.
  • A JavaScript ASN.1 parser from this site.
  • For the PKCS#7 syntax, see this page.
  • The osslsigncode utility (written in plain C) is not just useful for analyzing signed CATs, but is also boilerplate code for manipulations based upon openSSL’s API.

ASN.1 basics

ASN.1 is a protocol for organizing (usually small) pieces of data into a file in a structured and hierarchical manner. For each type of container (e.g. an x.509 certificate), there’s a protocol describing its format, written in syntax somewhat similar to C struct definitions. Exactly like C struct definitions, it may contain sub-structures. But don’t take this analogy too far, because ASN.1 definitions have optional fields, and fields with unknown number of members.

So if you want to follow what everything means, you need the definition for each element that you encounter. These are sometimes defined in the protocol for the relevant container type. Just like C structs, just looking at the data tells you what’s an integer and what’s a string, but their meaning depends on the order they appeared.

And just like a struct may contain other structs, there are objects in ASN.1. These are the catch-all method for inserting elements with arbitrary form.

When an object is encountered, it always has an object ID (OID), which defines its class. In that case, the format and meaning of what’s encapsulated is defined in the object’s definition. It may not be published (e.g. OIDs specific to Microsoft). Note that an OID defines the format of the object, but not necessarily its meaning. Even though OIDs that are used in very specific cases also tell us what they contain.

A few random notes that may help:

  • The common binary format is DER. Each item is encoded with a one-byte identifier (e.g. 0x30 for SEQUENCE) followed by the length of the item (including everything in lower hierarchies). If the first length byte is below 0x80, it’s the length itself. Otherwise, this byte is 0x80 plus the number of length bytes that follow, and those bytes hold the length in Big Endian format. After this comes the data.
  • Because the length of the item is given explicitly, there’s a lot of freedom for types like INTEGER: It can be a single byte or huge numbers.
  • The SEQUENCE item, as its name implies, is a sequence of elements which encapsulates some kind of information. Some fields are optional. In that case, there’s a number in square brackets, e.g. [0], [1], [2] etc. in the format specification. These numbers in square brackets appear in the parsed output as well, to indicate which optional field is displayed (if there are any).
  • Object identifiers are given in dotted format, e.g. 1.2.840.113549.1.7.2 for signedData. In dumpasn1’s output, they appear with spaces, e.g.
    OBJECT IDENTIFIER signedData (1 2 840 113549 1 7 2)

    The translation from numeric OID to a meaningful name is possible if dumpasn1 happens to have that OID listed, which is sometimes not the case. Either way, when encountering these, just Google them up for more information.

  • There are two ASN.1-defined types for describing time: UTCTime (tag 0x17) and GeneralizedTime (tag 0x18). They can be used interchangeably. Curiously enough, both are given as ASCII strings. Looking for these is helpful for finding certificates in a large blob, as well as time stamps.
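To make the encoding rules above a bit more tangible, here is a minimal Python sketch that reads the identifier and length of a DER item, and translates the content of an OBJECT IDENTIFIER into dotted format. The function names are mine, and this is not a full parser: It handles only single-byte identifiers, and the OID decoder assumes that the first content byte is below 0x80.

```python
def read_der_header(data, offset=0):
    """Parse the identifier and length of one DER item at `offset`.

    Returns (tag, length, header_size).
    """
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:
        # Short form: this byte is the length itself
        return tag, first, 2
    num_bytes = first - 0x80  # long form: number of length bytes that follow
    if num_bytes == 0:
        raise ValueError("indefinite length is not allowed in DER")
    length = int.from_bytes(data[offset + 2 : offset + 2 + num_bytes], "big")
    return tag, length, 2 + num_bytes


def decode_oid(content):
    """Convert the content bytes of an OBJECT IDENTIFIER to dotted format."""
    # The first byte packs the first two numbers as 40 * first + second
    arcs = [content[0] // 40, content[0] % 40]
    value = 0
    for byte in content[1:]:
        # Each number is stored base-128, 7 bits per byte
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear: last byte of this number
            arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)
```

For example, the bytes 30 82 03 9f are the header of a SEQUENCE with 927 bytes of content, and the bytes 2a 86 48 86 f7 0d 01 07 02 decode to 1.2.840.113549.1.7.2.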

DER and PEM formats

In practice, there are two formats out there for crypto data: DER, which is the native binary format, and PEM, which is a base64 representation of a DER binary. The reason I consider DER to be “native” is that when a digital signature is made on a chunk of data, it’s the DER representation of a chunk of ASN.1 data that is hashed.

Openssl can be used to convert between the two formats. As openssl’s default format is PEM, use -inform DER and -outform DER as necessary to convince it into playing ball.

For example, converting a certificate from PEM to DER

$ openssl x509 -in mycert.crt -outform DER -out mycert.der

or in the opposite direction:

$ openssl x509 -inform DER -in mycert.der -out mycert2.crt

As for a .cat file (or any other file in DER PKCS#7 format), the same goes

$ openssl pkcs7 -inform DER -in thedriver.cat -out thedriver.pem

and back from PEM to DER:

$ openssl pkcs7 -in thedriver.pem -outform DER -out thedriver.cat

It’s somewhat daunting that there’s no catch-all converter from DER to PEM, so there’s a need to know which kind of crypto creature is converted. The identification is however necessary, as there are headers indicating the type of data in both DER and PEM. It would just have been nicer to have this done automagically.
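Since PEM is just base64 of the DER data between a BEGIN line and an END line, the conversion itself can be sketched in a few lines of Python. This is an illustration only, with function names of my own. It doesn’t validate the data, and it’s up to the caller to pick the label that goes in the BEGIN / END lines (e.g. CERTIFICATE or PKCS7), which is exactly the identification issue mentioned above.

```python
import base64

def pem_to_der(pem_text):
    """Return (label, der_bytes) for the first PEM block in pem_text."""
    label = None
    b64_lines = []
    for line in pem_text.splitlines():
        line = line.strip()
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            label = line[len("-----BEGIN "):-5]
            b64_lines = []
        elif line.startswith("-----END ") and label is not None:
            return label, base64.b64decode("".join(b64_lines))
        elif label is not None:
            b64_lines.append(line)
    raise ValueError("no PEM block found")


def der_to_pem(der_bytes, label):
    """Wrap DER bytes in a PEM block with the given label."""
    b64 = base64.b64encode(der_bytes).decode("ascii")
    # PEM files conventionally have 64 base64 characters per line
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "-----BEGIN %s-----\n%s\n-----END %s-----\n" % (
        label, "\n".join(lines), label)
```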

Inspecting ASN.1 files

The absolute winner for dumping blobs is dumpasn1.c. Download it from the link given above, compile it and install it in /usr/local/bin/ or something. Be sure to have dumpasn1.cfg in the same directory as the executable, so that OBJECT IDENTIFIER items (OIDs) get a human-readable string attached, and not just those magic numbers.

Then go

$ dumpasn1 -t thedriver.cat | less

Note that dumpasn1 expects DER format. See above if you have a PEM.

Flags:

  • -t for seeing the text of strings
  • -hh for long hex dumps
  • -u for seeing timestamps as they appear originally

For those looking for instant gratification, there’s an online JavaScript parser, asn1js. In fact, the site allows downloading the HTML and JavaScript sources, and pointing the browser at the local files.

And then there’s openssl’s own dumper, which produces data that is less useful for human interaction. Its only real advantage is that it’s most likely already installed. Go something like this (or drop the -inform DER for parsing a PEM file):

$ openssl asn1parse -inform DER -i -in thedriver.cat -dump

When attempting to parse a DER file without the -inform DER flag, the result may be “Error: offset out of range”. It’s a misleading error message, so don’t fall for this one.

The certificates in a .cat file

For a general tutorial on certificates, see this post.

To extract (hopefully all) certificates included in a .cat (or any other PKCS#7) file, in cleartext combined with PEM format, go

$ openssl pkcs7 -inform DER -print_certs -text -in thedriver.cat -out the-certs.txt

A Windows .cat file is just a standard PKCS#7 file, which is a container for signed and/or encrypted data of any sort. The idea behind this format apparently is to say: First, some information to apply the signature on. Next, here are a bunch of certificates that will help to convince the validator that the public key that is used for the validation should be trusted. This part is optional, but it typically contains all certificates that are necessary for the certificate validation chain, except for the root certificate (which validation software mustn’t accept even if it’s present, or else the validation is pointless). And after the (optional) certificate section comes the signature on the content of the first part — the data to sign.

In some situations, signtool improvises a bit on where to put the certificates for validation, in particular those needed for validating the timestamp, and if a second signature is appended. This is contrary to the straightforward approach of putting all and any certificate in the dedicated PKCS#7 section, as discussed below. The question is whether one is surprised that Microsoft diverged from the standard or that it adopted a standard format to begin with.

The consequence of stashing away certificates in less expected places is that the openssl command above for extracting certificates from a .cat file may miss some of those. The only real way to tell is to look at an ASN.1 dump.

Finding the digital signature in a non-cat file

Signtool creates digital signatures in a similar way even for non-cat files. The quick way to find it is by looking for the following hex sequence (with e.g. “hexdump -C”):

30 82 xx xx 06 09 2a 86 48 86 f7 0d 01 07 02

The part marked in red (from 06 09 and on) is the part saying “OBJECT IDENTIFIER 1.2.840.113549.1.7.2” which means a SignedData object. Even though this data structure is supposed to contain the data it signs, signtool often appends it to the data for signature. Non-standard, but hey, this is Microsoft.

Pay attention to the last bytes of this sequence rather than the first ones. There are other similar OIDs, but the difference is in the last bytes.

The reason I’ve added the four bytes before is that these are the beginning of the SEQUENCE, which the signature always begins with. The 0x82 part means that the two following bytes contain the length of the current chunk (in Big Endian). For snipping out the signature, include these four bytes, to conform to the PKCS#7 format.

I should also mention that there might be a SignedData object inside the outer SignedData object, due to signtool’s obscure way of including timestamps and/or multiple signatures. In principle, the outer one is the one to process, but it might also make sense to look at the inner object separately, in particular for extracting all certificates that are included.

To create a file that begins with the ASN.1 data, go something like this (if the 30 82 starter appeared in the hexdump at 0xc408):

$ dd if=thedriver.sys of=theblob.bin skip=$((0xc408)) bs=1
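Searching for the hex sequence by eye gets old quickly, so here is a Python sketch that does the same search and also uses the two length bytes to tell how much to snip out. The function name is made up, and it assumes that the length is always given with the 0x82 form, as in the sequence shown above. Note that it reports nested SignedData objects as well, so the first hit is usually the one to go for.

```python
# DER encoding of OBJECT IDENTIFIER 1.2.840.113549.1.7.2 (signedData)
SIGNED_DATA_OID = bytes.fromhex("06092a864886f70d010702")

def find_signed_data(blob):
    """Return a list of (offset, total_size) for candidate SignedData blobs."""
    hits = []
    pos = blob.find(SIGNED_DATA_OID)
    while pos != -1:
        start = pos - 4  # the 30 82 xx xx header sits right before the OID
        if start >= 0 and blob[start] == 0x30 and blob[start + 1] == 0x82:
            length = int.from_bytes(blob[start + 2 : start + 4], "big")
            # The length doesn't include the four header bytes
            hits.append((start, 4 + length))
        pos = blob.find(SIGNED_DATA_OID, pos + 1)
    return hits
```

With the content of thedriver.sys loaded into blob, each (offset, total_size) pair says which slice, blob[offset : offset + total_size], to write into a file for further inspection with dumpasn1.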

.cat file dissection notes

The root object has signedData OID, meaning that it follows the following format:

SignedData ::= SEQUENCE {
  version           Version,
  digestAlgorithms  DigestAlgorithmIdentifiers,
  contentInfo       ContentInfo,
  certificates      [0]  CertificateSet OPTIONAL,
  crls              [1]  CertificateRevocationLists OPTIONAL,
  signerInfos       SignerInfos
}

I won’t go into the depth of each element. To make a long story short, there are three main elements:

  • The contentInfo part, containing the data to be signed (a Microsoft catalogList item with file names, their hashes and more). If the CAT file isn’t signed (yet), this is the only part in the file. Note that catalogList contains the timestamp of the .cat file’s creation.
  • The certificate part containing a list of certificates, which relate to the direct signature (as well as the timestamp on some versions of signtool). This is just a bunch of certificates that might become useful while evaluating the signature. As mentioned above and below, signtool sometimes puts them in signerInfos instead.
  • The signerInfos part, containing a list of signatures on the data in the contentInfo part. But there’s always only one signature here. The timestamp is embedded into this signature. And even if a signature is “appended” with signtool’s /as flag, the additional signature isn’t added to this set, but obscurely shoved elsewhere. See below.

The signature is at the end, as a SignerInfos item.

SignerInfos ::= SET OF SignerInfo

SignerInfo ::= SEQUENCE {
  version                    Version,
  signerIdentifier           SignerIdentifier,
  digestAlgorithm            DigestAlgorithmIdentifier,
  authenticatedAttributes    [0]  Attributes OPTIONAL,
  digestEncryptionAlgorithm  DigestEncryptionAlgorithmIdentifier,
  encryptedDigest            EncryptedDigest,
  unauthenticatedAttributes  [1]  Attributes OPTIONAL
}

It’s easy to spot it in the dump as something like

8805  931:       SET {
8809  927:         SEQUENCE {
8813    1:           INTEGER 1
8816  135:           SEQUENCE {

towards the end.

Curiously enough, signerIdentifier is defined as

SignerIdentifier ::= CHOICE {
  issuerAndSerialNumber  IssuerAndSerialNumber,
  subjectKeyIdentifier   [2]  SubjectKeyIdentifier
}

and what is typically found is issuerAndSerialNumber. In other words, the details of the certificate which confirms the public key (i.e. its serial number) appear in this section, and not those of the signer. The only part that relates to the signer is the serial number.

So in essence, the textual parts in SignerIdentifier should essentially be ignored. To start the certificate chain, begin from the serial number and climb upwards.

The timestamp appears as unauthenticatedAttributes, and is identified as a countersignature (1.2.840.113549.1.9.6) or Ms-CounterSign (1.3.6.1.4.1.311.3.3.1):

9353  383:           [1] {
9357  379:             SEQUENCE {
9361    9:               OBJECT IDENTIFIER
         :                 countersignature (1 2 840 113549 1 9 6)

Just like the signature, it’s given in issuerAndSerialNumber form, so the textual info belongs to the issuer of the certificate. The only informative part is the serial number.

Notes:

  • Public keys are typically identified by their serial numbers. This is the part that connects between the key and related certificates.
  • Serial numbers appear just as “INTEGER” in the dump, but it’s easy to spot them as strings of hex numbers.
  • In the ASN.1 data structure, certificates convey the information on the issuer first and then subject. It’s somewhat counterintuitive.

Looking at the data part of a .cat file

The truth is that dissecting a .cat file’s ASN.1 blob doesn’t reveal more than is visible from the utility that pops up when one clicks the .cat file in Windows. It’s just a list of items, one part consists of the files protected by the catalogue, and the second some additional information (which is superfluous).

For a Windows device driver, the files covered are the .sys and .inf files. In Windows’ utility for inspecting .cat files, these files appear under the “Security Catalog” tab. Each file is represented with a “Tag” entry, which is (typically? Always?) the SHA1 sum of the file. Clicking on it reveals the attributes as they appear in the .cat file, among others the thumbprint algorithm (sha1) and the value (which coincides with the “Tag”, just with spaces). Even more interestingly, the File attribute is the file name without the path to it.

In other words, the catalogue doesn’t seem to protect the position of the files in the file hierarchy. The strategy for validating a file seems therefore to be to calculate its SHA1 sum, and look it up in the catalogue. If there’s a match, make sure that the file name matches. But there’s apparently no problem moving around the file in the file hierarchy.

Under the “General” tab of the same utility, there are hex dumps of the DER-formatted data of each object, with the OID (all 1.3.6.1.4.1.311.12.2.1) given in the “Field” column for each. The information here is the operating system and the Plug & Play IDs (Vendor & Product IDs) that the driver covers. Which is redundant, since this information is written in the .inf file, which is protected anyhow. That may explain why the presentation of this info in the utility is done so horribly badly.

Multiple signatures

When an additional signature has been added by virtue of signtool’s /as flag (“append signature”), it’s added as an OID_NESTED_SIGNATURE (1.3.6.1.4.1.311.2.4.1) item in the unauthenticatedAttributes, after the timestamp signature of the original signature:

 9740  8031:             SEQUENCE {
 9744    10:               OBJECT IDENTIFIER '1 3 6 1 4 1 311 2 4 1'
 9756  8015:               SET {
 9760  8011:                 SEQUENCE {
 9764     9:                   OBJECT IDENTIFIER
           :                     signedData (1 2 840 113549 1 7 2)

Under this there’s a signedData item (i.e. the same OID as the one encapsulating the entire file), containing no data by itself, but does contain a bunch of certificates, a signature (on what?) and a timestamp, apparently with some Microsoft improvisations on the standard format.

So they basically said, hey, let’s just push another PKCS#7 blob, from beginning to end, minus the CAT data itself, in that place where anyone can do whatever he wants. The correct way would of course have been to add another SignerInfo item to the SignerInfos set, but hey, this is Microsoft.

The takeaway is that this is likely to cause problems, as hacks always do. And not just for us who want to analyze what’s in there. My response to this is to publish two separate driver files if needed, and stay away from these double signatures.

Checking Windows signatures in Linux

For this there’s osslsigncode. I cloned a copy, and compiled at commit ID c0d9569c4f6768d9561978422befa4e44c5dfd34. It was basically:

$ ./autogen.sh
$ ./configure
$ make

It seemed to complain about curl not being installed, but it was this that was actually needed:

# apt install libcurl4-openssl-dev

Copied osslsigncode to /usr/local/bin/, and then I could check a Windows driver catalog file with

$ osslsigncode verify -in thedriver.cat

The important thing is that it prints out a neat summary of the certificates in the file. Less informative than using openssl to extract the certificates as shown above, and more descriptive than openssl’s output. However the version I tried crashed when faced with a driver with double signatures. Not sure who to blame.

A sledge hammer introduction to X.509 certificates

Introduction

First and foremost: Crypto is not my expertise. This is a note to future self for the next time I’ll need to deal with similar topics. This post summarizes my understanding as I worked on a timestamp server, and it shows the certificates used by it.

There are many guides to X.509 certificates out there, however it seems like it’s common practice to focus on the bureaucratic aspects (a.k.a. Public Key Infrastructure, or PKI), and less on the real heroes of this story: The public cryptographic keys that are being certified.

For example, RFC 3647 starts with:

In general, a public-key certificate (hereinafter “certificate”) binds a public key held by an entity (such as person, organization, account, device, or site) to a set of information that identifies the entity associated with use of the corresponding private key. In most cases involving identity certificates, this entity is known as the “subject” or “subscriber” of the certificate.

Which is surely correct, and yet it dives right into organization structures etc. Not complaining, the specific RFC is just about that.

So this post is an attempt to make friends with these small chunks of data, with a down-to-earth, technical approach. I’m not trying to cover all aspects nor being completely accurate. For exact information, refer to RFC 5280. When I say “the spec” below, I mean this document.

Let’s start from the basics, with the main character of this story: The digital signature.

The common way to make a digital signature is to first produce a pair of cryptographic keys: One is secret, and the second is public. Both are just short computer files.

The secret key is used in the mathematical operation that constitutes the action of a digital signature. Having access to it is therefore equivalent to being the person or entity that it represents. The public key allows verifying the digital signature with a similar mathematical operation.

A certificate is a message (practically, a computer file), saying “here’s a public key, and I hereby certify that it’s valid for use between this and this time for these and these uses”. This message is then digitally signed by whoever gives the certification (with a key different from the one certified, of course). As we shall see below, there’s a lot more information in a certificate, but this is the point of it all.

The purpose of a certificate is like ID cards in real life: It’s a document that allows us to trust a piece of information from someone we’ve never seen before and know nothing about, without the possibility to consult with a third party. So there must be something about this document that makes it trustworthy.

The certificate chain

Every piece of software that works with public keys is installed with a list of public keys that it trusts. Browsers carry a relatively massive list for SSL certificates, but for kernel code signing it consists of exactly one certificate. So the size of this list varies, but is surely very small compared with the number of certificates that are out there in general. Keys may be added to and removed from this list in the course of time, but its size remains roughly the same.

The common way to maintain this list is by virtue of root certificates: These certificates basically say “trust me, this key is OK”. I’ll get further into this along with the example of a root certificate below.

As the secret keys of these root certificates are precious, they can’t be used to sign every certificate in the world. Instead, these are used to approve the key of another certificate. And quite often, that second certificate approves the key for verifying a third certificate. Only that certificate approves the public key which the software needs to know if it’s OK for use. In this example, these three certificates form a certificate chain. In real life, this chain usually consists of 3-5 certificates.

In many practical applications (e.g. code signing and time stamping) the sender of the data for validation also attaches a few certificates in order to help the validating side. Likewise, when a browser establishes a secure connection, it typically receives more than one certificate.

None of these peer-supplied certificates are root certificates (and if there is one, any sane software will ignore it, or else the validation is worthless). The validating software then attempts to create a valid certificate chain going from its own pool of root certificates (and possibly some other certificates it has access to) to the public key that needs validation. If such is found, the validation is deemed successful.

The design of the certificate system envisions two kinds of keys: Those used by End Entities for doing something useful, and those used by Certificate Authorities only for the purpose of signing and verifying other certificates. Each certificate testifies which type it belongs to in the “Basic Constraints” extension, as explained below.

Put shortly: The certificates that we (may) pay a company to make for us are all End Entity certificates.

In this post I’ll show a valid certificate chain consisting of three certificates.

A sample End Entity certificate

This is a textual dump of a certificate, obtained with something like:

openssl x509 -in thecertificate.crt -text

Other tools represent the information slightly differently, but the terminology tends to remain the same.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            65:46:72:11:63:f1:85:b4:3d:95:3d:72:66:e6:ee:c5:1c:f6:2b:6e
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Time Stamping CA
        Validity
            Not Before: Jan  1 00:00:00 2001 GMT
            Not After : May 19 00:00:00 2028 GMT
        Subject: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Time Stamping Service CA
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:d7:34:07:c5:dd:f5:e6:6a:b2:9e:e6:76:e3:ce:
                    af:33:a3:10:60:97:e8:27:f1:62:87:90:a9:21:52:
[ ... ]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage: critical
                Time Stamping
            X509v3 Subject Key Identifier:
                3A:E5:43:A1:40:3F:A4:0F:01:CE:D3:3F:2A:EE:4E:92:B9:28:5C:3A
            X509v3 Authority Key Identifier:
                keyid:3C:F5:43:45:3B:40:10:BC:3F:25:47:18:10:C4:19:18:83:8C:09:D0
                DirName:/C=GB/ST=Gallifrey/L=Gallifrey/O=Dr Who/CN=Dr Who Root CA
                serial:7A:CF:23:8D:2E:A7:6C:84:52:53:AF:BA:D7:26:7F:54:53:B2:2D:6B

    Signature Algorithm: sha256WithRSAEncryption
         6c:54:88:55:ff:c7:e1:81:73:4e:00:80:46:0d:dc:d9:32:c1:
         53:ba:ff:f9:32:e4:f3:83:c2:29:bb:e5:91:88:8e:6f:46:f4:
[ ... ]

The key owning the certificate

Much of the difficulty to understand certificates stems from the fact that the bureaucratic terminology is misleading, making it look as if it was persons or companies that are certified.

So let’s keep the eyes on the ball: This is all about the cryptographic keys. There’s the key which is included in the certificate (the Subject’s public key) and there’s the key that signs the certificate (the Authority’s key). There are of course someones owning these keys, and their information is the one that is presented to us humans first and foremost. And yet, it’s all about the keys.

The certificate’s purpose is to say something about the cryptographic key which is given explicitly in the certificate’s body (printed out in hex format as “Modulus” above). On top of that, there’s always the “Subject” part, which is the human-readable name given to this key.

As seen in the printout above, the Subject is a set of attributes and values assignments. Collectively, they are the name of the key. Which attributes are assigned differs from certificate to certificate, and it may even contain no attributes at all. The meaning of and rules for setting these has to do with the bureaucracy of assigning real-life certificates. From a technical point of view, these are just string assignments.

Usually, the most interesting one is CN, which stands for commonName, and is the most descriptive part in the Subject. And yet, it may be confusingly similar to that of other certificates.

For certificates which certify an SSL key for use by a web server, the Subject’s CN is the domain it covers (possibly with a “*” wildcard). It might be the only assignment. For example *.facebook.com or mysmallersite.com.

Except for some root certificates, there’s an X509v3 Subject Key Identifier entry in the certificate as well. It’s a short hex string which is typically the SHA1 hash of the public key, or part of it, but the spec allows using other algorithms. It’s extremely useful for identifying certificates, since it’s easy to get confused between Subject names.

I’ll discuss root authorities and root certificates below, along with looking at a root certificate.

The key that signed the certificate

Then we have the “Authority” side, which is the collective name for whoever signed the certificate. Often called Certificate Authority, or CA for short. Confusingly enough (for me), it appears before the Subject, in the binary blob of the certificate as well as the text output above.

The bureaucratic name of this Authority is given as the “Issuer”. Once again, it consists of a set of attributes and values assignments, which collectively are the name of the key that is used to sign the certificate. This tells us to look for a certificate with an exact match: The exact same set of assignments, with the exact same values. If such issuer certificate is found, and we trust it, and it’s allowed to sign certificates, and the Issuer’s public key validates the signature of the Subject’s certificate (plus a bunch of other conditions), then the Subject certificate is considered valid. In other words, the public key it contains is valid for the uses mentioned in it. This is said with lots of fine details omitted.

But looking for a certificate in the database based upon the name is inefficient, as the same entity may have multiple keys and hence multiple certificates for various reasons — in particular because a certificate is time limited. To solve this, all certificates (except root certificates) must point at their Authority with the X509v3 Authority Key Identifier field (though I’ve seen certificates without it). There are two methods for this:

  1. The value that appears in the Subject Key Identifier field, in the certificate for the key that signed the current certificate (so it’s basically a hash of the public key that signed this certificate).
  2. The serial number of the certificate of the key that signed the current certificate, plus the Issuer name of the Authority’s certificate — that is the Authority that is two levels up in the foodchain. This is a more heavy-weight identification, and gives us a hint on what’s going on higher up.

The first method is more common (and is required if you want to call yourself a CA), and sometimes both are present.

Anyhow, practically speaking, when I want to figure out which certificate approves which, I go by the Subject / Authority Key Identifiers. It’s much easier to keep track of the first couple of hex octets than those typically long and confusing names.
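This way of following the identifiers can be sketched in Python as below. Everything here is hypothetical: Each certificate is represented by a plain dict with “subject”, “ski” and “aki” entries, which some parser is assumed to have filled in. And it merely puts the certificates in order. It verifies no signature and checks none of the other conditions, so it says nothing about whether the chain is valid.

```python
def build_chain(leaf, certs_by_ski):
    """Follow Authority Key Identifiers from leaf upwards.

    certs_by_ski maps a Subject Key Identifier to its certificate dict.
    The walk stops at a certificate without an Authority Key Identifier,
    or one that points at itself (as a root certificate may do).
    """
    chain = [leaf]
    current = leaf
    while current.get("aki") and current["aki"] != current.get("ski"):
        parent = certs_by_ski.get(current["aki"])
        if parent is None or parent in chain:
            break  # the issuer isn't available, or the pointers form a loop
        chain.append(parent)
        current = parent
    return chain
```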

Validity times and Certificate serial number

These are quite obvious: The validity time limits the time period for which the certificate can be used. The validating software uses the computer’s clock for this purpose, unless the validated message is timestamped (in particular with code signing), in which case the timestamp is used to validate all certificates in the chain.

The serial number is just a number that is unique for each certificate. The spec doesn’t define any specific algorithm for generating it. Note that this number relates to the certificate itself, and not to the public key being certified.

The signature

All certificates are signed with the secret key of their Authority. The public key for verifying it is given in the Authority’s certificate.

The signature algorithm appears at the beginning of the certificate, however the signature itself is last. The spec requires, obviously, that the algorithm that is used in the signature is the one stated in the beginning.

The signature is made on the ASN.1 DER-encoded blob which contains all certificate information (except for the signature section itself, of course).

X509v3 extensions

Practically all certificates that are used today are version 3 certificates, and they all have a section called X509v3 extensions. In this section, the creator of the certificate inserts data objects as desired (but with some minimal requirements, as defined in the spec). The meaning and structure of each data object is conveyed by an Object Identifier (OID) field at the header of each object, appearing before the data in the certificate’s ASN.1 DER blob. It’s therefore possible to push any kind of data in this section, by assigning an OID for that kind of data.

In addition to the OID, each such data object also has a boolean value called “critical”: Note that some of the extensions in the example above are marked as critical, and some are not. When an extension is critical (the boolean is set true) the certificate must be deemed invalid if the extension is not recognized by its verifying software. Extensions that limit the usage of a certificate are typically marked critical, so that unintended use doesn’t occur because the extension wasn’t recognized.
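The rule for critical extensions boils down to a very short piece of logic, sketched here in Python. The representation of the extensions as a list of (OID, critical) pairs is my own assumption for the sake of the example. The OIDs in the usage note are those of the standard extensions in the spec.

```python
def check_extensions(extensions, recognized_oids):
    """Return False if there is a critical extension that isn't recognized.

    extensions is a list of (oid, critical) pairs, as found in a certificate.
    """
    for oid, critical in extensions:
        if critical and oid not in recognized_oids:
            return False  # must reject: can't tell what this restricts
    return True
```

For the example certificate above, the list would be something like [("2.5.29.19", True), ("2.5.29.15", True), ("2.5.29.37", True), ("2.5.29.14", False), ("2.5.29.35", False)], that is Basic Constraints, Key Usage, Extended Key Usage, Subject Key Identifier and Authority Key Identifier. Software that doesn’t recognize the last two can still accept the certificate, but not so for the first three.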

I’ve already mentioned two X509v3 extensions: X509v3 Subject Key Identifier and X509v3 Authority Key Identifier, neither of which is critical in the example above. And it makes sense: If the verifying software doesn’t recognize these, it has other means to figure out which certificate goes where.

So coming up next is a closer look at a few standard X509v3 extensions.

X509v3 Key Usage

As its name implies, this extension defines the allowed uses of the key contained in the certificate. A CA certificate (that is, one whose key is used to verify signatures on other certificates or CRLs) must have this extension present, and the spec says that it should be marked Critical.

This extension contains a bit string with 9 defined bits (numbered 0 to 8), defining the allowed usages as follows:

  • Bit 0: digitalSignature — verify a digital signature other than the one of a certificate or CRL (these are covered with bits 5 and 6).
  • Bit 1: nonRepudiation (or contentCommitment) — verify a digital signature in a way that is legally binding. In other words, a signature made with this key can’t be claimed later to be false.
  • Bit 2: keyEncipherment — encipher a private or secret key with the public key contained in the certificate.
  • Bit 3: dataEncipherment — encipher payload data directly with the public key (rarely used).
  • Bit 4: keyAgreement — for use with Diffie-Hellman or similar key exchange methods.
  • Bit 5: keyCertSign — verify the digital signature of certificates.
  • Bit 6: cRLSign — verify the digital signature of CRLs.
  • Bit 7: encipherOnly — when this and keyAgreement bits are set, only enciphering data is allowed in the key exchange process.
  • Bit 8: decipherOnly — when this and keyAgreement bits are set, only deciphering data is allowed in the key exchange process.
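To make the bit numbering concrete, here’s a small Python sketch that turns the raw bit string into the names listed above. It relies on the ASN.1 convention that bit 0 is the most significant bit of the first byte. The function name is my own invention, and the input is assumed to be the bit string’s content bytes only, i.e. after stripping the DER header and the “unused bits” byte.

```python
from typing import List

# Names of the Key Usage bits, indexed by bit number.
KEY_USAGE_BITS = [
    "digitalSignature",   # bit 0
    "nonRepudiation",     # bit 1
    "keyEncipherment",    # bit 2
    "dataEncipherment",   # bit 3
    "keyAgreement",       # bit 4
    "keyCertSign",        # bit 5
    "cRLSign",            # bit 6
    "encipherOnly",       # bit 7
    "decipherOnly",       # bit 8
]


def decode_key_usage(bit_string: bytes) -> List[str]:
    """Translate the content bytes of a Key Usage bit string into names."""
    names = []
    for bit, name in enumerate(KEY_USAGE_BITS):
        # Bit 0 is the MSB of the first byte, bit 8 the MSB of the second.
        byte_index, offset = divmod(bit, 8)
        if byte_index < len(bit_string) and bit_string[byte_index] & (0x80 >> offset):
            names.append(name)
    return names


# 0x06 is binary 00000110, so bits 5 and 6 are set.
print(decode_key_usage(b"\x06"))
```

The byte 0x06 hence decodes to keyCertSign and cRLSign, which is the combination that text dumps show as “Certificate Sign, CRL Sign”.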

X509v3 Extended Key Usage

The Key Usage extension is somewhat vague about the purposes of the cryptographic operations. In particular, when the public key can be used to verify a digital signature, surely that doesn’t cover all kinds of signatures? If it did, the key pair would be valid for signing anything (that isn’t a legal document, a certificate or a CRL, and still).

On the other hand, how can a protocol properly foresee any possible use of the public key? Well, it can’t. Instead, each practical use of the key is given a unique number in the vocabulary of Object Identifiers (OIDs). This extension merely lists the OIDs that are relevant, and this translates into allowed uses. When evaluating the eligibility to use the public key (that is contained in the certificate), the Key Usage and Extended Key Usage are evaluated separately; a green light is given only if both evaluations resulted in an approval.

The spec doesn’t require this extension to be marked Critical, but it usually is, or what’s the point. The spec does however say that “in general, this extension will appear only in end entity certificates”, i.e. a certificate that is given to the end user (and hence with a key that can’t be used to sign other certificates). In reality, this extension is often present and assigned the intended use in certificates in the middle of the chain, despite this suggestion. As I’ve seen this in code signing and time stamping middle-chain certificates, maybe it’s to restrict the usage of this middle certificate for certain purposes. Or maybe it’s a workaround for buggy validation software.

This is a short and incomplete list of interesting OIDs that may appear in this extension:

  • TLS Web Server Authentication: 1.3.6.1.5.5.7.3.1
  • TLS Web Client Authentication: 1.3.6.1.5.5.7.3.2
  • Code signing: 1.3.6.1.5.5.7.3.3
  • Time stamping: 1.3.6.1.5.5.7.3.8

The first two appear in certificates that are issued for HTTPS servers.
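The “both must approve” rule for Key Usage and Extended Key Usage can be sketched as follows. This is a simplified illustration, not how any specific validation software does it. The function name and the representation of the extensions as Python sets are my own, and I’ve assumed that a missing extension (None) puts no restriction on the key.

```python
SERVER_AUTH = "1.3.6.1.5.5.7.3.1"
CODE_SIGNING = "1.3.6.1.5.5.7.3.3"


def usage_permitted(key_usage, ext_key_usage, needed_usage, needed_oid):
    """Check an intended use against both extensions.

    key_usage: set of Key Usage names, or None if the extension is absent.
    ext_key_usage: set of OID strings, or None if the extension is absent.
    Assumption: an absent extension doesn't restrict anything.
    """
    ku_ok = key_usage is None or needed_usage in key_usage
    eku_ok = ext_key_usage is None or needed_oid in ext_key_usage
    # Each extension is evaluated on its own; both have to approve.
    return ku_ok and eku_ok


# A code signing certificate is fine for code signing...
print(usage_permitted({"digitalSignature"}, {CODE_SIGNING},
                      "digitalSignature", CODE_SIGNING))
# ...but not for authenticating an HTTPS server, even though the
# Key Usage bit is the same in both cases.
print(usage_permitted({"digitalSignature"}, {CODE_SIGNING},
                      "digitalSignature", SERVER_AUTH))
```

The first call prints True and the second prints False.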

X509v3 Basic Constraints

Never mind this extension’s name. It has nothing to do with what it means.

This extension involves two elements: First, a boolean value, “cA”, meaning Certificate Authority. According to the spec, the meaning of this flag is that the Subject of the certificate is a Certificate Authority (as opposed to End Entity). When true, the key included in the certificate may be used to sign other certificates.

But wait, what about the keyCertSign capability in X509v3 Key Usage (i.e. bit 5)? Why the duplication? Not clear, but the spec requires that if cA is false, then keyCertSign must be cleared (certification signature not allowed). In other words, if you’re not a CA, don’t create certificates that can sign other certificates.

This flag is actually useful for manually analyzing a certificate chain going from the end user certificate towards the root: Given a pile of certificates, look for the one with CA:FALSE. That’s the certificate to start with.

The second element is pathLenConstraint, which usually appears as pathlen in text dumps. It limits the number of certificates between the current one and the final certificate in the chain, which is typically the End Entity certificate. Commonly, pathlen is set to zero in the certificate that authorizes the certificate that someone paid to get.

If there’s a self-issued certificate (Subject is identical to Issuer) in the chain which isn’t the root certificate, forget what I said about pathlen. But first find me such a certificate.
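The interplay between cA, keyCertSign and pathlen can be illustrated with a short Python sketch. The certificates are modeled as plain dicts with key names that I’ve made up for this example, and the chain is ordered from the root to the End Entity. As a simplification, self-issued certificates in the middle of the chain aren’t given any special treatment.

```python
def check_basic_constraints(chain):
    """Return a list of problems found in a certificate chain.

    chain is a list of dicts, ordered from root to End Entity. Each dict
    has the (hypothetical) keys "ca" (bool), "pathlen" (int or None) and
    "key_cert_sign" (bool).
    """
    problems = []
    last = len(chain) - 1
    for i, cert in enumerate(chain):
        if cert["key_cert_sign"] and not cert["ca"]:
            problems.append(f"certificate {i}: keyCertSign is set but cA is false")
        if i < last and not cert["ca"]:
            problems.append(f"certificate {i}: certifies another one but cA is false")
        # Number of certificates strictly between this one and the last.
        following = last - i - 1
        if cert["ca"] and cert["pathlen"] is not None and following > cert["pathlen"]:
            problems.append(f"certificate {i}: pathlen {cert['pathlen']} exceeded")
    return problems


root = {"ca": True, "pathlen": None, "key_cert_sign": True}
middle = {"ca": True, "pathlen": 0, "key_cert_sign": True}
end_entity = {"ca": False, "pathlen": None, "key_cert_sign": False}

print(check_basic_constraints([root, middle, end_entity]))
print(check_basic_constraints([root, middle, middle, end_entity]))
```

The first chain passes with an empty list. In the second one, the first middle certificate has pathlen set to zero and yet another CA certificate follows it before the End Entity certificate, so a problem is reported for certificate 1.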

This extension is allowed to be marked critical or not.

X509v3 Subject Alternative Name

(not in the example above)

This extension allows assigning additional names, on top of the one appearing in Subject (or possibly instead of it). It’s often used with SSL certificates in order to make them valid for multiple domains.

A sample non-root CA certificate

This is the text dump of the certificate which is the Authority of the example certificate listed above:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            7a:cf:23:8d:2e:a7:6c:84:52:53:af:ba:d7:26:7f:54:53:b2:2d:6b
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Root CA
        Validity
            Not Before: Jan  1 00:00:00 2001 GMT
            Not After : May 19 00:00:00 2028 GMT
        Subject: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Time Stamping CA
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:b6:bf:46:38:c7:c1:63:58:f1:95:c6:cf:0a:5d:
                    72:d1:11:ce:86:96:04:ce:8f:cb:ab:da:22:b9:e0:
[ ... ]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:TRUE, pathlen:0
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
            X509v3 Extended Key Usage:
                Time Stamping
            X509v3 Subject Key Identifier:
                3C:F5:43:45:3B:40:10:BC:3F:25:47:18:10:C4:19:18:83:8C:09:D0
            X509v3 Authority Key Identifier:
                keyid:98:9A:E3:EF:D8:C5:5C:7F:87:35:87:45:78:3D:51:8D:82:2F:1E:A3
                DirName:/C=GB/ST=Gallifrey/L=Gallifrey/O=Dr Who/CN=Dr Who Root CA
                serial:03:91:DC:F3:FA:8D:5A:CA:D0:3D:B7:EE:1B:71:2D:60:B5:0A:99:DE

    Signature Algorithm: sha256WithRSAEncryption
         13:18:16:99:6a:42:be:22:14:e5:e8:80:5a:ce:be:df:33:c6:
         22:df:d5:35:48:e6:9d:9f:ec:ec:07:72:49:33:ca:ca:3f:22:
[ ... ]

The public key contained in this certificate is paired with the secret key that signed the previous certificate. As one would expect, the following fields match:

  • The list of assignments in Subject of this certificate is exactly the same as the Issuer in the previous one.
  • The Subject Key Identifier here is the same as the keyid in the previous certificate’s Authority Key Identifier.
  • This certificate’s Serial Number appears in the previous certificate’s Authority Key Identifier as serial.
  • This certificate’s Issuer appears in the previous certificate’s Authority Key Identifier in a condensed form as DirName.

Except for the Subject to Issuer match, the other fields may be missing in certificates. There’s a brief description of how the certificate chain is validated below, after showing the root certificate. At this point, these relations are listed just to help figuring out which certificate certifies which.
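Figuring out which certificate certifies which, given a pile of certificates, can be sketched in Python along these lines. The dict keys are made up for this example, and the sketch assumes that exactly one certificate in the pile has CA:FALSE, that Subjects are unique, and that the chain ends with a self-issued root. The End Entity’s name below is invented, as that certificate isn’t shown here.

```python
def order_chain(pile):
    """Order a pile of certificates from the End Entity up to the root.

    pile is a list of dicts with the (hypothetical) keys "subject",
    "issuer" and "ca", and optionally "ski" and "aki_keyid".
    """
    by_subject = {cert["subject"]: cert for cert in pile}
    # The certificate with CA:FALSE is the one to start with.
    current = next(cert for cert in pile if not cert["ca"])
    chain = [current]
    while current["issuer"] != current["subject"]:
        parent = by_subject[current["issuer"]]
        # The key identifiers may be missing, but if both are there,
        # they must agree.
        aki, ski = current.get("aki_keyid"), parent.get("ski")
        if aki and ski and aki != ski:
            raise ValueError("key identifier mismatch")
        chain.append(parent)
        current = parent
    return chain


pile = [
    {"subject": "Dr Who Root CA", "issuer": "Dr Who Root CA", "ca": True,
     "ski": "98:9A:E3:EF:D8:C5:5C:7F:87:35:87:45:78:3D:51:8D:82:2F:1E:A3"},
    {"subject": "Some End Entity", "issuer": "Dr Who Time Stamping CA", "ca": False,
     "aki_keyid": "3C:F5:43:45:3B:40:10:BC:3F:25:47:18:10:C4:19:18:83:8C:09:D0"},
    {"subject": "Dr Who Time Stamping CA", "issuer": "Dr Who Root CA", "ca": True,
     "ski": "3C:F5:43:45:3B:40:10:BC:3F:25:47:18:10:C4:19:18:83:8C:09:D0",
     "aki_keyid": "98:9A:E3:EF:D8:C5:5C:7F:87:35:87:45:78:3D:51:8D:82:2F:1E:A3"},
]

print([cert["subject"] for cert in order_chain(pile)])
```

Note that this only sorts the certificates by their names and key identifiers. It says nothing about whether the signatures are valid.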

Note that unlike the previous certificate, CA is TRUE, which means that this is a CA certificate (as opposed to an End Entity certificate). In other words, it’s intended for the sole use of signing other certificates (and it does, at least the one above).

Also note that pathlen is assigned zero. This means that it’s used only to sign End Entity certificates.

Note that DirName in Authority Key Identifier equals this certificate’s Issuer. Recall that DirName is the Issuer of the certificate that certifies this one. Hence the conclusion is that the certificate that certifies this one has the same name for Subject and Issuer: So with this subtle clue, we know almost for sure that the certificate above this one is a root certificate. Why almost? Because non-root self-issued certificates are allowed in the spec, but kindly show me one.

Extended Key Usage is set to Time Stamping. Even though this was supposed to be unusual, as mentioned before, this is what non-root certificates for time stamping and code signing usually look like.

And the Key Usage is Certificate Sign and CRL Sign, as one would expect to find on a CA certificate.

A sample root CA certificate

And now we’re left with the holy grail: the root certificate.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            03:91:dc:f3:fa:8d:5a:ca:d0:3d:b7:ee:1b:71:2d:60:b5:0a:99:de
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Root CA
        Validity
            Not Before: Jan  1 00:00:00 2001 GMT
            Not After : May 19 00:00:00 2028 GMT
        Subject: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Root CA
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:ce:e5:53:d7:1e:43:28:13:00:eb:b2:81:bb:ff:
                    28:23:98:9a:fd:69:07:ee:49:c5:54:44:66:77:5d:
[ ... ]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Subject Key Identifier:
                98:9A:E3:EF:D8:C5:5C:7F:87:35:87:45:78:3D:51:8D:82:2F:1E:A3
            X509v3 Authority Key Identifier:
                keyid:98:9A:E3:EF:D8:C5:5C:7F:87:35:87:45:78:3D:51:8D:82:2F:1E:A3
                DirName:/C=GB/ST=Gallifrey/L=Gallifrey/O=Dr Who/CN=Dr Who Root CA
                serial:03:91:DC:F3:FA:8D:5A:CA:D0:3D:B7:EE:1B:71:2D:60:B5:0A:99:DE

            X509v3 Basic Constraints: critical
                CA:TRUE
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
    Signature Algorithm: sha256WithRSAEncryption
         30:92:7d:09:e4:ea:4d:81:dd:8e:c2:ba:c0:c4:a6:26:62:4d:
[ ... ]

All in all, it’s pretty similar to the previous non-root certificate, except that its Authority is itself: There is no way to validate this certificate, as there is no other public key to use. As mentioned above, root certificates are installed directly into the validating software’s database as the anchors of trust.

The list of fields that match between this and the previous certificate remains the same as between the previous certificate and its predecessor. Once again, not all fields are always present. Actually, there are a few fields that make no sense in a root certificate, and yet they are most commonly present. Let’s look at a couple of oddities (that are common):

  • The certificate is signed. One may wonder what for. The signature is always done with the Authority’s key, but in this case, it’s the key contained in the certificate itself. So this proves that the issuer of this certificate has the secret key that corresponds to the public key that is contained in the certificate. The need for a signature hence prevents issuing a root certificate for a public key without being the owner of the secret key. Why anyone would want to do that remains a question.
  • The certificate points at itself in the Authority Key Identifier extension. This is actually useful for spotting that this is indeed a root certificate, in particular when there are long and rambling names in the Subject / Issuer fields. But why the DirName?

How the certificate chain is validated

Chapter 6 of RFC 5280 offers a 20-page description of how a certificate chain is validated, and it’s no fun to read. However, section 6.1.3 (“Basic Certificate Processing”) gives a concise outline of how the algorithm validates certificate Y based upon certificate X.

The algorithm given in the spec assumes that some other algorithm has found a candidate for a certificate chain. Chapter 6 describes how to check it by starting from the root, and advancing one certificate pair at a time. This direction isn’t intuitive, as validation software usually encounters an End Entity certificate, and needs to figure out how to get to the root from it. But as just said, the assumption is that we already know.

So the validation always trusts certificate X, and it checks if it can trust Y based upon the former. If so, it assigns X := Y and continues until it reaches the last certificate in the chain.

These four are the main checks that are made:

  1. The signature in certificate Y is validated with public key contained in certificate X.
  2. The Issuer part in certificate Y matches exactly the Subject of certificate X.
  3. Certificate Y’s validity period covers the time for which the chain is validated (the system clock time, or the timestamp’s time if such is applied).
  4. Certificate Y is not revoked.
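The walk along the chain with these four checks can be sketched in Python as follows. This is a bare-bones illustration under my own assumptions, not an implementation of the spec: the Cert class and its fields are made up, and the actual cryptography and revocation lookup are delegated to two callables (verify and is_revoked) that the caller has to supply.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Cert:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime
    public_key: str
    tbs: bytes        # the part of the certificate that is signed
    signature: bytes


def validate_chain(chain, when, verify, is_revoked):
    """Validate a chain that is ordered from the root to the last certificate.

    chain[0] is trusted to begin with. verify(public_key, tbs, signature)
    and is_revoked(cert) are hypothetical callables returning a bool.
    """
    x = chain[0]
    for y in chain[1:]:
        if not verify(x.public_key, y.tbs, y.signature):   # check 1
            return False
        if y.issuer != x.subject:                          # check 2
            return False
        if not (y.not_before <= when <= y.not_after):      # check 3
            return False
        if is_revoked(y):                                  # check 4
            return False
        x = y  # Y is trusted now, so move on to the next pair
    return True
```

A real verify function would check the signature against the DER-encoded blob with the algorithm stated in the certificate. For playing around with the loop itself, any stand-in function that returns True or False will do.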

Chapter 6 wouldn’t reach 20 pages if it was this simple, but much of the rambling in that chapter relates to certificate policies and other restrictions. The takeaway from this list of four criteria is where the focus lies when walking from one certificate to another: on validating the signature and on matching the Subject / Issuer pairs.

I suppose that the Subject / Issuer check is there mostly to prevent certificates from being misleading to us humans: From a pure cryptographic point of view, no loopholes would have been created by skipping this test.

And this brings me back to what I started this post with: This whole thing with certificates has a bureaucratic side, and a cryptographic side. Both play a role.

This post is intentionally left blank

This post has been terminally removed. It’s pointless to ask me for a copy of it.
