Crypto jots on ASN.1 and Microsoft’s .cat files
Intro
Crypto is not my expertise. This is a pile of jots I wrote down as I tried to figure out what the Microsoft catalogue file is all about. Not-so-surprising spoiler: It appears to be organized and elegant at first glance, but the more you look into it, the messier it gets. The kind of mess that's caused by someone making a quick hack to solve that little problem very urgently. And repeat.
Sources
- My own introduction to certificates.
- C code for parsing ASN.1 (dumpasn1.c) can be found on this page. Be sure to download dumpasn1.cfg as well.
- A JavaScript ASN.1 parser from this site.
- For the PKCS#7 syntax, see this page.
- The osslsigncode utility (written in plain C) is not just useful for analyzing signed CATs, but also serves as boilerplate code for manipulations based upon OpenSSL's API.
ASN.1 basics
ASN.1 is a notation for organizing (usually small) pieces of data into a file in a structured and hierarchical manner. For each type of container (e.g. an X.509 certificate), there's a specification describing its format, written in a syntax somewhat similar to C struct definitions. Exactly like C struct definitions, it may contain sub-structures. But don't take this analogy too far, because ASN.1 definitions have optional fields, and fields with an unknown number of members.
So if you want to follow what everything means, you need the definition for each element that you encounter. These are sometimes defined in the protocol for the relevant container type. Unlike C structs, looking at the data alone tells you what's an integer and what's a string (each item carries a type tag), but just like C struct members, their meaning depends on the order in which they appear.
And just like a struct may contain other structs, there are objects in ASN.1. These are the catch-all method for inserting elements with arbitrary form.
When an object is encountered, it always has an object ID (OID), which defines its class. The format and meaning of what's encapsulated is then given by the object's definition, which may not be published (e.g. OIDs specific to Microsoft). Note that an OID defines the format of the object, but not necessarily its meaning. That said, OIDs that are used only in very specific contexts also tell us what they contain.
A few random notes that may help:
- The common binary format is DER. Each item is encoded with an identifier byte (e.g. 0x30 for SEQUENCE), followed by the length of the item's content (including everything in lower hierarchies). Below 0x80, the length is given as a single byte. Otherwise, the first byte is 0x80 plus the number of bytes that make up the length, followed by the length itself in Big Endian format. After this comes the data.
- Because the length of the item is given explicitly, there's a lot of freedom for types like INTEGER: An INTEGER can be a single byte or a huge number.
- The SEQUENCE item, as its name implies, is a sequence of elements which together encapsulate some kind of information. Some fields are optional. In that case, there's a number in square brackets, e.g. [0], [1], [2] etc., in the format specification. These numbers in square brackets appear in the parsed output as well, to indicate which optional field is displayed (if there are any).
- Object identifiers are given in dotted format, e.g. 1.2.840.113549.1.7.2 for signedData. In dumpasn1's output, they appear with spaces, e.g.
OBJECT IDENTIFIER signedData (1 2 840 113549 1 7 2)
The translation from numeric OID to a meaningful name is possible if dumpasn1 happens to have that OID listed, which is sometimes not the case. Either way, when encountering these, just Google them up for more information.
- There are two ASN.1-defined types for describing time: UTCTime (tag 0x17) and GeneralizedTime (tag 0x18). Both serve the same purpose (X.509 certificates use UTCTime for dates through 2049 and GeneralizedTime from 2050 on). Curiously enough, both are given as ASCII strings. Looking for these is helpful for finding certificates in a large blob, as well as timestamps.
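To make the encoding rules above concrete, here's a minimal Python sketch of a DER reader (my own illustration, not code taken from dumpasn1). It reads an item's tag and length as described above, and walks into constructed items (SEQUENCE, SET, [0] etc., which have bit 0x20 set in the tag) to list offsets and content lengths, roughly like the two leftmost columns of dumpasn1's output. It assumes single-byte tags and definite lengths, which covers what's discussed here; read_header and walk are made-up names.

```python
# Minimal DER walker: a sketch for illustration, not a full ASN.1 parser.
# Assumes single-byte tags (tag numbers below 31) and definite lengths.

TAG_NAMES = {
    0x02: "INTEGER",
    0x04: "OCTET STRING",
    0x05: "NULL",
    0x06: "OBJECT IDENTIFIER",
    0x17: "UTCTime",
    0x18: "GeneralizedTime",
    0x30: "SEQUENCE",
    0x31: "SET",
}


def read_header(data: bytes, pos: int):
    """Return (tag, content_length, header_length) for the item at data[pos]."""
    tag = data[pos]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-byte tags are not handled in this sketch")
    first = data[pos + 1]
    if first < 0x80:
        # Short form: the length fits in a single byte.
        return tag, first, 2
    num_bytes = first & 0x7F
    if num_bytes == 0:
        raise ValueError("indefinite length is not allowed in DER")
    # Long form: 0x80 + number of length bytes, then the length in big endian.
    length = int.from_bytes(data[pos + 2:pos + 2 + num_bytes], "big")
    return tag, length, 2 + num_bytes


def walk(data: bytes, pos=0, end=None, depth=0, out=None):
    """Collect (offset, content_length, depth, name) tuples, dumpasn1-style."""
    if end is None:
        end = len(data)
    if out is None:
        out = []
    while pos < end:
        tag, length, hlen = read_header(data, pos)
        if tag & 0xC0 == 0x80:
            name = "[%d]" % (tag & 0x1F)  # context-specific tag, e.g. [0]
        else:
            name = TAG_NAMES.get(tag, "tag 0x%02x" % tag)
        out.append((pos, length, depth, name))
        if tag & 0x20:
            # Constructed item: its content is a series of nested items.
            walk(data, pos + hlen, pos + hlen + length, depth + 1, out)
        pos += hlen + length
    return out
```

For example, walk(open("thedriver.cat", "rb").read()) should list the whole tree of a DER-formatted .cat file, one tuple per item.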
DER and PEM formats
In practice, there are two formats out there for crypto data: DER, which is the native binary format, and PEM, which is a base64 representation of a DER binary. The reason I consider DER to be "native" is that when a digital signature is made on a chunk of data, it's the DER representation of the ASN.1 data that is hashed.
OpenSSL can be used to convert between the two formats. As openssl's default format is PEM, use -inform DER and -outform DER as necessary to convince it to play ball.
For example, converting a certificate from PEM to DER:
$ openssl x509 -in mycert.crt -outform DER -out mycert.der
or in the opposite direction:
$ openssl x509 -inform DER -in mycert.der -out mycert2.crt
As for a .cat file (or any other file in DER PKCS#7 format), the same goes:
$ openssl pkcs7 -inform DER -in thedriver.cat -out thedriver.pem
and back from PEM to DER:
$ openssl pkcs7 -in thedriver.pem -outform DER -out thedriver.cat
It's somewhat annoying that there's no catch-all converter from DER to PEM, so one needs to know which kind of crypto creature is being converted. The identification could however have been done by the tool itself, as there are headers indicating the type of data in both DER and PEM. It would just have been nicer to have this done automagically.
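Since PEM is just base64-encoded DER between BEGIN/END lines, the conversion itself is trivial once the label is known. A minimal Python sketch, assuming the usual 64-character line wrapping. The caller has to supply the label (e.g. CERTIFICATE or PKCS7, which is what openssl x509 and openssl pkcs7 write, respectively), which is the same piece of knowledge openssl demands by making you pick the right subcommand. The function names are mine.

```python
import base64


def der_to_pem(der: bytes, label: str) -> str:
    """Wrap DER bytes as PEM, e.g. label="PKCS7" or label="CERTIFICATE"."""
    b64 = base64.b64encode(der).decode("ascii")
    # PEM bodies are conventionally split into 64-character lines.
    body = "\n".join(b64[i:i + 64] for i in range(0, len(b64), 64))
    return "-----BEGIN %s-----\n%s\n-----END %s-----\n" % (label, body, label)


def pem_to_der(pem: str):
    """Return (label, der_bytes) for the first PEM block found in the text."""
    lines = [line.strip() for line in pem.splitlines()]
    # Skip any leading text (e.g. "subject=..." lines from -print_certs).
    start = next((i for i, line in enumerate(lines)
                  if line.startswith("-----BEGIN ") and line.endswith("-----")), None)
    if start is None:
        raise ValueError("no PEM header found")
    label = lines[start][len("-----BEGIN "):-len("-----")]
    end = lines.index("-----END %s-----" % label, start + 1)
    return label, base64.b64decode("".join(lines[start + 1:end]))
```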
Inspecting ASN.1 files
The absolute winner for dumping blobs is dumpasn1.c. Download it from the link given above, compile it and install it in /usr/local/bin/ or something. Be sure to have dumpasn1.cfg in the same directory as the executable, so that OBJECT IDENTIFIER items (OIDs) get a human-readable string attached, and not just those magic numbers.
Then go
$ dumpasn1 -t thedriver.cat | less
Note that dumpasn1 expects DER format. See above if you have a PEM.
Flags:
- -t for seeing the text of strings
- -hh for long hex dumps
- -u for seeing timestamps as they appear originally
For those looking for instant gratification, there's an online JavaScript parser, asn1js. In fact, the site allows downloading the HTML and JavaScript sources, and pointing the browser at the local files.
And then there’s openssl’s own dumper, which produces data that is less useful for human interaction. Its only real advantage is that it’s most likely already installed. Go something like this (or drop the -inform DER for parsing a PEM file):
$ openssl asn1parse -inform DER -i -in thedriver.cat -dump
When attempting to parse a DER file without the -inform DER flag, the result may be "Error: offset out of range". It's a misleading error message, so don't fall for this one.
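To decide whether -inform DER is needed before feeding a file to openssl asn1parse (or whether a conversion is needed before dumpasn1), a rule-of-thumb check like this Python sketch may help. It's my own heuristic, not something openssl does: PEM is text with a -----BEGIN line, while a DER-encoded certificate or PKCS#7 file starts with 0x30, the SEQUENCE tag.

```python
def guess_format(data: bytes) -> str:
    """Heuristic: return "PEM", "DER" or "unknown" for a crypto blob."""
    # Check for PEM first: a text file may happen to start with "0" (0x30).
    if b"-----BEGIN " in data[:4096]:
        return "PEM"
    # Certificates and PKCS#7 files (such as .cat) start with a SEQUENCE tag.
    if data[:1] == b"\x30":
        return "DER"
    return "unknown"
```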
The certificates in a .cat file
For a general tutorial on certificates, see this post.
To extract (hopefully all) certificates included in a .cat (or any other PKCS#7) file, in cleartext combined with PEM format, go
$ openssl pkcs7 -inform DER -print_certs -text -in thedriver.cat -out the-certs.txt
A Windows .cat file is just a standard PKCS#7 file, which is a container for signed and/or encrypted data of any sort. The idea behind this format apparently is to say: First, some information to apply the signature on. Next, here are a bunch of certificates that will help to convince the validator that the public key that is used for the validation should be trusted. This part is optional, but it typically contains all certificates that are necessary for the certificate validation chain, except for the root certificate (which validation software mustn't accept even if it's present, or else the validation is pointless). And after the (optional) certificate section comes the signature on the content of the first part, i.e. on the data to sign.
In some situations, signtool improvises a bit on where to put the certificates for validation, in particular those needed for validating the timestamp, and those of a second signature, if one is appended. This is contrary to the straightforward approach of putting each and every certificate in the dedicated PKCS#7 section, as discussed below. The question is whether one should be surprised that Microsoft diverged from the standard, or that it adopted a standard format to begin with.
The consequence of stashing away certificates in less expected places is that openssl utilities, such as the command above for extracting certificates from a .cat file, may miss some of them. The only real way to tell is to look at an ASN.1 dump.
Finding the digital signature in a non-cat file
Signtool creates digital signatures in a similar way even for non-cat files. The quick way to find it is by looking for the following hex sequence (with e.g. “hexdump -C”):
30 82 xx xx 06 09 2a 86 48 86 f7 0d 01 07 02
The part from 06 09 onwards is the one saying "OBJECT IDENTIFIER 1.2.840.113549.1.7.2", which means a SignedData object. Even though this data structure is supposed to contain the data it signs, signtool often appends it after the signed data instead. Non-standard, but hey, this is Microsoft.
Pay attention to the last bytes of this sequence rather than the first ones. There are other similar OIDs, but the difference is in the last bytes.
The reason I've added the four bytes before it is that these are the beginning of the SEQUENCE, which the signature always begins with. The 0x82 part means that the two following bytes contain the length of the current chunk (in Big Endian). For snipping out the signature, include these four bytes, so that the result conforms to the PKCS#7 format.
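Decoding the OID bytes by hand is easy enough, and it shows why the last bytes are the ones to watch: related PKCS#7 content types such as data (1.2.840.113549.1.7.1) share the same prefix. In this Python sketch (decode_oid is a made-up name), each arc after the first byte is a base-128 number whose high bit means "more bytes follow", and the first value packs the first two arcs as 40 * X + Y.

```python
def decode_oid(content: bytes) -> str:
    """Decode the content bytes of a DER OBJECT IDENTIFIER to dotted form."""
    arcs = []
    value = 0
    for byte in content:
        # Base-128, big endian; a set high bit means more bytes follow.
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            arcs.append(value)
            value = 0
    # The first encoded value holds the first two arcs as 40 * X + Y.
    first = arcs[0]
    head = [first // 40, first % 40] if first < 80 else [2, first - 80]
    return ".".join(str(arc) for arc in head + arcs[1:])
```

For example, decode_oid(bytes.fromhex("2a864886f70d010702")) gives "1.2.840.113549.1.7.2", i.e. the nine bytes following 06 09 in the pattern above.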
I should also mention that there might be a SignedData object inside the outer SignedData object, due to signtool's obscure way of including timestamps and/or multiple signatures. In principle, the outer one is the one to process, but it might also make sense to look at the inner object separately, in particular for extracting all certificates that are included.
To create a file that begins with the ASN.1 data, go something like this (if the 30 82 starter appeared in the hexdump at 0xc408):
$ dd if=thedriver.sys of=theblob.bin skip=$((0xc408)) bs=1
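The hexdump-and-dd routine can also be scripted. This Python sketch (find_signed_data is a made-up name) looks for the same 30 82 xx xx 06 09 ... pattern and uses the two length bytes to cut out exactly the SEQUENCE, rather than everything up to the end of the file as the dd command does. Like the manual procedure, it only handles the 0x82 length form. The first match is normally the outer SignedData; passing a start offset past it finds the next one, e.g. a nested SignedData.

```python
# The signedData OID as it appears in DER: tag 0x06, length 9, then the OID bytes.
SIGNED_DATA_OID = bytes.fromhex("06092a864886f70d010702")


def find_signed_data(blob: bytes, start: int = 0):
    """Return (offset, der_bytes) of the first "30 82 xx xx" SEQUENCE at or
    after start whose content begins with the signedData OID."""
    # The OID sits right after the 4-byte SEQUENCE header, hence start + 4.
    pos = blob.find(SIGNED_DATA_OID, start + 4)
    while pos != -1:
        seq = pos - 4
        if blob[seq] == 0x30 and blob[seq + 1] == 0x82:
            # The two bytes after 0x82 are the content length, big endian.
            length = int.from_bytes(blob[seq + 2:seq + 4], "big")
            return seq, blob[seq:seq + 4 + length]
        pos = blob.find(SIGNED_DATA_OID, pos + 1)
    raise ValueError("no 30 82 xx xx SEQUENCE with the signedData OID found")
```

Reading thedriver.sys in binary mode and writing the returned bytes to theblob.bin should give the same blob as the dd command, minus any trailing data after the SEQUENCE.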
.cat file dissection notes
The root object has the signedData OID, meaning that it follows this format:
SignedData ::= SEQUENCE {
  version Version,
  digestAlgorithms DigestAlgorithmIdentifiers,
  contentInfo ContentInfo,
  certificates [0] CertificateSet OPTIONAL,
  crls [1] CertificateRevocationLists OPTIONAL,
  signerInfos SignerInfos
}
I won’t go into the depth of each element. To make a long story short, there are three main elements:
- The contentInfo part, containing the data to be signed (a Microsoft catalogList item with file names, their hashes and more). If the CAT file isn’t signed (yet), this is the only part in the file. Note that catalogList contains the timestamp of the .cat file’s creation.
- The certificate part containing a list of certificates, which relate to the direct signature (as well as the timestamp on some versions of signtool). This is just a bunch of certificates that might become useful while evaluating the signature. As mentioned above and below, signtool sometimes puts them in signerInfos instead.
- The signerInfos part, containing a list of signatures on the data in the contentInfo part. But there’s always only one signature here. The timestamp is embedded into this signature. And even if a signature is “appended” with signtool’s /as flag, the additional signature isn’t added to this set, but obscurely shoved elsewhere. See below.
The signature is in the end, as a SignerInfos item.
SignerInfos ::= SET OF SignerInfo

SignerInfo ::= SEQUENCE {
  version Version,
  signerIdentifier SignerIdentifier,
  digestAlgorithm DigestAlgorithmIdentifier,
  authenticatedAttributes [0] Attributes OPTIONAL,
  digestEncryptionAlgorithm DigestEncryptionAlgorithmIdentifier,
  encryptedDigest EncryptedDigest,
  unauthenticatedAttributes [1] Attributes OPTIONAL
}
It’s easy to spot it in the dump as something like
8805 931: SET {
8809 927:   SEQUENCE {
8813   1:     INTEGER 1
8816 135:     SEQUENCE {
towards the end.
Curiously enough, signerIdentifier is defined as
SignerIdentifier ::= CHOICE {
  issuerAndSerialNumber IssuerAndSerialNumber,
  subjectKeyIdentifier [2] SubjectKeyIdentifier
}
and what is typically found is issuerAndSerialNumber. In other words, the textual details in this section are those of the issuer, i.e. the CA that issued the signer's certificate, and not those of the signer. The only part that relates to the signer is the serial number (of the signer's certificate).
So the textual parts in SignerIdentifier should essentially be ignored. To start the certificate chain, begin from the serial number and climb upwards.
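The lookup order can be sketched in Python with a deliberately simplified, hypothetical Cert record (plain strings instead of Distinguished Names, no keys, no signature or validity checks, so this is only about finding the chain, not validating it): pick the certificate whose issuer and serial number match issuerAndSerialNumber, then repeatedly look up the certificate whose subject equals the current one's issuer. Matching parents by name alone is an assumption for illustration; real chain building may also use key identifiers.

```python
from typing import List, NamedTuple


class Cert(NamedTuple):
    # Hypothetical, simplified view of a certificate: real ones carry
    # Distinguished Names rather than plain strings, plus keys and extensions.
    subject: str
    issuer: str
    serial: int


def chain_from_signer(certs: List[Cert], issuer: str, serial: int) -> List[Cert]:
    """Start at the certificate named by issuerAndSerialNumber, then climb up
    by matching each certificate's issuer to another certificate's subject."""
    by_subject = {c.subject: c for c in certs}
    current = next((c for c in certs if c.issuer == issuer and c.serial == serial), None)
    if current is None:
        raise LookupError("signer certificate not included in the file")
    chain = [current]
    while current.issuer != current.subject:  # stop at a self-signed root
        parent = by_subject.get(current.issuer)
        if parent is None or parent in chain:
            break  # root not included (as usual), or a loop
        chain.append(parent)
        current = parent
    return chain
```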
The timestamp appears as unauthenticatedAttributes, and is identified as a countersignature (1.2.840.113549.1.9.6) or Ms-CounterSign (1.3.6.1.4.1.311.3.3.1):
9353 383: [1] {
9357 379:   SEQUENCE {
9361   9:     OBJECT IDENTIFIER : countersignature (1 2 840 113549 1 9 6)
Just like the signature, it’s given in issuerAndSerialNumber form, so the textual info belongs to the issuer of the certificate. The only informative part is the serial number.
Notes:
- Certificates (and hence the public keys they carry) are typically identified by their serial numbers. This is the part that connects a signature with the related certificates.
- Serial numbers appear just as "INTEGER" in the dump, but it's easy to spot them as strings of hex numbers.
- In the ASN.1 data structure, certificates convey the information on the issuer first and then the subject. It's somewhat counterintuitive.
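Since a serial number is just an INTEGER, decoding it is a one-liner. DER INTEGERs are big-endian two's complement, so a positive number whose top bit would otherwise be set gets a leading 0x00 byte. This Python sketch (function names are mine) also formats the bytes as colon-separated hex, roughly the way openssl's -text output shows long serial numbers.

```python
def der_integer(content: bytes) -> int:
    """Decode the content bytes of a DER INTEGER (two's complement, big endian)."""
    return int.from_bytes(content, "big", signed=True)


def serial_hex(content: bytes) -> str:
    """Colon-separated hex of a serial number's INTEGER content bytes,
    dropping the 0x00 padding byte DER adds when the top bit is set."""
    if len(content) > 1 and content[0] == 0x00:
        content = content[1:]
    return ":".join("%02x" % b for b in content)
```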
Looking at the data part of a .cat file
The truth is that dissecting a .cat file's ASN.1 blob doesn't reveal more than what is visible in the utility that pops up when one clicks the .cat file in Windows. It's just a list of items in two parts: the files protected by the catalogue, and some additional information (which is superfluous).
For a Windows device driver, the files covered are the .sys and .inf files. In Windows' utility for inspecting .cat files, these files appear under the "Security Catalog" tab. Each file is represented with a "Tag" entry, which is (typically? Always?) the SHA1 sum of the file. Clicking on it reveals the attributes as they appear in the .cat file, among others the thumbprint algorithm (sha1) and the value (which coincides with the "Tag", just with spaces). Even more interestingly, the File attribute is the file name without the path to it.
In other words, the catalogue doesn’t seem to protect the position of the files in the file hierarchy. The strategy for validating a file seems therefore to be to calculate its SHA1 sum, and look it up in the catalogue. If there’s a match, make sure that the file name matches. But there’s apparently no problem moving around the file in the file hierarchy.
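The validation strategy guessed above could look like this Python sketch, where catalogue is a hypothetical dict mapping each Tag (hex, normalized to upper case) to the file name found in the .cat file. The case-insensitive file name comparison is my assumption. A caveat I haven't verified here: for PE files such as the .sys, catalogue entries are generally understood to hold the Authenticode hash of the image (which skips the checksum field and the signature area) rather than a flat SHA1 of the file, and newer catalogues may use SHA256, so this flat-hash version only fits files like the .inf.

```python
import hashlib
import os


def file_tag(path: str) -> str:
    """Flat SHA1 of the whole file, as upper-case hex without spaces."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def check_against_catalogue(path: str, catalogue: dict) -> bool:
    """Look the file's hash up first, then compare the bare file name.
    The directory part is ignored, just like in the catalogue itself."""
    name = catalogue.get(file_tag(path))
    # Assumption: file names are compared case-insensitively, as on Windows.
    return name is not None and name.lower() == os.path.basename(path).lower()
```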
Under the "General" tab of the same utility, there are hex dumps of the DER-formatted data of each object, with the OID (all 1.3.6.1.4.1.311.12.2.1) given in the "Field" column for each. The information here is the operating system and the Plug & Play IDs (Vendor & Product IDs) that the driver covers. This is redundant, since this information is written in the .inf file, which is protected anyhow. That may explain why the presentation of this info in the utility is done so horribly.
Multiple signatures
When an additional signature has been added by virtue of signtool’s /as flag (“append signature”), it’s added as an OID_NESTED_SIGNATURE (1.3.6.1.4.1.311.2.4.1) item in the unauthenticatedAttributes, after the timestamp signature of the original signature:
9740 8031: SEQUENCE {
9744   10:   OBJECT IDENTIFIER '1 3 6 1 4 1 311 2 4 1'
9756 8015:   SET {
9760 8011:     SEQUENCE {
9764    9:       OBJECT IDENTIFIER : signedData (1 2 840 113549 1 7 2)
Under this there's a signedData item (i.e. the same OID as the one encapsulating the entire file), which contains no data by itself, but does contain a bunch of certificates, a signature (on what?) and a timestamp, apparently with some Microsoft improvisations on the standard format.
So they basically said, hey, let’s just push another PKCS#7 blob, from beginning to end, minus the CAT data itself, in that place where anyone can do whatever he wants. The correct way would of course have been to add another SignerInfo item to the SignerInfos set, but hey, this is Microsoft.
The takeaway is that this is likely to cause problems, as hacks always do. And not just for us who want to analyze what’s in there. My response to this is to publish two separate driver files if needed, and stay away from these double signatures.
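To check whether a file carries such an appended signature before deciding to stay away from it, a crude byte search for the OID_NESTED_SIGNATURE attribute type may do. This Python sketch encodes the OID into its DER form (06 0a 2b 06 01 04 01 82 37 02 04 01, matching the 10-byte OBJECT IDENTIFIER in the dump above) and counts its occurrences. It's a heuristic, not a parse, so treat a nonzero count as a hint to look at the ASN.1 dump; the function names are mine.

```python
def encode_oid(dotted: str) -> bytes:
    """DER-encode an OID from dotted notation (short-form length only)."""
    arcs = [int(a) for a in dotted.split(".")]
    values = [40 * arcs[0] + arcs[1]] + arcs[2:]
    content = bytearray()
    for v in values:
        # Base-128 digits, least significant first; high bit on all but the last.
        digits = [v & 0x7F]
        v >>= 7
        while v:
            digits.append(0x80 | (v & 0x7F))
            v >>= 7
        content.extend(reversed(digits))
    return bytes([0x06, len(content)]) + bytes(content)


NESTED_SIGNATURE = encode_oid("1.3.6.1.4.1.311.2.4.1")  # OID_NESTED_SIGNATURE


def count_nested_signatures(blob: bytes) -> int:
    """Heuristic: count occurrences of the OID_NESTED_SIGNATURE attribute type."""
    return blob.count(NESTED_SIGNATURE)
```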
Checking Windows signatures in Linux
For this there's osslsigncode. I cloned a copy, and compiled it at commit ID c0d9569c4f6768d9561978422befa4e44c5dfd34. It was basically:
$ ./autogen.sh
$ ./configure
$ make
It complained about curl not being installed, but what was actually needed was this:
# apt install libcurl4-openssl-dev
Copied osslsigncode to /usr/local/bin/, and then I could check a Windows driver catalog file with
$ osslsigncode verify -in thedriver.cat
The important thing is that it prints out a neat summary of the certificates in the file. It's less detailed than extracting the certificates with openssl as shown above, but more readable than openssl's output. However, the version I tried crashed when faced with a driver with double signatures. Not sure who to blame.