Crypto jots on ASN.1 and Microsoft’s .cat files

Intro

Crypto is not my expertise. This is a pile of jots I wrote down as I tried to figure out what the Microsoft catalogue file is all about. Not-so-surprising spoiler: It appears organized and elegant at first glance, but the more you look into it, the more of a mess it turns out to be. The kind of mess that's caused by someone making a quick hack to solve one little urgent problem. And then repeating that.

Sources

  • My own introduction to certificates.
  • C code for parsing ASN.1 (dumpasn1.c) can be found on this page. Be sure to download dumpasn1.cfg as well.
  • A JavaScript ASN.1 parser from this site.
  • For the PKCS#7 syntax, see this page.
  • The osslsigncode utility (written in plain C) is not just useful for analyzing signed CATs, but also serves as boilerplate code for manipulations based upon OpenSSL’s API.

ASN.1 basics

ASN.1 is a protocol for organizing (usually small) pieces of data into a file in a structured and hierarchical manner. For each type of container (e.g. an x.509 certificate), there’s a protocol describing its format, written in syntax somewhat similar to C struct definitions. Exactly like C struct definitions, it may contain sub-structures. But don’t take this analogy too far, because ASN.1 definitions have optional fields, and fields with an unknown number of members.

So if you want to follow what everything means, you need the definition for each element that you encounter. These are sometimes defined in the protocol for the relevant container type. Just as with C structs, looking at the data tells you what’s an integer and what’s a string, but their meaning depends on the order in which they appear.

And just as a struct may contain other structs, ASN.1 has objects. These are the catch-all method for inserting elements of arbitrary form.

When an object is encountered, it always has an object ID (OID), which defines its class. The format and meaning of what’s encapsulated is then defined in the object’s definition, which may not be published (e.g. OIDs specific to Microsoft). Note that an OID defines the format of the object, but not necessarily its meaning. That said, OIDs that are used only in very specific situations effectively tell us what they contain as well.

A few random notes that may help:

  • The common binary format is DER. Each item is encoded with a one-byte identifier (e.g. 0x30 for SEQUENCE) followed by the length of the item (including everything in lower hierarchies): If the length is below 0x80, it’s given as a single byte. Otherwise, the first byte is 0x80 plus the number of bytes that encode the length, and those bytes follow, giving the length in Big Endian format. After this comes the data.
  • Because the length of the item is given explicitly, there’s a lot of freedom for types like INTEGER: It can be a single byte or a huge number.
  • The SEQUENCE item, as its name implies, is a sequence of elements which encapsulates some kind of information. Some fields are optional. In that case, there’s a number in square brackets, e.g. [0], [1], [2] etc. in the format specification. These numbers in square brackets appear in the parsed output as well, to indicate which optional field is displayed (if any).
  • Object identifiers are given in dotted format, e.g. 1.2.840.113549.1.7.2 for signedData. In dumpasn1’s output, they appear with spaces, e.g.
    OBJECT IDENTIFIER signedData (1 2 840 113549 1 7 2)

    The translation from numeric OID to a meaningful name is possible if dumpasn1 happens to have that OID listed, which is sometimes not the case. Either way, when encountering these, just Google them up for more information.

  • There are two ASN.1-defined types for describing time: UTCTime (tag 0x17) and GeneralizedTime (tag 0x18). They can be used interchangeably. Curiously enough, both are given as ASCII strings. Looking for these is helpful for finding certificates, as well as time stamps, in a large blob.
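To make the encoding rules above concrete, here’s a minimal Python sketch (my own illustration, not code from any of the tools mentioned) that parses one tag-length header and decodes an OID payload into dotted notation:

```python
# Minimal DER helpers: parse one tag + length header, and decode an
# OBJECT IDENTIFIER payload into dotted notation.

def read_header(data, pos=0):
    """Return (tag, length, offset_of_payload)."""
    tag = data[pos]
    first = data[pos + 1]
    if first < 0x80:                     # Short form: length in one byte
        return tag, first, pos + 2
    nbytes = first & 0x7f                # Long form: 0x80 + number of length bytes
    length = int.from_bytes(data[pos + 2:pos + 2 + nbytes], 'big')
    return tag, length, pos + 2 + nbytes

def decode_oid(payload):
    """Decode an OID payload (after tag and length), e.g. to '1.2.840.113549.1.7.2'."""
    first = payload[0]
    parts = [first // 40, first % 40]    # First byte packs the first two arcs
    value = 0
    for byte in payload[1:]:             # Base-128, high bit marks continuation
        value = (value << 7) | (byte & 0x7f)
        if not byte & 0x80:
            parts.append(value)
            value = 0
    return '.'.join(map(str, parts))

# The signedData OID as it appears in a DER blob: tag 0x06, length 9
blob = bytes.fromhex('06092a864886f70d010702')
tag, length, payload_at = read_header(blob)
print(tag, length)                                       # 6 9
print(decode_oid(blob[payload_at:payload_at + length]))  # 1.2.840.113549.1.7.2
```

This is just enough to follow a dump by hand; real parsers (dumpasn1 and friends) of course handle indefinite lengths, constructed types and much more.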

DER and PEM formats

In practice, there are two formats out there for crypto data: DER, which is the native binary format, and PEM, which is a base64 representation of a DER binary. The reason I consider DER to be “native” is that when a digital signature is made on a chunk of data, it’s the DER representation of the relevant ASN.1 segment that is hashed.

Openssl can be used to convert between the two formats. As openssl’s default format is PEM, use -inform DER and -outform DER as necessary to convince it into playing ball.

For example, converting a certificate from PEM to DER:

$ openssl x509 -in mycert.crt -outform DER -out mycert.der

or in the opposite direction:

$ openssl x509 -inform DER -in mycert.der -out mycert2.crt

As for a .cat file (or any other file in DER PKCS#7 format), the same applies:

$ openssl pkcs7 -inform DER -in thedriver.cat -out thedriver.pem

and back from PEM to DER:

$ openssl pkcs7 -in thedriver.pem -outform DER -out thedriver.cat

It’s somewhat daunting that there’s no catch-all converter from DER to PEM, so there’s a need to know which kind of crypto creature is being converted. This identification should however be possible automatically, as there are headers indicating the type of data in both DER and PEM. It would just have been nicer to have this done automagically.
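The conversion itself is in fact mechanical: PEM is just base64 between type-labeled BEGIN/END lines. A Python sketch of a catch-all converter (my own illustration) shows that only the label requires knowing what kind of creature is at hand:

```python
import base64, re, textwrap

def pem_to_der(pem_text):
    """Extract the base64 body between the BEGIN/END lines and decode it.
    Works for any PEM type (CERTIFICATE, PKCS7, ...), since the body is
    always just base64-encoded DER."""
    match = re.search(r'-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----',
                      pem_text, re.DOTALL)
    label, body = match.group(1), match.group(2)
    return label, base64.b64decode(''.join(body.split()))

def der_to_pem(der_bytes, label):
    """The opposite direction: base64 broken into 64-character lines."""
    body = base64.b64encode(der_bytes).decode('ascii')
    return ('-----BEGIN %s-----\n' % label +
            '\n'.join(textwrap.wrap(body, 64)) +
            '\n-----END %s-----\n' % label)
```

The only non-mechanical part is that label, which is presumably why openssl makes you pick the matching subcommand.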

Inspecting ASN.1 files

The absolute winner for dumping blobs is dumpasn1.c. Download it from the link given above, compile it and install it in /usr/local/bin/ or something. Be sure to have dumpasn1.cfg in the same directory as the executable, so that OBJECT IDENTIFIER items (OIDs) get a human-readable string attached, and not just those magic numbers.

Then go

$ dumpasn1 -t thedriver.cat | less

Note that dumpasn1 expects DER format. See above if you have a PEM.

Flags:

  • -t for seeing the text of strings
  • -hh for long hex dumps
  • -u for seeing timestamps as they appear originally

For those looking for instant gratification, there’s an online JavaScript parser, asn1js. In fact, the site allows downloading the HTML and JavaScript sources, and pointing the browser at local files.

And then there’s openssl’s own dumper, which produces data that is less useful for human interaction. Its only real advantage is that it’s most likely already installed. Go something like this (or drop the -inform DER for parsing a PEM file):

$ openssl asn1parse -inform DER -i -in thedriver.cat -dump

When attempting to parse a DER file without the -inform DER flag, the result may be “Error: offset out of range”. It’s a misleading error message, so don’t fall for it.

The certificates in a .cat file

For a general tutorial on certificates, see this post.

To extract (hopefully all) certificates included in a .cat (or any other PKCS#7) file, in cleartext combined with PEM format, go

$ openssl pkcs7 -inform DER -print_certs -text -in thedriver.cat -out the-certs.txt

A Windows .cat file is just a standard PKCS#7 file, which is a container for signed and/or encrypted data of any sort. The idea behind this format apparently is to say: First, here’s some information to apply the signature on. Next, here’s a bunch of certificates that will help convince the validator that the public key used for the validation should be trusted. This part is optional, but it typically contains all certificates that are necessary for the certificate validation chain, except for the root certificate (which validation software mustn’t accept even if it’s present, or else the validation is pointless). And after the (optional) certificate section comes the signature on the content of the first part, i.e. the data to sign.

In some situations, signtool improvises a bit on where to put the certificates for validation, in particular those needed for validating the timestamp, and when a second signature is appended. This is contrary to the straightforward approach of putting each and every certificate in the dedicated PKCS#7 section, as discussed below. The question is whether one should be surprised that Microsoft diverged from the standard, or that it adopted a standard format to begin with.

The consequence of stashing away certificates in less expected places is that openssl utilities, like the command above for extracting certificates from a .cat file, may miss some of them. The only real way to tell is to look at an ASN.1 dump.

Finding the digital signature in a non-cat file

Signtool creates digital signatures in a similar way even for non-cat files. The quick way to find it is by looking for the following hex sequence (with e.g. “hexdump -C”):

30 82 xx xx 06 09 2a 86 48 86 f7 0d 01 07 02

The part starting at 06 09 is the one saying “OBJECT IDENTIFIER 1.2.840.113549.1.7.2”, which means a SignedData object. Even though this data structure is supposed to contain the data it signs, signtool often appends it to the data for signature. Non-standard, but hey, this is Microsoft.

Pay attention to the last bytes of this sequence rather than the first ones. There are other similar OIDs, but the difference is in the last bytes.

The reason I’ve included the four bytes before the OID is that they are the beginning of the SEQUENCE that the signature always begins with. The 0x82 part means that the two following bytes contain the length of the current chunk (in Big Endian). When snipping out the signature, include these four bytes, to conform to the PKCS#7 format.

I should also mention that there might be a SignedData object inside the outer SignedData object, due to signtool’s obscure way of including timestamps and/or multiple signatures. In principle, the outer one is the one to process, but it might also make sense to look at the inner object separately, in particular for extracting all certificates that are included.

To create a file that begins with the ASN.1 data, go something like this (if the 30 82 starter appeared in the hexdump at 0xc408):

$ dd if=thedriver.sys of=theblob.bin skip=$((0xc408)) bs=1
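The hexdump-and-dd procedure can also be sketched as a small Python script that locates the byte pattern and computes the blob’s total length from the DER header (a sketch, assuming the first match is the wanted one):

```python
import re

def find_signed_data(data):
    """Find 30 82 xx xx 06 09 2a 86 48 86 f7 0d 01 07 02 and return
    (offset, total_length) of the enclosing SEQUENCE, or None."""
    pattern = re.compile(
        rb'\x30\x82..\x06\x09\x2a\x86\x48\x86\xf7\x0d\x01\x07\x02',
        re.DOTALL)                       # DOTALL so '.' matches 0x0a bytes too
    match = pattern.search(data)
    if match is None:
        return None
    offset = match.start()
    # The two bytes after 30 82 hold the SEQUENCE's payload length (Big
    # Endian); add 4 for the tag and length bytes themselves.
    length = int.from_bytes(data[offset + 2:offset + 4], 'big') + 4
    return offset, length

# Hypothetical usage on a signed driver:
#   blob = open('thedriver.sys', 'rb').read()
#   offset, length = find_signed_data(blob)
#   open('theblob.bin', 'wb').write(blob[offset:offset + length])
```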

.cat file dissection notes

The root object has the signedData OID, meaning that it has the following format:

SignedData ::= SEQUENCE {
  version           Version,
  digestAlgorithms  DigestAlgorithmIdentifiers,
  contentInfo       ContentInfo,
  certificates      [0]  CertificateSet OPTIONAL,
  crls              [1]  CertificateRevocationLists OPTIONAL,
  signerInfos       SignerInfos
}

I won’t go into the depth of each element. To make a long story short, there are three main elements:

  • The contentInfo part, containing the data to be signed (a Microsoft catalogList item with file names, their hashes and more). If the CAT file isn’t signed (yet), this is the only part in the file. Note that catalogList contains the timestamp of the .cat file’s creation.
  • The certificate part containing a list of certificates, which relate to the direct signature (as well as the timestamp on some versions of signtool). This is just a bunch of certificates that might become useful while evaluating the signature. As mentioned above and below, signtool sometimes puts them in signerInfos instead.
  • The signerInfos part, containing a list of signatures on the data in the contentInfo part. But there’s always only one signature here. The timestamp is embedded into this signature. And even if a signature is “appended” with signtool’s /as flag, the additional signature isn’t added to this set, but obscurely shoved elsewhere. See below.

The signature is in the end, as a SignerInfos item.

SignerInfos ::= SET OF SignerInfo

SignerInfo ::= SEQUENCE {
  version                    Version,
  signerIdentifier           SignerIdentifier,
  digestAlgorithm            DigestAlgorithmIdentifier,
  authenticatedAttributes    [0]  Attributes OPTIONAL,
  digestEncryptionAlgorithm  DigestEncryptionAlgorithmIdentifier,
  encryptedDigest            EncryptedDigest,
  unauthenticatedAttributes  [1]  Attributes OPTIONAL
}

It’s easy to spot it in the dump as something like

8805  931:       SET {
8809  927:         SEQUENCE {
8813    1:           INTEGER 1
8816  135:           SEQUENCE {

towards the end.

Curiously enough, signerIdentifier is defined as

SignerIdentifier ::= CHOICE {
  issuerAndSerialNumber  IssuerAndSerialNumber,
  subjectKeyIdentifier   [2]  SubjectKeyIdentifier
}

and what is typically found there is issuerAndSerialNumber. In other words, the textual details that appear in this section belong to the issuer of the signer’s certificate, and not to the signer itself. The only part that actually pins down the signer’s certificate is the serial number.

So in essence, the textual parts in SignerIdentifier should be ignored. To start the certificate chain, begin from the serial number and climb upwards.

The timestamp appears as unauthenticatedAttributes, and is identified as a countersignature (1.2.840.113549.1.9.6) or Ms-CounterSign (1.3.6.1.4.1.311.3.3.1):

9353  383:           [1] {
9357  379:             SEQUENCE {
9361    9:               OBJECT IDENTIFIER
         :                 countersignature (1 2 840 113549 1 9 6)

Just like the signature, it’s given in issuerAndSerialNumber form, so the textual info belongs to the issuer of the certificate. The only informative part is the serial number.

Notes:

  • Keys are typically tracked down through the serial numbers of their certificates. This is the part that connects the key with its related certificates.
  • Serial numbers appear just as “INTEGER” in the dump, but it’s easy to spot them as strings of hex numbers.
  • In the ASN.1 data structure, certificates convey the information on the issuer first, and only then on the subject. It’s somewhat counterintuitive.

Looking at the data part of a .cat file

The truth is that dissecting a .cat file’s ASN.1 blob doesn’t reveal more than is visible from the utility that pops up when one clicks the .cat file in Windows. It’s just a list of items: one part consists of the files protected by the catalogue, and the second of some additional information (which is superfluous).

For a Windows device driver, the files covered are the .sys and .inf files. In Windows’ utility for inspecting .cat files, these files appear under the “Security Catalog” tab. Each file is represented with a “Tag” entry, which is (typically? always?) the SHA1 sum of the file. Clicking on it reveals the attributes as they appear in the .cat file, among others the thumbprint algorithm (sha1) and the value (which coincides with the “Tag”, just with spaces). Even more interestingly, the File attribute is the file name without the path to it.

In other words, the catalogue doesn’t seem to protect the position of the files in the file hierarchy. The strategy for validating a file seems therefore to be to calculate its SHA1 sum, and look it up in the catalogue. If there’s a match, make sure that the file name matches. But there’s apparently no problem moving the file around in the file hierarchy.
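This validation strategy can be sketched in a few lines of Python (my own illustration, with the catalogue represented as a hypothetical dict from SHA1 hex digest to bare file name):

```python
import hashlib, os

def validate_against_catalog(path, catalog):
    """Catalog maps SHA1 hex digests to bare file names, mimicking the
    Tag / File attributes described above."""
    with open(path, 'rb') as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    expected_name = catalog.get(digest)
    # A match requires both the hash and the path-less file name to agree
    return expected_name is not None and expected_name == os.path.basename(path)
```

Note that since only os.path.basename() is checked, moving the file around in the hierarchy indeed goes unnoticed.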

Under the “General” tab of the same utility, there are hex dumps of the DER-formatted data of each object, with the OID (all 1.3.6.1.4.1.311.12.2.1) given in the “Field” column for each. The information here is the operating system and the Plug & Play IDs (Vendor & Product IDs) that the driver covers. This is redundant, since this information is written in the .inf file, which is protected anyhow. That may explain why the presentation of this info in the utility is done so badly.

Multiple signatures

When an additional signature has been added by virtue of signtool’s /as flag (“append signature”), it’s added as an OID_NESTED_SIGNATURE (1.3.6.1.4.1.311.2.4.1) item in the unauthenticatedAttributes, after the timestamp signature of the original signature:

 9740  8031:             SEQUENCE {
 9744    10:               OBJECT IDENTIFIER '1 3 6 1 4 1 311 2 4 1'
 9756  8015:               SET {
 9760  8011:                 SEQUENCE {
 9764     9:                   OBJECT IDENTIFIER
           :                     signedData (1 2 840 113549 1 7 2)

Under this there’s a signedData item (i.e. the same OID as the one encapsulating the entire file), which contains no data by itself, but does contain a bunch of certificates, a signature (on what?) and a timestamp, apparently with some Microsoft improvisations on the standard format.

So they basically said, hey, let’s just push another PKCS#7 blob, from beginning to end, minus the CAT data itself, in that place where anyone can do whatever he wants. The correct way would of course have been to add another SignerInfo item to the SignerInfos set, but hey, this is Microsoft.

The takeaway is that this is likely to cause problems, as hacks always do. And not just for us who want to analyze what’s in there. My response to this is to publish two separate driver files if needed, and stay away from these double signatures.

Checking Windows signatures in Linux

For this there’s osslsigncode. I cloned a copy, and compiled it at commit ID c0d9569c4f6768d9561978422befa4e44c5dfd34. It was basically:

$ ./autogen.sh
$ ./configure
$ make

It seemed to complain about curl not being installed, but it was this that was actually needed:

# apt install libcurl4-openssl-dev

I copied osslsigncode to /usr/local/bin/, and then I could check a Windows driver catalog file with

$ osslsigncode verify -in thedriver.cat

The important thing is that it prints out a neat summary of the certificates in the file. It’s less informative than using openssl to extract the certificates as shown above, but more descriptive than openssl’s output. However, the version I tried crashed when faced with a driver with double signatures. Not sure who to blame.

A sledge hammer introduction to X.509 certificates

Introduction

First and foremost: Crypto is not my expertise. This is a note to future self for the next time I’ll need to deal with similar topics. This post summarizes my understanding as I worked on a timestamp server, and it shows the certificates used by it.

For how to check a connection with an https host (and similar) with openssl, see this other post of mine.

There are many guides to X.509 certificates out there, however it seems like it’s common practice to focus on the bureaucratic aspects (a.k.a. Public Key Infrastructure, or PKI), and less on the real heroes of this story: The public cryptographic keys that are being certified.

For example, RFC 3647 starts with:

In general, a public-key certificate (hereinafter “certificate”) binds a public key held by an entity (such as person, organization, account, device, or site) to a set of information that identifies the entity associated with use of the corresponding private key. In most cases involving identity certificates, this entity is known as the “subject” or “subscriber” of the certificate.

Which is surely correct, and yet it dives right into organization structures etc. Not complaining, the specific RFC is just about that.

So this post is an attempt to make friends with these small chunks of data, with a down-to-earth, technical approach. I’m not trying to cover all aspects nor being completely accurate. For exact information, refer to RFC 5280. When I say “the spec” below, I mean this document.

Let’s start from the basics, with the main character of this story: The digital signature.

The common way to make a digital signature is to first produce a pair of cryptographic keys: One is secret, and the second is public. Both are just short computer files.

The secret key is used in the mathematical operation that constitutes the action of a digital signature. Having access to it is therefore equivalent to being the person or entity that it represents. The public key allows verifying the digital signature with a similar mathematical operation.

A certificate is a message (practically, a computer file), saying “here’s a public key, and I hereby certify that it’s valid for use between this and this time for these and these uses”. This message is then digitally signed by whoever gives the certification (with a key different from the one certified, of course). As we shall see below, there’s a lot more information in a certificate, but this is the point of it all.

The purpose of a certificate is like ID cards in real life: It’s a document that allows us to trust a piece of information from someone we’ve never seen before and know nothing about, without the possibility to consult with a third party. So there must be something about this document that makes it trustworthy.

The certificate chain

Every piece of software that works with public keys is installed with a list of public keys that it trusts. Browsers carry a relatively massive list for SSL certificates, but for kernel code signing it consists of exactly one certificate. So the size of this list varies, but is surely very small compared with the number of certificates out there in general. Keys may be added to and removed from this list in the course of time, but its size remains roughly the same.

The common way to maintain this list is by virtue of root certificates: These certificates basically say “trust me, this key is OK”. I’ll get further into this along with the example of a root certificate below.

As the secret keys of these root certificates are precious, they can’t be used to sign every certificate in the world. Instead, they are used to approve the key of another certificate. And quite often, that second certificate approves the key for verifying a third certificate. Only that last certificate approves the public key which the software actually needs to validate. In this example, these three certificates form a certificate chain. In real life, this chain usually consists of 3-5 certificates.

In many practical applications (e.g. code signing and time stamping) the sender of the data for validation also attaches a few certificates in order to help the validating side. Likewise, when a browser establishes a secure connection, it typically receives more than one certificate.

None of these peer-supplied certificates are root certificates (and if there is one, any sane software will ignore it, or else the validation is worthless). The validating software then attempts to create a valid certificate chain going from its own pool of root certificates (and possibly some other certificates it has access to) to the public key that needs validation. If such a chain is found, the validation is deemed successful.
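The chain-building logic can be sketched roughly as follows (my own simplification: certificates are represented as plain dicts, and real validators of course also check signatures, validity times and usage constraints):

```python
def build_chain(leaf, pool, roots):
    """Walk from the certificate needing validation towards a trusted root.
    Certificates are hypothetical dicts with 'subject_key_id' and
    'authority_key_id' entries; pool holds the peer-supplied certificates."""
    by_ski = {c['subject_key_id']: c for c in pool + roots}
    trusted = {c['subject_key_id'] for c in roots}
    chain = [leaf]
    current = leaf
    while current['subject_key_id'] not in trusted:
        issuer = by_ski.get(current['authority_key_id'])
        if issuer is None or issuer in chain:   # Missing link, or a loop
            return None
        chain.append(issuer)
        current = issuer
    return chain
```

Note that the peer-supplied pool and the local root pool are merged for the lookup, but only ending at a root counts as success, which is the point made above about ignoring a root certificate that arrives from the peer.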

The design of the certificate system envisions two kinds of keys: Those used by End Entities for doing something useful, and those used by Certificate Authorities only for the purpose of signing and verifying other certificates. Each certificate testifies which type it belongs to in the “Basic Constraints” extension, as explained below.

Put shortly: The certificates that we (may) pay a company to make for us are all End Entity certificates.

In this post I’ll show a valid certificate chain consisting of three certificates.

A sample End Entity certificate

This is a textual dump of a certificate, obtained with something like:

openssl x509 -in thecertificate.crt -text

Other tools represent the information slightly differently, but the terminology tends to remain the same.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            65:46:72:11:63:f1:85:b4:3d:95:3d:72:66:e6:ee:c5:1c:f6:2b:6e
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Time Stamping CA
        Validity
            Not Before: Jan  1 00:00:00 2001 GMT
            Not After : May 19 00:00:00 2028 GMT
        Subject: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Time Stamping Service CA
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:d7:34:07:c5:dd:f5:e6:6a:b2:9e:e6:76:e3:ce:
                    af:33:a3:10:60:97:e8:27:f1:62:87:90:a9:21:52:
[ ... ]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage: critical
                Time Stamping
            X509v3 Subject Key Identifier:
                3A:E5:43:A1:40:3F:A4:0F:01:CE:D3:3F:2A:EE:4E:92:B9:28:5C:3A
            X509v3 Authority Key Identifier:
                keyid:3C:F5:43:45:3B:40:10:BC:3F:25:47:18:10:C4:19:18:83:8C:09:D0
                DirName:/C=GB/ST=Gallifrey/L=Gallifrey/O=Dr Who/CN=Dr Who Root CA
                serial:7A:CF:23:8D:2E:A7:6C:84:52:53:AF:BA:D7:26:7F:54:53:B2:2D:6B

    Signature Algorithm: sha256WithRSAEncryption
         6c:54:88:55:ff:c7:e1:81:73:4e:00:80:46:0d:dc:d9:32:c1:
         53:ba:ff:f9:32:e4:f3:83:c2:29:bb:e5:91:88:8e:6f:46:f4:
[ ... ]

The key owning the certificate

Much of the difficulty to understand certificates stems from the fact that the bureaucratic terminology is misleading, making it look as if it was persons or companies that are certified.

So let’s keep our eyes on the ball: This is all about the cryptographic keys. There’s the key which is included in the certificate (the Subject’s public key) and there’s the key that signs the certificate (the Authority’s key). There are of course entities owning these keys, and it’s their information that is presented to us humans first and foremost. And yet, it’s all about the keys.

The certificate’s purpose is to say something about the cryptographic key which is given explicitly in the certificate’s body (printed out in hex format as “Modulus” above). On top of that, there’s always the “Subject” part, which is the human-readable name given to this key.

As seen in the printout above, the Subject is a set of attribute-value assignments. Collectively, they are the name of the key. Which attributes are assigned differs from certificate to certificate, and it may even contain no attributes at all. The meaning of and rules for setting these have to do with the bureaucracy of assigning real-life certificates. From a technical point of view, these are just string assignments.

Usually, the most interesting one is CN, which stands for commonName, and is the most descriptive part in the Subject. And yet, it may be confusingly similar to that of other certificates.

For certificates which certify an SSL key for use by a web server, the Subject’s CN is the domain it covers (possibly with a “*” wildcard). It might be the only assignment. For example *.facebook.com or mysmallersite.com.

Except for some root certificates, there’s an X509v3 Subject Key Identifier entry in the certificate as well. It’s a short hex string which is typically the SHA1 hash of the public key, or part of it, but the spec allows using other algorithms. It’s extremely useful for identifying certificates, since it’s easy to get confused between Subject names.
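As a sketch of that common method, the identifier is just the SHA1 digest of the public key’s bytes, formatted like in the dump above (the input bytes below are hypothetical placeholders, not a real key):

```python
import hashlib

def subject_key_identifier(spk_bytes):
    """The common method: SHA1 over the content bytes of the subjectPublicKey
    BIT STRING. Formatted as colon-separated hex, like in the dump above."""
    digest = hashlib.sha1(spk_bytes).digest()
    return ':'.join('%02X' % b for b in digest)

# Hypothetical key bytes, just to show the shape of the output
print(subject_key_identifier(b'\x30\x0d\x06\x09'))
```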

I’ll discuss root authorities and root certificates below, along with looking at a root certificate.

The key that signed the certificate

Then we have the “Authority” side, which is the collective name for whoever signed the certificate. Often called Certificate Authority, or CA for short. Confusingly enough (for me), it appears before the Subject, in the binary blob of the certificate as well as the text output above.

The bureaucratic name of this Authority is given as the “Issuer”. Once again, it consists of a set of attribute-value assignments, which collectively are the name of the key that is used to sign the certificate. This tells us to look for a certificate with an exact match: The exact same set of assignments, with the exact same values. If such an issuer certificate is found, and we trust it, and it’s allowed to sign certificates, and the Issuer’s public key validates the signature of the Subject’s certificate (plus a bunch of other conditions), then the Subject certificate is considered valid. In other words, the public key it contains is valid for the uses mentioned in it. All this said with lots of fine details omitted.

But looking for a certificate in the database based upon the name is inefficient, as the same entity may have multiple keys and hence multiple certificates for various reasons — in particular because a certificate is time limited. To solve this, all certificates (except root certificates) must point at their Authority with the X509v3 Authority Key Identifier field (though I’ve seen certificates without it). There are two methods for this:

  1. The value that appears in the Subject Key Identifier field of the certificate for the key that signed the current certificate (so it’s basically a hash of the public key that signed this certificate).
  2. The serial number of the certificate of the key that signed the current certificate, plus the Issuer name of the Authority’s certificate — that is, the Authority that is two levels up in the food chain. This is a more heavyweight identification, and gives us a hint on what’s going on higher up.

The first method is more common (and is required if you want to call yourself a CA), and sometimes both are present.

Anyhow, practically speaking, when I want to figure out which certificate approves which, I go by the Subject / Authority Key Identifiers. It’s much easier to keep track of the first couple of hex octets than those typically long and confusing names.

Validity times and Certificate serial number

These are quite obvious: The validity time limits the time period for which the certificate can be used. The validating software uses the computer’s clock for this purpose, unless the validated message is timestamped (in particular with code signing), in which case the timestamp is used to validate all certificates in the chain.

The serial number is just a number that is unique for each certificate. The spec doesn’t define any specific algorithm for generating it. Note that this number relates to the certificate itself, and not to the public key being certified.

The signature

All certificates are signed with the secret key of their Authority. The public key for verifying it is given in the Authority’s certificate.

The signature algorithm appears at the beginning of the certificate, however the signature itself is last. The spec requires, obviously, that the algorithm that is used in the signature is the one stated in the beginning.

The signature is made on the ASN.1 DER-encoded blob which contains all certificate information (except for the signature section itself, of course).
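Thanks to DER’s tag-length scheme, that signed portion (the tbsCertificate) can be snipped out mechanically: the certificate is an outer SEQUENCE, and its first element is the blob that got hashed and signed. A Python sketch (my own, assuming well-formed input):

```python
def tbs_certificate(der_cert):
    """Return the bytes of the first element inside the outer SEQUENCE,
    i.e. the part the Authority's signature was made on."""
    def header(data, pos):
        # Parse a DER length at pos + 1; return (length, offset_of_payload)
        first = data[pos + 1]
        if first < 0x80:
            return first, pos + 2
        nbytes = first & 0x7f
        return (int.from_bytes(data[pos + 2:pos + 2 + nbytes], 'big'),
                pos + 2 + nbytes)
    _, inner = header(der_cert, 0)            # Skip the outer SEQUENCE's header
    length, payload = header(der_cert, inner)
    return der_cert[inner:payload + length]   # Include the tag + length bytes
```

The slice keeps the element’s own tag and length bytes, since that’s what the hash is computed over.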

X509v3 extensions

Practically all certificates that are used today are version 3 certificates, and they all have a section called X509v3 extensions. In this section, the creator of the certificate inserts data objects as desired (but with some minimal requirements, as defined in the spec). The meaning and structure of each data object is conveyed by an Object Identifier (OID) field at the header of each object, appearing before the data in the certificate’s ASN.1 DER blob. It’s therefore possible to push any kind of data into this section, by assigning an OID for that kind of data.

In addition to the OID, each such data object also has a boolean value called “critical”: Note that some of the extensions in the example above are marked as critical, and some are not. When an extension is critical (the boolean is set true) the certificate must be deemed invalid if the extension is not recognized by its verifying software. Extensions that limit the usage of a certificate are typically marked critical, so that unintended use doesn’t occur because the extension wasn’t recognized.
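The processing rule for the critical flag boils down to a few lines (a sketch, with extensions represented as hypothetical (OID, critical) pairs):

```python
def check_extensions(extensions, recognized_oids):
    """An unrecognized extension is fatal only if it's marked critical."""
    for oid, critical in extensions:
        if critical and oid not in recognized_oids:
            return False       # Certificate must be deemed invalid
    return True
```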

I’ve already mentioned two X509v3 extensions: X509v3 Subject Key Identifier and X509v3 Authority Key Identifier, neither of which is critical in the example above. And it makes sense: If the verifying software doesn’t recognize these, it has other means to figure out which certificate goes where.

So coming up next is a closer look at a few standard X509v3 extensions.

X509v3 Key Usage

As its name implies, this extension defines the allowed uses of the key contained in the certificate. A certificate that is issued by a CA must have this extension present, and mark it Critical.

This extension contains a bit string of nine bits (numbered 0 through 8), defining the allowed usages as follows:

  • Bit 0: digitalSignature — verify a digital signature other than the one of a certificate or CRL (these are covered with bits 5 and 6).
  • Bit 1: nonRepudiation (or contentCommitment) — verify a digital signature in a way that is legally binding. In other words, a signature made with this key can’t be claimed later to be false.
  • Bit 2: keyEncipherment — encipher a private or secret key with the public key contained in the certificate.
  • Bit 3: dataEncipherment — encipher payload data directly with the public key (rarely used).
  • Bit 4: keyAgreement — for use with Diffie-Hellman or similar key exchange methods.
  • Bit 5: keyCertSign — verify the digital signature of certificates.
  • Bit 6: cRLSign — verify the digital signature of CRLs.
  • Bit 7: encipherOnly — when this and keyAgreement bits are set, only enciphering data is allowed in the key exchange process.
  • Bit 8: decipherOnly — when this and keyAgreement bits are set, only deciphering data is allowed in the key exchange process.
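
The bit numbering above follows the ASN.1 BIT STRING convention: bit 0 is the most significant bit of the first octet. As a toy illustration (not any real library’s API), here is a sketch of decoding the bit string, assuming the raw bits have already been extracted from the DER encoding:

```python
# Toy decoder for the X509v3 Key Usage bit string. Bit 0 is the most
# significant bit of the first octet, per ASN.1 BIT STRING rules.
KEY_USAGE_BITS = [
    "digitalSignature",   # bit 0
    "nonRepudiation",     # bit 1
    "keyEncipherment",    # bit 2
    "dataEncipherment",   # bit 3
    "keyAgreement",       # bit 4
    "keyCertSign",        # bit 5
    "cRLSign",            # bit 6
    "encipherOnly",       # bit 7
    "decipherOnly",       # bit 8
]

def decode_key_usage(bits: bytes) -> list:
    """Return the names of the bits that are set."""
    usages = []
    for i, name in enumerate(KEY_USAGE_BITS):
        byte, offset = divmod(i, 8)
        if byte < len(bits) and bits[byte] & (0x80 >> offset):
            usages.append(name)
    return usages

# The CA certificates shown later have Certificate Sign + CRL Sign,
# i.e. bits 5 and 6 set: 0b00000110 = 0x06.
print(decode_key_usage(b"\x06"))  # ['keyCertSign', 'cRLSign']
```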

X509v3 Extended Key Usage

The Key Usage extension is somewhat vague about the purposes of the cryptographic operations. In particular, when the public key can be used to verify a digital signature, surely that can’t mean all kinds of signatures? If it did, the public key would be valid for signing anything (that isn’t a legal document, a certificate or a CRL, and even that’s a stretch).

On the other hand, how can a protocol properly foresee any possible use of the public key? Well, it can’t. Instead, each practical use of the key is given a unique number in the vocabulary of Object Identifiers (OIDs). This extension merely lists the OIDs that are relevant, and this translates into allowed uses. When evaluating the eligibility to use the public key (that is contained in the certificate), the Key Usage and Extended Key Usage are evaluated separately; a green light is given only if both evaluations resulted in an approval.

The spec doesn’t require this extension to be marked Critical, but it usually is, or what’s the point. The spec does however say that “in general, this extension will appear only in end entity certificates”, i.e. a certificate that is given to the end user (and hence with a key that can’t be used to sign other certificates). In reality, this extension is often present and assigned the intended use in certificates in the middle of the chain, despite this suggestion. As I’ve seen this in code signing and time stamping middle-chain certificates, maybe it’s to restrict the usage of this middle certificate for certain purposes. Or maybe it’s a workaround for buggy validation software.

This is a short and incomplete list of interesting OIDs that may appear in this extension:

  • TLS Web Server Authentication: 1.3.6.1.5.5.7.3.1
  • TLS Web Client Authentication: 1.3.6.1.5.5.7.3.2
  • Code signing: 1.3.6.1.5.5.7.3.3
  • Time stamping: 1.3.6.1.5.5.7.3.8

The first two appear in certificates that are issued for HTTPS servers.
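
The separate evaluation of Key Usage and Extended Key Usage described above can be sketched like this (a toy model with made-up function names, not any real library’s API):

```python
# A sketch of the two-gate check: a use is allowed only if both the Key
# Usage bits and the Extended Key Usage OIDs (when the extension is
# present) approve it. Shown here for code signing.
CODE_SIGNING_OID = "1.3.6.1.5.5.7.3.3"

def may_sign_code(key_usage, ext_key_usage_oids):
    # Gate 1: code signing needs the digitalSignature bit in Key Usage.
    if "digitalSignature" not in key_usage:
        return False
    # Gate 2: if the EKU extension is present (not None), the
    # code-signing OID must be listed in it.
    if ext_key_usage_oids is not None and CODE_SIGNING_OID not in ext_key_usage_oids:
        return False
    return True

print(may_sign_code({"digitalSignature"}, {"1.3.6.1.5.5.7.3.3"}))  # True
print(may_sign_code({"digitalSignature"}, {"1.3.6.1.5.5.7.3.8"}))  # False
```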

X509v3 Basic Constraints

Never mind this extension’s name. It has nothing to do with what it means.

This extension involves two elements: First, a boolean value, “cA”, meaning Certificate Authority. According to the spec, the meaning of this flag is that the Subject of the certificate is a Certificate Authority (as opposed to End Entity). When true, the key included in the certificate may be used to sign other certificates.

But wait, what about the keyCertSign capability in X509v3 Key Usage (i.e. bit 5)? Why the duplication? Not clear, but the spec requires that if cA is false, then keyCertSign must be cleared (certificate signing not allowed). In other words, if you’re not a CA, don’t create certificates that can sign other certificates.
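
The spec’s consistency requirement just mentioned boils down to one rule, sketched here (hypothetical function name, reflecting RFC 5280’s requirement as I read it):

```python
# A certificate whose Key Usage asserts keyCertSign must also have cA
# set in Basic Constraints; otherwise the certificate is malformed.
def basic_constraints_consistent(ca, key_usage):
    if "keyCertSign" in key_usage and not ca:
        return False
    return True

print(basic_constraints_consistent(True, {"keyCertSign", "cRLSign"}))  # True
print(basic_constraints_consistent(False, {"keyCertSign"}))            # False
```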

This flag is actually useful for manually analyzing a certificate chain going from the end user certificate towards the root: Given a pile of certificates, look for the one with CA:FALSE. That’s the certificate to start with.

The second element is pathLenConstraint, which usually appears as pathlen in text dumps. It limits the number of certificates between the current one and the final certificate in the chain, which is typically the End Entity certificate. Commonly, pathlen is set to zero in the certificate that authorizes the certificate that someone paid to get.

If there’s a self-issued certificate (Subject is identical to Issuer) in the chain which isn’t the root certificate, forget what I said about pathlen. But first find me such a certificate.

This extension is allowed to be marked critical or not.

X509v3 Subject Alternative Name

(not in the example above)

This extension allows assigning additional names, on top of the one appearing in Subject (or possibly instead of it). It’s often used with SSL certificates in order to make it valid for multiple domains.

A sample non-root CA certificate

This is the text dump of the certificate which is the Authority of the example certificate listed above:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            7a:cf:23:8d:2e:a7:6c:84:52:53:af:ba:d7:26:7f:54:53:b2:2d:6b
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Root CA
        Validity
            Not Before: Jan  1 00:00:00 2001 GMT
            Not After : May 19 00:00:00 2028 GMT
        Subject: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Time Stamping CA
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:b6:bf:46:38:c7:c1:63:58:f1:95:c6:cf:0a:5d:
                    72:d1:11:ce:86:96:04:ce:8f:cb:ab:da:22:b9:e0:
[ ... ]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:TRUE, pathlen:0
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
            X509v3 Extended Key Usage:
                Time Stamping
            X509v3 Subject Key Identifier:
                3C:F5:43:45:3B:40:10:BC:3F:25:47:18:10:C4:19:18:83:8C:09:D0
            X509v3 Authority Key Identifier:
                keyid:98:9A:E3:EF:D8:C5:5C:7F:87:35:87:45:78:3D:51:8D:82:2F:1E:A3
                DirName:/C=GB/ST=Gallifrey/L=Gallifrey/O=Dr Who/CN=Dr Who Root CA
                serial:03:91:DC:F3:FA:8D:5A:CA:D0:3D:B7:EE:1B:71:2D:60:B5:0A:99:DE

    Signature Algorithm: sha256WithRSAEncryption
         13:18:16:99:6a:42:be:22:14:e5:e8:80:5a:ce:be:df:33:c6:
         22:df:d5:35:48:e6:9d:9f:ec:ec:07:72:49:33:ca:ca:3f:22:
[ ... ]

The public key contained in this certificate is paired with the secret key that signed the previous certificate. As one would expect, the following fields match:

  • The list of assignments in Subject of this certificate is exactly the same as the Issuer in the previous one.
  • The Subject Key Identifier here matches the Authority Key Identifier (as keyid) in the previous one.
  • This Certificate’s Serial number appears in Authority Key Identifier as serial.
  • This Certificate’s Issuer appears in Authority Key Identifier in a condensed form as DirName.

Except for the Subject to Issuer match, the other fields may be missing in certificates. There’s a brief description of how the certificate chain is validated below, after showing the root certificate. At this point, these relations are listed just to help figuring out which certificate certifies which.

Note that unlike the previous certificate, CA is TRUE, which means that this is a CA certificate (as opposed to an End Entity certificate). In other words, it’s intended for the sole use of signing other certificates (and it does, at least the one above).

Also note that pathlen is assigned zero. This means that it’s used only to sign End Entity certificates.

Note that DirName in Authority Key Identifier equals this certificate’s Issuer. Recall that DirName is the Issuer of the certificate that certifies this one. Hence the conclusion is that the certificate that certifies this one has the same name for Subject and Issuer: So with this subtle clue, we know almost for sure that the certificate above this one is a root certificate. Why almost? Because non-root self-issued certificates are allowed in the spec, but kindly show me one.

Extended Key Usage is set to Time Stamping. Even though this was supposed to be unusual, as mentioned before, this is what non-root certificates for time stamping and code signing usually look like.

And the Key Usage is Certificate Sign and CRL Sign, as one would expect to find on a CA certificate.

A sample root CA certificate

And now we’re left with the holy grail: the root certificate.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            03:91:dc:f3:fa:8d:5a:ca:d0:3d:b7:ee:1b:71:2d:60:b5:0a:99:de
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Root CA
        Validity
            Not Before: Jan  1 00:00:00 2001 GMT
            Not After : May 19 00:00:00 2028 GMT
        Subject: C = GB, ST = Gallifrey, L = Gallifrey, O = Dr Who, CN = Dr Who Root CA
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:ce:e5:53:d7:1e:43:28:13:00:eb:b2:81:bb:ff:
                    28:23:98:9a:fd:69:07:ee:49:c5:54:44:66:77:5d:
[ ... ]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Subject Key Identifier:
                98:9A:E3:EF:D8:C5:5C:7F:87:35:87:45:78:3D:51:8D:82:2F:1E:A3
            X509v3 Authority Key Identifier:
                keyid:98:9A:E3:EF:D8:C5:5C:7F:87:35:87:45:78:3D:51:8D:82:2F:1E:A3
                DirName:/C=GB/ST=Gallifrey/L=Gallifrey/O=Dr Who/CN=Dr Who Root CA
                serial:03:91:DC:F3:FA:8D:5A:CA:D0:3D:B7:EE:1B:71:2D:60:B5:0A:99:DE

            X509v3 Basic Constraints: critical
                CA:TRUE
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
    Signature Algorithm: sha256WithRSAEncryption
         30:92:7d:09:e4:ea:4d:81:dd:8e:c2:ba:c0:c4:a6:26:62:4d:
[ ... ]

All in all, it’s pretty similar to the previous non-root certificate, except that its Authority is itself: There is no way to validate this certificate, as there is no other public key to use. As mentioned above, root certificates are installed directly into the validating software’s database as the anchors of trust.

The list of fields that match between this and the previous certificate remains the same as between the previous certificate and its predecessor. Once again, not all fields are always present. Actually, there are a few fields that make no sense in a root certificate, and yet they are most commonly present. Let’s look at a couple of oddities (that are common):

  • The certificate is signed. One may wonder what for. The signature is always made with the Authority’s key, but in this case, it’s the key contained in the certificate itself. So this proves that the issuer of this certificate holds the secret key that corresponds to the public key contained in the certificate. The signature requirement hence prevents issuing a root certificate for a public key without being the owner of the secret key. Why anyone would want to do that remains a question.
  • The certificate points at itself in the Authority Key Identifier extension. This is actually useful for spotting that this is indeed a root certificate, in particular when there are long and rambling names in the Subject / Issuer fields. But why the DirName?

How the certificate chain is validated

Chapter 6 of RFC 5280 offers a 20-page description of how a certificate chain is validated, and it’s no fun to read. However, section 6.1.3 (“Basic Certificate Processing”) gives a concise outline of how the algorithm validates certificate Y based upon certificate X.

The algorithm given in the spec assumes that some other algorithm has found a candidate for a certificate chain. Chapter 6 describes how to check it by starting from the root, and advancing one certificate pair at a time. This direction isn’t intuitive, as validation software usually encounters an End Entity certificate, and needs to figure out how to get to the root from it. But as just said, the assumption is that we already know.

So the validation always trusts certificate X, and it checks if it can trust Y based upon the former. If so, it assigns X := Y and continues until it reaches the last certificate in the chain.

These four are the main checks that are made:

  1. The signature in certificate Y is validated with public key contained in certificate X.
  2. The Issuer part in certificate Y matches exactly the Subject of certificate X.
  3. Certificate Y’s validity period covers the time for which the chain is validated (the system clock time, or the timestamp’s time if such is applied).
  4. Certificate Y is not revoked.
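
The walk over the chain can be sketched as follows. This is a stripped-down toy model, not a real implementation: checks 1 (signature) and 4 (revocation) are deliberately omitted, and the Cert structure is made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Cert:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime

def validate_chain(chain, at):
    # chain[0] is the trusted root; the walk advances towards the end
    # entity, trusting X and checking Y at each step.
    for x, y in zip(chain, chain[1:]):
        if y.issuer != x.subject:                    # check 2: name chaining
            return False
        if not (y.not_before <= at <= y.not_after):  # check 3: validity period
            return False
        # check 1 (verify Y's signature with X's public key) and
        # check 4 (revocation) are omitted in this sketch
    return True

# A chain modeled on the Dr Who example above:
root = Cert("Root CA", "Root CA", datetime(2001, 1, 1), datetime(2028, 5, 19))
ca   = Cert("TS CA",   "Root CA", datetime(2001, 1, 1), datetime(2028, 5, 19))
ee   = Cert("Signer",  "TS CA",   datetime(2001, 1, 1), datetime(2028, 5, 19))
print(validate_chain([root, ca, ee], datetime(2021, 1, 31)))  # True
```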

Chapter 6 wouldn’t reach 20 pages if it were this simple; much of the rambling in that chapter, however, relates to certificate policies and other restrictions. The takeaway from this list of four criteria is where the focus lies when walking from one certificate to another: validating the signature and matching the Subject / Issuer pairs.

I suppose that the Subject / Issuer check is there mostly to prevent certificates from being misleading to us humans: From a pure cryptographic point of view, no loopholes would have been created by skipping this test.

And this brings me back to what I started this post with: This whole thing with certificates has a bureaucratic side, and a cryptographic side. Both play a role.

This post is intentionally left blank

This post has been terminally removed. It’s pointless to ask me for a copy of it.


Setting up a small Sphinx project for validating Linux kernel documentation RST markup

Introduction

Since I maintain a module in the Linux kernel, I also need to maintain its documentation. Sometime in the past, the rst format was adopted for files under Documentation/ in the kernel tree, with Sphinx chosen as the tool for producing formatted documents. Which is pretty nice, as rst is human-readable and can also be used to produce HTML, PDF and other file formats.

But it means that when I make changes in the doc, I also need to check that it compiles correctly, and produces the desired result with Sphinx.

The idea is to edit and compile the documentation file in a separate directory, and then copy back the updated file into the kernel tree. Partly because trying to build the docs inside the kernel requires installing a lot of stuff, and odds are that I’ll be required to upgrade the tools continuously with time (in fact, it complained about my existing version of Sphinx already).

The less favorable side is that this format was originally developed for documenting Python code, and Sphinx itself is written in Python. Which essentially means that Python’s law, sorry, Murphy’s law applies to getting thing done: If there’s the smallest reason for the toolchain to fail, it will. I don’t know if it’s the Python language itself or the culture around it, but somehow when Python is in the picture, I know I’m going to have a bad day.

I’m running Linux Mint 19.

Install

# apt install python3-sphinx

A lot of packages are installed along with that, but fine.

Setting up

Copy the related .rst file into an empty directory, navigate to it, and go

$ sphinx-quickstart

This plain and interactive utility asks a lot of questions. The defaults are fine. It insists on giving the project a name, as well as stating the name of the author. Doesn’t matter too much.

It generates several files and directories, but for version control purposes, only these need to be saved: Makefile, conf.py and index.rst. The rest can be deleted, which will cause the tools to issue warnings on HTML builds, but they nevertheless get the job done.

One warning can be silenced by adding a _static directory (the warning says it’s missing, so add it).

$ mkdir _static

This directory remains empty however. Why whine over a directory that isn’t used? Python culture.

Another thing Sphinx complains about is

WARNING: document isn't included in any toctree

That is fixed by adding a line in index.rst, with the name of the related .rst file, without the .rst suffix. Refer to index.rst files under the kernel’s Documentation subdirectory for examples.
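
For example, if the copied file is mydriver.rst (a hypothetical name), the toctree in index.rst would get an entry like this:

```rst
.. toctree::
   :maxdepth: 2

   mydriver
```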

Running

Once the setup is done and over with, create an HTML file from the .rst file with

$ make html

If the defaults were accepted during the setup, the HTML file can be found in _build/html. Plus a whole lot of other stuff.

And then there’s of course

$ make clean

which works as one would expect.

Running Xilinx Impact on Linux Mint 19

Introduction

This is my short war story as I made Xilinx’ Impact, part of ISE 14.7, work on a Linux Mint 19 machine with a v4.15 Linux kernel. I should mention that I already use Vivado on the same machine, so the whole JTAG programming thing was already sorted out, including loading firmware into the USB JTAG adapters, whether it’s a platform cable or an on-board interface. All that was already history. It was Impact that refused to play ball.

In short, what needed to be done:

  • Make a symbolic link to activate libusb.
  • Make sure that the firmware files are installed, even if they’re not necessary to load the USB devices.
  • Make sure Vivado’s hardware manager isn’t running.
  • Don’t expect it to always work. It’s JTAG through USB, which is well-known for being cursed since ancient times. Sometimes “Initialize Chain” works right away, sometimes “Cable Auto Connect” is needed to warm it up, and sometimes just restart Impact, unplug and replug everything + recycle power on relevant card. Also apply spider leg powder as necessary with grounded elephant eyeball extract.

And now in painstaking detail.

The USB interface

The initial attempt to talk with the USB JTAG interface failed with a lot of dialog boxes saying something about windrvr6 and this:

PROGRESS_START - Starting Operation.
If you are using the Platform Cable USB, please refer to the USB Cable Installation Guide (UG344) to install the libusb package.
Connecting to cable (Usb Port - USB21).
Checking cable driver.
 Linux release = 4.15.0-20-generic.
WARNING:iMPACT -  Module windrvr6 is not loaded. Please reinstall the cable drivers. See Answer Record 22648.
Cable connection failed.

This is horribly misleading. windrvr6 is a Jungo driver which isn’t supported on anything but ancient kernels. Also, the said Answer Record seems to have been deleted.

Luckily, there’s a libusb interface as well, but it needs to be enabled. More precisely, Impact needs to find a libusb.so file somewhere. Even more precisely, this is some strace output related to its attempts:

openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE//lib/lin64/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE/lib/lin64/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE/sysgen/lib/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/EDK/lib/lin64/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/common/lib/lin64/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[ ... ]
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/tls/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

It so happens that a libusb module is present among the files installed along with ISE (several times, actually), so it’s enough to just

$ cd /opt/xilinx/14.7/ISE_DS/ISE/lib/lin64/
$ ln -s libusb-1.0.so.0 libusb.so

or alternatively, a symlink to /usr/lib/x86_64-linux-gnu/libusb-1.0.so worked equivalently well on my system.

But then

Trying to initialize the chain I got:

PROGRESS_START - Starting Operation.
Connecting to cable (Usb Port - USB21).
Checking cable driver.
File version of /opt/xilinx/14.7/ISE_DS/ISE/bin/lin64/xusbdfwu.hex = 1030.
 Using libusb.
Please run `source ./setup_pcusb` from the /opt/xilinx/14.7/ISE_DS/ISE//bin/lin64 directory with root privilege to update the firmware. Disconnect and then reconnect the cable from the USB port to complete the driver update.
Cable connection failed.

So yay, it was now going for libusb. But then it refused to go on.

Frankly speaking, I’m not so much into running any script with root privileges, knowing it can mess up things with the working Vivado installation. On my system, there was actually no need, because I had already installed and then removed the cable drivers (as required by ISE).

What happened here was that Impact looked for firmware files somewhere in /etc/hotplug/usb/, assuming that if they didn’t exist, then the USB device must not be loaded with firmware. But it was in my case. And yet, Impact refused on the grounds that the files couldn’t be found.

So I put those files back in place, and Impact was happy again. If you don’t have these files, an ISE Lab Tools installation should do the trick. Note that it also installs udev rules, which is what I wanted to avoid. And also that the installation will fail, because it includes compiling the Jungo driver against the kernel, and there’s some issue with that. But as far as I recall, the kernel thing is attempted last, so the firmware files will be in place. I think.

Or maybe installing them on behalf of Vivado is also fine? Not sure.

Identify failed

Attempting to Cable Auto Connect, I got Identify Failed and a whole range of weird errors. Since I ran Impact from a console, I got stuff like this on the terminal:

ERROR set configuration. strerr=Device or resource busy.
ERROR claiming interface.
ERROR setting interface.
ERROR claiming interface in bulk transfer.
bulk tranfer failed, endpoint=02.
ERROR releasing interface in bulk transfer.
ERROR set configuration. strerr=Device or resource busy.
ERROR claiming interface.
ERROR setting interface.
control tranfer failed.
control tranfer failed.

This time it was a stupid mistake: Vivado’s hardware manager ran at the same time, so the two competed. Device or resource busy or not?

So I just turned off Vivado. And voila. All ran just nicely.

Bonus: Firmware loading confusion

I mentioned that I already had the firmware loading properly set up. So it looked like this in the logs:

Feb 13 11:58:18 kernel: usb 1-5.1.1: new high-speed USB device number 78 using xhci_hcd
Feb 13 11:58:18 kernel: usb 1-5.1.1: New USB device found, idVendor=03fd, idProduct=000d
Feb 13 11:58:18 kernel: usb 1-5.1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Feb 13 11:58:18 systemd-udevd[59619]: Process '/alt-root/sbin/fxload -t fx2 -I /alt-root/etc/hotplug/usb/xusbdfwu.fw/xusb_emb.hex -D ' failed with exit code 255.

immediately followed by:

Feb 13 11:58:25 kernel: usb 1-5.1.1: new high-speed USB device number 80 using xhci_hcd
Feb 13 11:58:25 kernel: usb 1-5.1.1: New USB device found, idVendor=03fd, idProduct=0008
Feb 13 11:58:25 kernel: usb 1-5.1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 13 11:58:25 kernel: usb 1-5.1.1: Product: XILINX
Feb 13 11:58:25 kernel: usb 1-5.1.1: Manufacturer: XILINX

This log contains contradicting messages. On one hand, the device is clearly re-enumerated with a new product ID, indicating that the firmware load went fine. On the other hand, there’s an error message saying fxload failed.

I messed around quite a bit with udev because of this. The problem is that the argument to the -D flag should be the path to the USB device’s device file, and there’s nothing there. In the related udev rule, it says $devnode, which should substitute to exactly that. Why doesn’t it work?

The answer is that it actually does work. For some unclear reason, the relevant udev rule is called a second time, and on that second time $devnode is substituted with nothing. Which is harmless because it fails royally with no device file to poke. Except for that confusing error message.

systemd: Shut down computer at a certain uptime

Motivation

Paid-per-time cloud services. I don’t want to forget one of those running, just to get a fat bill at the end of the month. And if the intended use is short sessions anyhow, make sure that the machine shuts down by itself after a given amount of time. Just make sure that a shutdown initiated by the machine itself actually cuts the costs. Any sane cloud provider does that, except for, possibly, the cost of storing the VM’s disk image.

So this is the cloud computing parallel to “did I lock the door?”.

The examples here are based upon systemd 241 on Debian GNU/Linux 10.

The main service

There is more than one way to do this. I went for two services: one that calls /sbin/shutdown with a five-minute delay (so I get a chance to cancel it), and a second that is a timer for the uptime limit.

So the main service is this file as /etc/systemd/system/uptime-limiter.service:

[Unit]
Description=Limit uptime service

[Service]
ExecStart=/sbin/shutdown -h +5 "System it taken down by uptime-limit.service"
Type=simple

[Install]
WantedBy=multi-user.target

The naïve approach is to just enable the service and expect it to work. Well, it does work when started manually, but when this service starts as part of the system bringup, the shutdown request is registered but later ignored. Most likely because systemd somehow cancels pending shutdown requests when it reaches the ultimate target.

I should mention that adding After=multi-user.target in the unit file didn’t help. Maybe some other target. Don’t know.

The timer service

So the way to ensure that the shutdown command is respected is to trigger it off with a timer service.

The timer service as /etc/systemd/system/uptime-limiter.timer, in this case allows for 6 hours of uptime (plus the extra 5 minutes given by the main service):

[Unit]
Description=Timer for Limit uptime service

[Timer]
OnBootSec=6h
AccuracySec=1s

[Install]
WantedBy=timers.target

and enable it:

# systemctl enable uptime-limiter.timer
Created symlink /etc/systemd/system/timers.target.wants/uptime-limiter.timer → /etc/systemd/system/uptime-limiter.timer.

Note two things here: I enabled the timer, not the service itself, by adding the .timer suffix. And I didn’t start it. For that, there’s the --now flag.

So there are two steps: When the timer fires off, the call to /sbin/shutdown takes place, and that causes nagging wall messages to start once a minute, and eventually a shutdown. Mission complete.

What timers are pending

Ah, that’s surprisingly easy:

# systemctl list-timers
NEXT                         LEFT          LAST                         PASSED       UNIT                         ACTIVATES
Sun 2021-01-31 17:38:28 UTC  14min left    n/a                          n/a          systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Sun 2021-01-31 20:50:22 UTC  3h 26min left Sun 2021-01-31 12:36:41 UTC  4h 47min ago apt-daily.timer              apt-daily.service
Sun 2021-01-31 23:23:28 UTC  5h 59min left n/a                          n/a          uptime-limiter.timer         uptime-limiter.service
Sun 2021-01-31 23:23:34 UTC  5h 59min left Sun 2021-01-31 17:23:34 UTC  44s ago      google-oslogin-cache.timer   google-oslogin-cache.service
Mon 2021-02-01 00:00:00 UTC  6h left       Sun 2021-01-31 12:36:41 UTC  4h 47min ago logrotate.timer              logrotate.service
Mon 2021-02-01 00:00:00 UTC  6h left       Sun 2021-01-31 12:36:41 UTC  4h 47min ago man-db.timer                 man-db.service
Mon 2021-02-01 06:49:19 UTC  13h left      Sun 2021-01-31 12:36:41 UTC  4h 47min ago apt-daily-upgrade.timer      apt-daily-upgrade.service

Clean and simple. And this is probably why this method is better than a long delay on shutdown, which is less clear about what it’s about to do, as shown next.

Note that a timer service can be stopped, which is parallel to canceling a shutdown. Restarting it to push the time limit further won’t work in this case, because the timer is based on OnBootSec, which counts from boot, not from when the timer is started.

Is there a shutdown pending?

To check if a shutdown is about to happen:

$ cat /run/systemd/shutdown/scheduled
USEC=1612103418427661
WARN_WALL=1
MODE=poweroff
WALL_MESSAGE=System it taken down by uptime-limit.service

There are different reports on what happens when the shutdown is canceled. On my system, the file was deleted in response to “shutdown -c”, but not when the shutdown was canceled because the system had just booted up. There are other suggested ways too, but in the end, it appears that there’s no definite way to tell whether a system has a shutdown scheduled or not. At least not as of systemd 241.

That USEC line is the epoch time for when shutdown will take place. A Perl guy like me goes

$ perl -e 'print scalar gmtime(1612103418427661/1e6)'

but that’s me.
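
The same conversion in Python, using an exact integer-microsecond timedelta to avoid floating-point rounding:

```python
from datetime import datetime, timedelta, timezone

# The USEC value from /run/systemd/shutdown/scheduled is microseconds
# since the Unix epoch.
usec = 1612103418427661
when = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=usec)
print(when)  # 2021-01-31 14:30:18.427661+00:00
```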

What didn’t work

So this shows what doesn’t work: Enable the main service (as well as start it right away with the --now flag):

# systemctl enable --now uptime-limiter
Created symlink /etc/systemd/system/multi-user.target.wants/uptime-limiter.service → /etc/systemd/system/uptime-limiter.service.

Broadcast message from root@instance-1 (Sun 2021-01-31 14:15:19 UTC):

System it taken down by uptime-limit.service
The system is going down for poweroff at Sun 2021-01-31 14:25:19 UTC!

So the broadcast message is out there right away. But this is misleading: It won’t work at all when the service is started automatically during system boot.

Some notes on how an air conditioner works

I’m not an AC guy. Not at all.

But I do own a Tadiran Wave INV 40/3, installed in 2016, and I had some issues with it. In hindsight, there was some real kind of problem a few years ago, and the rest of the problems were because the AC guy who came to fix the original problem replaced a 100k thermistor with a 50k one. So the microcontroller detected something wrong every now and then, and stopped the show (preferably on a hot summer day). He used to come, change the thermistor again, and the machine worked nicely until the next time. Why this helped on the first occasion and on those that followed is beyond me. Eventually I acquired and replaced the thermistor myself, with the rated value given in the installation manual, and I hope that’s the end of it.

By the way, a 50k thermistor instead of a 100k one makes the controller think the temperature is 16°C higher than it actually is. So it’s not wrong enough to prevent the machine from kicking off, but wrong enough for the controller to dislike it and stop after a few months. Or a year.
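
The 16°C figure can be roughly sanity-checked with the NTC beta equation, assuming a typical beta of 3950 K (an assumption on my part; the real part’s datasheet value may differ):

```python
import math

# NTC thermistor model: R(T) = R25 * exp(B * (1/T - 1/T25)).
# If the controller expects a 100k part but a 50k part is fitted, then
# at a true 25 C it sees half the expected resistance, and computes the
# temperature at which a 100k part would read that value.
B = 3950.0    # beta constant in kelvin (assumed, not from the datasheet)
T25 = 298.15  # 25 C in kelvin

def apparent_temp_c(r_over_r25_expected):
    # Invert R/R25 = exp(B * (1/T - 1/T25)) for T, return Celsius.
    inv_t = 1.0 / T25 + math.log(r_over_r25_expected) / B
    return 1.0 / inv_t - 273.15

# Half the expected resistance reads as roughly 41.5 C instead of 25 C,
# i.e. about 16 degrees too high, matching the figure above.
print(round(apparent_temp_c(0.5), 1))
```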

I have no training in the field of air conditioning or anything of that sort, and my memories from the Thermodynamics course I took (and somehow passed with a fair grade) are that the world is turning into a mess. Being a human, I’m always at the risk of writing rubbish, but in this post I feel pretty free to do so. If I got something wrong, kindly correct me below. If you blew your compressor because of anything I wrote below, blame yourself for taking advice from someone who has no clue.

But the story with the thermistor is yet another piece of evidence that if something is important, better be involved in the gory details. Not necessarily those written below.

The jots

  • The name of the game is the refrigerant’s boiling temperature, which changes with the pressure. The compressor pushes refrigerant towards the condenser’s coil (“radiator”), where it travels downwards and cools down, turning into liquid somewhere in the middle of it. As the refrigerant cools down all the way through the coil, it reaches the end of the coil at a temperature that is lower than the boiling point (but the pressure remains more or less uniform). The difference between the measured temperature and the boiling point calculated from the pressure is called subcooling, and gives an indication of the level of the refrigerant inside the coil. That is, where in the coil it turned from gas to fluid.
  • By the same token, inside the evaporator, the fluid refrigerant (actually, mixed with some gas due to flash evaporation at the metering device) travels upwards and is heated by the inside air it cools. Somewhere in the middle the fluid reaches its boiling point, and continues as gas. Once again, the difference between the measured temperature at the end of the coil and the boiling point indicated by the pressure is called superheat, and indicates where in the coil this took place. This too gives an indication of the refrigerant fill.
  • The measurement of temperature and pressure is typically done at the outdoor unit’s connection to the pipes. There might be differences in pressure and temperature from those actually relevant at the evaporator coil, but this is good enough for figuring out the overall situation.
  • The cooling gas is R410A. According to the installer’s guide for my AC (page 34), the equilibrium temperature is 4°C at 116 PSIG on the evaporator, and 50°C at 429 PSIG at the condenser (AC unit working as a cooler). These figures can be derived, because there are typical temperatures of the refrigerant at the end of both coils: the environment’s temperature at the end of the condenser’s coil, and some 15°C at the evaporator’s (?). This can however be deadly wrong if either coil (or both) doesn’t function properly, most likely due to poor airflow (dirty filters, obstructions, a poor ventilation path indoors etc.).
  • The transition from high pressure to low occurs at the metering device, which is basically a narrow pipe. TXV / TEV valves are commonly in use. In a typical in/outdoor unit setting, the metering device is indoors, next to the coil. This means that the refrigerant flows in the pipe towards the indoor unit as a compressed fluid — this is referred to as the “liquid line” and is therefore the narrower and hotter pipe. Narrower, because the flow is slower and the refrigerant is expensive. In the other direction, the refrigerant flows as a gas, in the wider, cooler pipe. It’s a bit confusing that the hotter pipe is the one containing liquid, but it’s all a matter of pressures.
  • Note that in both coils, the higher part is hotter: In the condenser because hot refrigerant is pushed from above, and it cools down as it travels downwards. In the evaporator, the fluid arrives from below, and travels upwards as it heats up. Consequently, in both coils, a physically higher transition point between fluid and gas inside the coil means a higher pressure — because the temperature there is higher. This is how the fluid levels even out between the condenser and evaporator: In steady state, the compressor generates a constant pressure difference and flow on the gas side, which works against the metering valve and differences in altitude. By conservation of mass, there must be a certain amount of fluid in the system to accommodate the existing amount of refrigerant (to complete the amount that is in gas form). If the fluid level rises in one coil, it has to drop in the other. But since a rising fluid level also means a rise in pressure, it causes more pressure against the metering device as well as the compressor. So a rise in the fluid level makes the coil push fluid harder downwards, and gas upwards, making refrigerant travel more towards the other coil, and the fluid level drop on the current one.
  • Another way to look at it: The pressure difference imposed by the compressor dictates a difference between the temperatures of the liquid-to-gas transition points of both coils. Well, more or less, because the pressure to boiling point relation isn’t linear. So given the amount of refrigerant in the system, the fluid is distributed in the segment from the condenser coil’s bottom, through the liquid line and the metering device, to the evaporator coil’s bottom. The fluid levels stabilize so as to satisfy the temperature difference imposed by the compressor.
  • However if there is too little refrigerant (a leak…), then this temperature difference relation can’t be satisfied. The result, say people out there, is bubble noise in the pipes and accumulation of ice at the outdoor unit’s wider pipe.
  • The metering valve is designed to create a pressure gap with fluids, so the pressure difference is smaller when gas runs through it. The compressor drops the pressure in the (wider) gas pipe, causing the temperature to drop well below zero, and ice accumulates.
  • The viscosity of R410A is more than ten times larger as a fluid than as a gas at 10°C, so if it reaches the metering device as a gas, it flows through much more easily. Hence if there isn’t enough refrigerant in the system — that is, the segment from the condenser coil’s bottom to the metering device isn’t filled with continuous liquid — gas will travel quickly through the metering device, causing a pressure drop in the liquid line, which causes even more gas to evaporate in the liquid line (creating bubbles that are often audible).
  • Note however that accumulation of ice could be just the result of poor ventilation of the indoor unit, for example due to dust in the filters: If the gas isn’t warmed up enough in the indoor unit, its overall temperature cycle drops, and it reaches the outdoor unit very cool.
  • How do you tell? Well, air conditioning guys ask about those filters every time for a reason. In addition, inverter AC systems have temperature sensors before, in the middle of, and after the coil in both indoor and outdoor units. This way, the microcontroller can tell what the fluid level is on both sides. If it’s too low, odds are that it will correctly detect that situation and issue an error code on the user panel, and more accurately with a blinking LED on the controller PCB, either the one indoors or outdoors. It’s a good idea to download the installer’s manual for the specific AC model, if you’re brave enough to open your AC on a hot summer’s day.
  • “Heat pump” is just the name for an AC that can also do heating.
  • In English, HVAC is a good search keyword for air conditioning.
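
As a numeric illustration of the subcooling arithmetic from the first jot above — the saturation temperature is the installer guide’s figure for 429 PSIG, but the measured temperature here is made up for the sake of the example:

```shell
# R410A boils at roughly 50°C at 429 PSIG (per the installer's guide cited
# above). Suppose the end of the condenser coil actually measures 42°C at
# that pressure; this measured value is purely illustrative.
sat_temp=50    # Boiling point implied by the measured pressure, in °C
measured=42    # Temperature measured at the end of the condenser coil, in °C
echo "Subcooling: $(( sat_temp - measured ))°C"   # → Subcooling: 8°C
```

A larger subcooling figure means the gas-to-liquid transition point sits higher up in the condenser coil, i.e. more of the coil is filled with liquid.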

Apache 2.4: RewriteRule with [NE] causing 500 Internal Server Error

This is the weirdest thing. With Apache 2.4.10 on Debian 8 Linux (yes, old), and a relatively simple mod_rewrite rule in .htaccess going

RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*) https://www.mysite.com/$1 [R=301,L,NE]

This is really nothing special. Just pushing users to the www host name, if they were lazy typing it.

This works almost perfectly, but then I have this thing about trying odd stuff. So I get an internal server error when I request the page https://mysite.com/%01, but not mysite.com/%1 and not mysite.com/%20. In other words, I get a server error when there’s a legal percent-escape of a character in the %01-%1f range. Not with %00, and not with %20 and up, and not when there’s just one digit. It’s really a matter of a proper percent escape.

This %-escape doesn’t have to be in the beginning of the string. https://mysite.com/mything%01.html is equally offensive. But it appears like a “?” neutralizes the problem — there is no server error if a “?” appeared before the offending %-escape. Recall that characters after a “?” are normally not escaped.

And of course, there is no problem accessing https://www.mysite.com/mything%01.html (getting a 404, but fine), as the rule doesn’t come to effect.

The internal server error leaves no traces in either the error log or the access log. So there’s no hint there. Nor in the journalctl log. No process dies either.

The problem is clearly the NE flag in the rule above, telling mod_rewrite not to escape the string it rewrites. Why this is an issue is beyond me. I could try asking in forums and such, but I have a hunch about the chances of getting an answer.

Exactly the same behavior is observed when this rule is in an Apache configuration file (in a <VirtualHost> context).

So I just dropped the NE flag. Actually, this is a note to self explaining why I did it.
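
For reference, this is what the surviving rule looks like with NE dropped. Apache then percent-escapes the substituted URL itself, which is harmless for ordinary paths:

```apache
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*) https://www.mysite.com/$1 [R=301,L]
```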

One could argue that it’s pretty harmless to get an error on really weird URL requests, but often a serious vulnerability begins with something that just went a little wrong. So I prefer to stay away.

If anyone has insights on this matter, please comment below. Maybe it has been silently fixed in later versions of Apache?

Systemd services as cronjobs: No process runs away

But why?

Cronjobs typically consist of a single utility which we’re pretty confident about. Even if it takes quite some time to complete (updatedb, for example), there’s always a simple story, a single task to complete with a known beginning and end.

If the task involves a shell script that calls a few utilities, that feeling of control fades. It’s therefore reassuring to know that everything can be cleaned up neatly by simply stopping a service. Systemd is good at that, since all processes that are involved in the service are kept in a separate cgroup. So when the service is stopped, all processes that were possibly generated eventually get a SIGKILL, typically 90 seconds after the request to stop the service, unless they terminated voluntarily in response to the initial SIGTERM.

Advantage number two is that systemd allows for a series of capabilities to limit what the cronjob is capable of doing, thanks to the cgroup arrangement. This doesn’t fall far short of the possibilities of container virtualization, with pretty simple assignments in the unit file. This includes making certain directories inaccessible or read-only, setting up temporary directories, disallowing external network connections, limiting the set of allowed syscalls, and of course limiting the amount of resources that are consumed by the service. They’re called Control Groups for a reason.

There’s also the RuntimeMaxSec parameter in the service unit file, which is the maximal wall clock time the service is allowed to run. The service is terminated and put in failure state if this time is exceeded. This is however supported from systemd version 229 and later, so check with “systemctl --version”.
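
A sketch of what such limits can look like in the [Service] section. The directives below all exist in systemd, but which of them are available depends on the version, and the specific values are made up for illustration:

```ini
[Service]
# Wall-clock limit; the service is killed and marked failed beyond this
# (systemd >= 229):
RuntimeMaxSec=3600

# A few of the sandboxing knobs mentioned above:
ProtectSystem=strict             # Filesystem read-only except a few places
PrivateTmp=true                  # Private /tmp and /var/tmp
ProtectHome=read-only            # No writes under /home
RestrictAddressFamilies=AF_UNIX  # No network sockets, only Unix domain
MemoryMax=512M                   # Cap memory consumption
```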

My original idea was to use systemd timers to kick off the job, and let RuntimeMaxSec make sure it would get cleaned up if it ran too long (i.e. got stuck somehow). But because the server in question ran a rather old version of systemd, I went for a cron entry for starting the service and another one for stopping it, with a certain time difference between them. In hindsight, cron turned out to be neater for kicking off the jobs, because I had multiple variants of them at different times. So one single file enclosed them all.
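
A sketch of such a start/stop pair in /etc/crontab style (the times and the instance name are hypothetical; the template service itself is shown below):

```crontab
# Start the job at 02:00, and force cleanup at 03:30 if it's still around.
0 2 * * *   root  systemctl start 'cronjob-test@backup-job'
30 3 * * *  root  systemctl stop 'cronjob-test@backup-job'
```

The stop entry is harmless if the job already finished: as noted further down, stopping an already-stopped service is a silent no-op.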

The main practical difference is that if a service reaches RuntimeMaxSec, it’s terminated with a failed status. The cron solution stops the service without this. I guess there’s a systemctl way to achieve the failed status, if that’s really important.

As a side note, I have a separate post on Firejail, which is yet another possibility to use cgroups for controlling what processes do.

Timer basics

The idea is simple: A service can be started as a result of a timer event. That’s all that timer units do.

Timer units are configured like any systemd units (man systemd.unit) but have a .timer suffix and a dedicated [Timer] section. By convention, the timer unit named foo.timer activates the service foo.service, unless specified differently with the Unit= assignment (useful for generating confusion).

Units that are already running when the timer event occurs are not restarted, but are left to keep running. Exactly like systemctl start would do.

For a cronjob-style timer, use OnCalendar= to specify the times. See man systemd.time for the format. Note that AccuracySec= should be set too to control how much systemd can play with the exact time of execution, or systemd’s behavior might be confusing.
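
For illustration, a minimal timer unit might look like this (the name, time and slack are made up; by the convention above, foo.timer activates foo.service):

```ini
# foo.timer: activate foo.service every night at 02:30,
# with up to one minute of scheduling slack.
[Unit]
Description=Nightly cronjob-style timer

[Timer]
OnCalendar=*-*-* 02:30:00
AccuracySec=1min

[Install]
WantedBy=timers.target
```

Unlike the service itself, the timer unit is enabled and started the usual way (“systemctl enable --now foo.timer”), so it survives reboots.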

To see all active timers, go

$ systemctl list-timers

The unit file

As usual, the unit file (e.g. /etc/systemd/system/cronjob-test@.service) is short and concise:

[Unit]
Description=Cronjob test service

[Service]
ExecStart=/home/eli/shellout/utils/shellout.pl "%I"
Type=simple
User=eli
WorkingDirectory=/home/eli/shellout/utils
KillMode=mixed
NoNewPrivileges=true

This is a simple service, meaning that systemd expects the process launched by ExecStart to run in the foreground.

Note however that the service unit’s file name has a “@” character and that %I is used to choose what to run, based upon the unescaped instance name (see man systemd.unit). This turns the unit file into a template, and allows choosing an arbitrary command (the shellout.pl script is explained below) with something like (really, this works)

# systemctl start cronjob-test@'echo "Hello, world"'

This might seem dangerous, but recall that root privileges are required to start the service, and you get a plain-user process (possibly with no ability to escalate privileges) in return. Not the big jackpot.

For stopping the service, exactly the same service specifier string is required. But it’s also possible to stop all instances of a service with

# systemctl stop 'cronjob-test@*'

How neat is that?

A few comments on this:

  • The service should not be systemd-wise enabled (i.e. no “systemctl enable”) — that’s what you do to get it started on boot or following some kind of event. This is not the case, as the whole point is to start the service directly by a timer or crond.
  • Accordingly, the service unit file does not have an [Install] section.
  • A side effect of this is that the service may not appear in the list made by “systemctl” (without any arguments) unless it has processes running on its behalf currently running (or possibly if it’s in the failed state). Simple logic: It’s not loaded unless it has a cgroup allocated, and the cgroup is removed along with the last process. But it may appear anyhow under some conditions.
  • ExecStart must have a full path (i.e. not relative) even if the WorkingDirectory is set. In particular, it can’t be ./something.
  • A “systemctl start” on a service that is marked as failed starts it anyhow (i.e. the fact that it’s marked failed doesn’t prevent that). Quite obvious, but I tested it to be sure.
  • Also, a “systemctl start” causes the execution of ExecStart if and only if there’s no cgroup for it, which is equivalent to not having a process running on its behalf.
  • KillMode is set to “mixed” which sends a SIGTERM only to the process that is launched directly when the service is stopped. The SIGKILL 90 seconds later, if any, is sent to all processes however. The default is to give all processes in the cgroup the SIGTERM when stopping.
  • NoNewPrivileges is a little paranoid thing: When no process has any reason to change its privileges or user IDs, block this possibility. This mitigates damage, should the job be successfully attacked in some way. But I ended up not using it, as running sendmail fails (it has some setuid thing to allow access to the mail spooler).

Stopping

There is no log entry for a service of simple type that terminates with a success status. Even though it’s stopped in the sense that it has no allocated cgroup and “systemctl start” behaves as if it was stopped, a successful termination is silent. Not sure if I like this, but that’s the way it is.

When the process doesn’t respond to SIGTERM:

Jan 16 19:13:03 systemd[1]: Stopping Cronjob test service...
Jan 16 19:14:33 systemd[1]: cronjob-test.service stop-sigterm timed out. Killing.
Jan 16 19:14:33 systemd[1]: cronjob-test.service: main process exited, code=killed, status=9/KILL
Jan 16 19:14:33 systemd[1]: Stopped Cronjob test service.
Jan 16 19:14:33 systemd[1]: Unit cronjob-test.service entered failed state.

So there’s always “Stopping” first and then “Stopped”. And if there are processes in the control group 90 seconds after “Stopping”, SIGKILL is sent, and the service gets a “failed” status. Not being able to quit properly is a failure.

A “systemctl stop” on a service that is already stopped is legit: The systemctl utility returns silently with a success status, and a “Stopped” message appears in the log without anything actually taking place. Neither does the service’s status change, so if it was considered failed before, so it remains. And if the target to stop was a group of instances (e.g. systemctl stop ‘cronjob-test@*’) and there were no instances to stop, there isn’t even a log message on that.

Same logic with “Starting” and “Started”: A superfluous “systemctl start” does nothing except for a “Started” log message, and the utility is silent, returning success.

Capturing the output

By default, the output (stdout and stderr) of the processes is logged in the journal. This is usually pretty convenient, however I wanted the good old cronjob behavior: An email is sent unless the job is completely silent and exits with a success status (actually, crond doesn’t care, but I wanted this too).

This concept doesn’t fit systemd’s spirit: You don’t start sending mails each time a service has something to say. One could use OnFailure for activating another service that calls home when the service gets into a failure status (which includes a non-success termination of the main process), but that mail won’t tell me the output. To achieve this, I wrote a Perl script. So there’s one extra process, but who cares, systemd kills’em all in the end anyhow.

Here it comes (I called it shellout.pl):

#!/usr/bin/perl

use strict;
use warnings;

# Parameters for sending mail to report errors
my $sender = 'eli';
my $recipient = 'eli';
my $sendmail = "/usr/sbin/sendmail -i -f$sender";

my $cmd = shift;
my $start = time();

my $output = '';

my $catcher = sub { finish("Received signal."); };

$SIG{HUP} = $catcher;
$SIG{TERM} = $catcher;
$SIG{INT} = $catcher;
$SIG{QUIT} = $catcher;

my $pid = open (my $fh, '-|');

finish("Failed to fork: $!")
  unless (defined $pid);

if (!$pid) { # Child process
  # Redirect stderr to stdout for child processes as well
  open (STDERR, ">&STDOUT");

  exec($cmd) or die("Failed to exec $cmd: $!\n");
}

# Parent
while (defined (my $l = <$fh>)) {
  $output .= $l;
}

close $fh
 or finish("Error: $! $?");

finish("Execution successful, but output was generated.")
 if (length $output);

exit 0; # Happy end
sub finish {
  my ($msg) = @_;

  my $elapsed = time() - $start;

  $msg .= "\n\nOutput generated:\n\n$output\n"
    if (length $output);

  open (my $fh, '|-', "$sendmail $recipient") or
    finish("Failed to run sendmail: $!");

  print $fh <<"END";
From: Shellout script <$sender>
Subject: systemd cron job issue
To: $recipient

The script with command \"$cmd\" ran $elapsed seconds.

$msg
END

  close $fh
    or die("Failed to send email: $! $?\n");

  $SIG{TERM} = sub { }; # Not sure this matters
  kill -15, $$; # Kill entire process group

  exit(1);
}

First, let’s pay attention to

open (STDERR, ">&STDOUT");

which makes sure standard error is redirected to standard output. This is inherited by child processes, which is exactly the point.

The script catches the signals (SIGTERM in particular, which is systemd’s first hint that it’s time to pack and leave) and sends a SIGTERM to all other processes in turn. This is combined with KillMode being set to “mixed” in the service unit file, so that only shellout.pl gets the signal, and not the other processes.

The rationale is that if all processes get the signal at once, it may (theoretically?) turn out that the child process terminates before the script reacted to the signal it got itself, so it will fail to report that the reason for the termination was a signal, as opposed to the termination of the child. This could miss a situation where the child process got stuck and said nothing when being killed.

Note that the script kills all processes in the process group just before quitting due to a signal it got, or when the invoked process terminates and there was output. Before doing so, it sets the signal handler to a NOP, to avoid an endless loop, since the script’s process will get it as well (?). This NOP thing appears to be unnecessary, but better safe than sorry.

Also note that the while loop quits when there’s nothing more in <$fh>. This means that if the child process forks and then terminates, the while loop will continue: Unless the forked process closed its output file handles, it keeps the write side of the pipe open, so the script’s read never sees an EOF. The first child process will remain as a zombie until the forked process is done. Only then will it be reaped by virtue of the close $fh. This machinery is not intended for fork() sorcery.

I took a different approach in another post of mine, where the idea was to fork explicitly and modify the child’s attributes. Another post discusses timing out a child process in general.

Summary

Yes, cronjobs are much simpler. But in the long run, it’s a good idea to acquire the ability to run cronjobs as services for the sake of keeping the system clean from runaway processes.