Using MGTs in FPGA designs: Why the data is organized in packets

Introduction

I’ll start with a correction: Indeed, application logic transmitting data from one FPGA to another is required to organize the data in some kind of packets or frames, but there’s one exception, which I’ll discuss later on: Xillyp2p. Anyhow, let’s take it from the beginning.

Multi-Gigabit Transceivers (MGTs, sometimes also referred to as RocketIO, GTX, GTH, GTY, GTP, GTM, etc.) have long ago become the de facto standard for serialized data communication between digital components. The most famous use cases are for the communication between a computer and its peripherals (often between the CPU’s companion chip and a peripheral), for example, PCIe, SuperSpeed USB (a.k.a. USB 3.x), and SATA. Also related to computers, Gigabit Ethernet (as well as 10GbE) is based upon MGTs, and the DisplayPort protocol can be used for connecting a graphics card with the monitor.

Many FPGAs are equipped with MGTs. These are often used for turning the FPGA into a computer peripheral (with the PCIe protocol, possibly using Xillybus, or with the SuperSpeed USB protocol, possibly using XillyUSB, or as a storage device with SATA). Gigabit Ethernet can also come into play, allowing the FPGA to communicate with a computer over this protocol. Another use of MGTs is for connecting to electronic components, in particular ADC/DAC devices with a very high sampling frequency, which hence require a high data rate.

But what about communication between FPGAs? At times, there are several FPGAs on a PCB that need to exchange information among themselves, possibly at high rates. In other usage scenarios, there’s a physical distance between the FPGAs. For example, test equipment often has a hand-held probe containing one FPGA that collects information, and a second FPGA that resides inside the table-top unit. If the data rate is high, MGTs on both sides make it possible to avoid heavy, cumbersome and error-prone cabling. In fact, a thin fiber-optic cable is a simple solution when MGTs are used anyhow, and in some scenarios it offers an extra benefit besides being lightweight: Electrical isolation. This is particularly important in some medical applications (for electrical safety) or when long cables need to be drawn outdoors (avoiding being hit by lightning).

Among the annoying things about MGT communication there’s the fact that the data flow somehow always gets organized in packets (or frames, bursts, pick your name for it), and these packets don’t necessarily align properly with the application data’s natural boundaries. Why is that so?

This post attempts to explain why virtually all protocols (e.g. Interlaken, RapidIO, AMD’s Aurora, and Altera’s SeriaLite) require the application data to be arranged in some kind of packets that are enforced by the protocol. The only exception is Xillyp2p, which presents error-free continuous channels from one FPGA to another (or with packets that are sensible for the application data). This is not to say that packets aren’t used under the hood; it’s just that this packet mechanism is transparent to the application logic.

I’ll discuss a few reasons for the use of packets:

  • Word alignment
  • Error detection and retransmission
  • Clock frequency differences

Reason #1: Word alignment

When working with an MGT, it’s easy to forget that the transmitted data is sent as a serial data stream of bits. The fact that both the transmitting and receiving side have the same data word width might give the false impression that the MGT has some magic way of aligning the word correctly at the receiver side. In reality, there is no such magic. There is no hidden trick allowing the receiver to know which bit is the first or last in a transmitted word. This is something that the protocol needs to take care of, possibly with some help from the MGT’s features.

When 8b/10b encoding is used, the common solution is to transmit a synchronization word, often referred to as a comma, which is known as the K28.5 symbol. This method takes advantage of the fact that 8b/10b encoding uses 10 bits on the wire for each 8 bits of payload data. This leaves room for a small number of extra codes that can’t be confused with regular data. These extra codes are called K-symbols, and K28.5 is one of them.

Hence if the bit sequence for a K28.5 symbol is encountered on the raw data link, it can’t be a data word. Most MGTs in FPGAs have a feature that automatically aligns the K28.5 symbol with a word boundary. So word alignment can be ensured by transmitting a comma symbol. The comma symbol is often used to reset the scrambler as well, if one is used.
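
To make this concrete, here’s a toy software model of comma-based alignment. This is just a sketch of the principle (the real thing happens in the MGT’s hardware); the two 10-bit patterns are the standard 8b/10b encodings of K28.5 for both running disparities:

  # Find the word boundary in an unaligned bit stream by locating K28.5
  K28_5 = ("0011111010", "1100000101")  # RD- and RD+ encodings of K28.5

  def find_alignment(bits):
      # Returns the bit offset of the word boundary, or None if no comma found
      hits = [i for i in (bits.find(p) for p in K28_5) if i != -1]
      return min(hits) % 10 if hits else None

  def split_words(bits, offset):
      # Chops the stream into 10-bit words, starting at the found boundary
      return [bits[i:i+10] for i in range(offset, len(bits) - 9, 10)]

  # Four stray bits, then a comma followed by two other 10-bit words:
  stream = "1010" + "0011111010" + "0101010101" + "1001110100"
  offset = find_alignment(stream)     # 4
  print(split_words(stream, offset))  # Three correctly aligned words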

Each protocol defines when the comma is transmitted. There are many variations on this topic, but they all boil down to two alternatives:

  • Transmitting comma symbols every now and then, possibly periodically. Or as part of the marker for the beginning of a packet.
  • Transmitting comma symbols only as part of an initialization of the channel. This alternative is adopted by protocols like SuperSpeed USB and PCIe, which have specific patterns for initializing the channel, referred to as Ordered Sets for Training and Recovery. These patterns include comma symbols, among others.

Truth be told, if the second approach is taken, the need for word alignment isn’t a reason by itself for dividing the data into packets, as the alignment takes place once and is preserved afterwards. But the concept of initializing the channel is quite complicated, and is not commonly adopted.

There are other methods for achieving word alignment, in particular when 8b/10b encoding isn’t used. The principles remain the same, though.

Reason #2: Error detection and retransmission

When working with an MGT, bit errors must be taken into account. These errors mean simply that a '0' is received for a bit that was transmitted as a '1', or vice versa. In some hardware setups such errors may occur relatively often (with a rate of, say, 10⁻⁹, which usually means more than once per second), and with other setups they may practically never occur. If an error in the application data can’t be tolerated, a detection mechanism for these bit errors must be in place at the very least, in order to prevent delivery of incorrectly received data to the application logic. Even if a link appears to be completely error-free judging by long-term experience, this can’t be guaranteed in the long run, in particular as electronic components from different manufacturing batches are used.
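
By the way, the claim that 10⁻⁹ usually means more than once per second is plain arithmetic, here for a 1 Gbit/s link:

  bit_rate = 1e9         # Bits per second on the link
  ber = 1e-9             # Bit error rate of 10^-9
  print(bit_rate * ber)  # 1.0, i.e. one bit error per second on average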

In order to detect errors, some kind of CRC (or other redundant data) must be inserted occasionally in order to allow the receiver to check if the data has arrived correctly. As the CRC is always calculated on a segment (whether it has a fixed length or not), the information must be divided into packets, even if just for the purpose of attaching a CRC to each.
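
This Python sketch illustrates the principle. The framing format (a two-byte length header plus a CRC-32 at the end) is invented for the sake of the example; real protocols define their own:

  import struct, zlib

  def make_packet(payload):
      # Length header + payload, followed by a CRC-32 over both
      frame = struct.pack(">H", len(payload)) + payload
      return frame + struct.pack(">I", zlib.crc32(frame))

  def check_packet(packet):
      # Returns the payload, or None if the CRC doesn't match
      frame, crc = packet[:-4], struct.unpack(">I", packet[-4:])[0]
      return frame[2:] if zlib.crc32(frame) == crc else None

  pkt = make_packet(b"application data")
  assert check_packet(pkt) == b"application data"
  bad = pkt[:5] + bytes([pkt[5] ^ 1]) + pkt[6:]  # Flip a single bit
  assert check_packet(bad) is None               # The error is detected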

And then we have the question of what to do if an error is detected. There are mainly two possibilities:

  • Requesting a retransmission of the faulty packet. This ensures that an error-free channel is presented to the application logic.
  • Informing the application logic about the error, possibly halting the data flow so that faulty data isn’t delivered. This requires the application logic to somehow recover from this state and restart its operation.

High-end protocols like PCIe, SATA and SuperSpeed USB take the first approach, and ensure that all packets arrive correctly by virtue of a retransmission mechanism.

Gigabit Ethernet takes the second approach — there’s a CRC on the Ethernet packets, but the Ethernet protocol itself doesn’t intervene much if a packet arrives with an incorrect CRC. Such a packet is simply discarded (either by the hardware implementing the protocol or by software), so faulty data doesn’t go further. Even the IP protocol, which is usually one level above, does nothing special about the CRC error and the packet loss that occurred as a result of it. It’s only the TCP protocol that eventually detects the packet loss by virtue of a timeout, and requests retransmission.

What about FPGA-to-FPGA protocols, then? Well, each protocol takes its own approach. Xillyp2p is special in that it requests retransmissions when the physical link is bidirectional, but if the link is unidirectional it only discards the faulty data and halts everything until the application logic resumes operation — a retransmission request is impossible in the latter case.

Reason #3: Clock frequency differences

Clock frequency differences should have been the first topic, because it’s the subtle detail that prevents the solution that most FPGA engineers would consider at first for communication between two FPGAs: One FPGA sends a stream of data words at a regular pace, and the other FPGA receives and processes it. Simple and clean.

But I put it third and last, because it’s the most difficult to deal with, and the explanations became really long. So try to hang on. And if you don’t, here’s the short version: The transmission of data can’t be continuous, because the receiver’s clock might be just a few ppm slower. Hence the rate at which the receiver can process arriving data might be slightly lower than the transmitter’s rate, if the latter keeps sending data non-stop. So to prevent the receiver from being flooded with data, the transmitter must pause the flow of application data every now and then to let the receiver catch up. And if there are pauses, the segments between these pauses are some kind of packets.

And now, to the long explanation, starting with the common case: The data link is bidirectional, and the data content in both directions is tightly related. Even if application data goes primarily in one direction, there is often some kind of acknowledgement and/or status information going the other way. All “classic” protocols for computers (PCIe, USB 3.x and SATA) are bidirectional, carrying acknowledgement packets as well as bidirectional data, and there is usually a similar need when connecting two FPGAs.

The local and CDR clocks

I’ll need to make a small detour now and discuss clocks. Tedious, but necessary.

In most applications, each of the two involved FPGAs uses a different reference clock to drive its MGT, and the same reference clock is often used to drive the logic around it. These reference clocks of the two FPGAs have the same frequency, except for a small tolerance. Small, but causes big trouble.

Each MGT transmits data based upon its own reference clock (I’ll explain below why it’s always this way). The logic in the logic fabric that produces the data for transmission is usually driven by a clock derived from the same reference clock. In other words, the entire transmission chain is derived from the local reference clock.

The natural consequence is that the data which the MGT receives is based upon the other side’s reference clock. The MGT receiving this data stream locks a local clock oscillator on the data rate of the arriving data stream. This mechanism is referred to as clock data recovery, CDR. The MGT’s logic that handles the arriving data stream is clocked by the CDR clock, and is hence synchronized with this data stream’s bits.

Unlike most other IP blocks in an FPGA, the clocks that are used to interface with the MGT are outputs from the MGT block. In other words, the MGT supplies the clock to the logic fabric, and not the other way around. This is a necessary arrangement, not only because the MGT generates the CDR clock: The main reason is that the MGT is responsible for handling the clocks that run at the bit rate, having a frequency of several GHz, which is far above what the logic fabric can handle. Also, the reference clock used to generate these GHz clocks must be very “clean” (low jitter), so the FPGA’s regular clock resources can’t be used. Frequency dividers inside the MGT generate the clock or clocks used to interface with the logic fabric.

In particular, the data words that are transferred from the logic fabric into the MGT for transmission, as well as data words from the MGT to the logic fabric (received data), are clocked by the outputs of these frequency dividers. The fact that these clocks are used in the interface with the logic fabric makes it possible to apply timing constraints on paths between the MGT’s internal logic and the logic fabric.

For the purpose of this discussion, let’s forget about the clocks inside the MGT, and focus only on those accessible by the logic fabric. It’s already clear that there are two clocks involved, one generated from the local oscillator, based upon the local reference clock (“local” clock), and the CDR clock, which is derived from the arriving data stream. Two clocks, two clock domains.

Clock or clocks used for implementing the protocol

As there are two clocks involved, the question is which clock is used by the logic that processes the data. This is the logic that implements the protocol. The answer is obviously one of the two clocks supplied by the MGT. It’s quite pointless to implement the protocol in a foreign clock domain.

In principle, the logic (in the logic fabric) implementing the protocol could be clocked by both clocks. However, the vast majority of this logic is usually clocked by only one of them: It’s difficult to implement a protocol across two clock domains, so even if both clocks are used, the actual protocol implementation is always clocked by one of the clocks, and the other clock is used by a minimal amount of logic.

In all practical implementations, the protocol is implemented on the local clock’s domain (the clock used for transmission). The choice is almost obvious: Given that one needs to choose one of the two clocks, the choice is naturally inclined towards the local clock, which is always present and always stable.

The logic running on the CDR clock usually does some minimal processing on the arriving data, and then pushes it into the local clock domain. And this brings us naturally to the next topic.

Crossing clock domains

Every FPGA engineer knows (or should know) that a dual-clock FIFO is the first solution to consider when a clock domain crossing is required. And indeed, this is the most common solution for crossing the clock domain from the CDR clock towards the local clock. It’s the natural choice when the only need is to hand over the arriving data to the local clock domain.

Therefore, many protocol implementations are clocked only by the local clock, and only this clock is exposed by the MGT. The dual-clock FIFO is implemented inside the MGT, and is usually called an “elastic buffer”. This way, all interaction with the MGT is done in one clock domain, which simplifies the implementation.

It’s also possible to implement the protocol with both clocks, and perform the clock domain crossing in the logic fabric, most likely with the help of a FIFO IP provided by the FPGA tools.

To reiterate, it boils down to two options:

  • Doing the clock domain crossing inside the MGT with an “elastic buffer”, and clocking the logic fabric only with the local clock.
  • Using both clocks in the logic fabric, and accordingly doing the clock domain crossing in the logic fabric.

Preventing overflow / underflow

As mentioned earlier, the two clocks usually have almost the same frequency, with a difference that results from the oscillators’ frequency tolerance. To illustrate the problem, let’s take an example with a bidirectional link of 1 Gbit/s, where the clock oscillators have a tolerance of 10 ppm each, which is considered pretty good. If the transmitter’s clock frequency is 10 ppm above the nominal frequency, and the receiver’s is 10 ppm below it, there is a 20 ppm difference in the 1 Gbit/s data rate. In other words, the receiver gets 20,000 bits more than it can handle every second: No matter which of the two options mentioned above for clock domain crossing is chosen, there’s a FIFO whose write clock runs 20 ppm faster than the read clock. And soon enough, it overflows.
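
In Python terms, the arithmetic goes like this (the FIFO’s slack is made up, just to show the orders of magnitude involved):

  line_rate = 1e9             # Bits per second
  ppm_diff = 20               # 10 ppm too fast plus 10 ppm too slow
  excess = line_rate * ppm_diff * 1e-6
  print(excess)               # 20000.0 surplus bits every second

  fifo_slack = 512 * 32       # E.g. 512 spare words of 32 bits each
  print(fifo_slack / excess)  # ~0.82 s until overflow, if nothing is done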

It can also be the other way around: If the write clock is slower than the read clock, this FIFO becomes empty every now and then. This scenario needs to be addressed as well.

There are several solutions to this problem, and they all boil down to the transmitter pausing the flow of application data at regular intervals, and inserting some kind of stuffing in between to indicate these pauses. There is no possibility to stop the physical data stream; it’s only possible to send data words that are discarded by the receiver instead of ending up in the FIFO. Recall that the protocol is almost always clocked by the local clock, which is the clock reading from the FIFO. So for example, just inserting some idle time between transmitted packets is not a solution in the vast majority of cases: The packets’ boundaries are detected by the logic that reads from the FIFO, not on the side writing to it. Hence most protocols resort to much simpler ways to mark these pauses.

The most famous mechanism is called skip ordered sets, or skip symbols. It’s the common choice when 8b/10b encoding is used. It takes advantage of the fact mentioned above, that when 8b/10b is used, it’s possible to send K-symbols that are distinguishable from the regular data flow. For example, a SuperSpeed USB transmitter emits two K28.1 symbols at regular intervals. The logic before the FIFO at the receiver discards K28.1 symbols rather than writing them into the FIFO.

It’s also common that the logic reading from the FIFO injects K28.1 symbols when the FIFO is empty. This allows a continuous stream of data towards the protocol logic, even if the local clock is faster than the CDR clock. It’s then up to the protocol logic to discard K28.1 symbols.
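
Both directions of this mechanism can be captured with a toy model (in software, of course; “SKP” stands in for the K28.1 symbols, and the FIFO is simplistic):

  from collections import deque

  SKP = "SKP"
  fifo = deque()

  def cdr_side_write(symbol):
      # Discarding skip symbols absorbs a slightly too fast transmitter clock
      if symbol != SKP:
          fifo.append(symbol)

  def local_side_read():
      # Injecting skip symbols when empty absorbs a too fast local clock
      return fifo.popleft() if fifo else SKP

  for s in ["D1", SKP, "D2", "D3"]:
      cdr_side_write(s)
  print([local_side_read() for _ in range(5)])
  # ['D1', 'D2', 'D3', 'SKP', 'SKP']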

There are of course other solutions, in particular when 8b/10b isn’t used. The main point is however that the transmitting side can’t just transmit data continuously. At the very least, there must be some kind of pauses. And as already said, when there are pauses, there are packets between them, even if they don’t have headers and CRCs.

But why not transmit with the CDR clock?

This can sound like an appealing solution, and it’s possible at least in theory: Let one side (“master”) transmit data based upon its local clock, just as described above, and let the other side (“slave”) transmit data based upon the CDR clock. In other words, the slave’s transmission clock follows the master’s clock, so they have exactly the same frequency.

First, why it’s a bad idea to use the CDR clock directly for transmission: Jitter. I’ve already used the word jitter above, but now it deserves an explanation: In theory, a clock signal has a fixed time period between transitions. In practice, the time between transitions varies randomly. It’s a slight variation, but it can have a devastating effect on the data link’s reliability: As each clock transition sets the time point at which a new bit is presented on the physical link, by virtue of changing the voltage between the wires, randomness in the timing has an effect similar to adding noise.

This is why MGTs should always be driven by “clean” reference clocks, meaning oscillators that are a bit more expensive, a bit more carefully placed on the PCB, and have been designed with focus on low jitter.

So what happens if the slave side uses the CDR clock to transmit data? Well, the transmitter’s clock already has a certain amount of jitter, which is the result of the reference clock’s own jitter, plus the jitter added by the PLL that creates the bit-rate clock from it. The CDR creates a clock based upon the arriving data stream, which usually adds a lot of jitter. This jitter too has the same effect as adding noise, because the receiver samples the analog signal using the CDR clock. However, this effect is inevitable. In order to mitigate it, the PLL that generates the CDR clock is often tuned to produce as little jitter as possible, while still being able to lock on the master’s frequency.

As the CDR clock has a relatively high jitter due to how it’s created, using it directly to transmit data is equivalent to adding noise to the physical channel, and is therefore a bad idea.

It’s however possible to take a divided version of the CDR clock (most likely the CDR clock as it appears on the MGT’s output port) and drive one of the FPGA’s output pins with it. That output goes to a “jitter cleaner” component on the PCB, which returns the same clock, but with much less jitter. And the latter clock can then be used as a reference clock to transmit data.

I’ve never heard of anyone attempting the trick with a “jitter cleaner”, let alone tried this myself. I suppose a few skip symbols are much easier than playing around with clocks.

But if the link is unidirectional?

If there’s a physical data link only in one direction, the CDR clock can be used on the receiving side to clock the protocol logic without any direct penalty. But it’s still a foreign clock. The MGT at the receiving side still needs a local reference clock in order to lock the CDR on the arriving data stream.

And as things usually turn out, the same local reference clock becomes the reference for all logic on the FPGA. So using the local clock for receiving data often saves a clock domain crossing between the protocol logic and the rest of the design. It becomes a question of where the clock domain crossing occurs.

Conclusion

If data is transmitted through an MGT, it will most likely end up divided into packets. At least one of the reasons mentioned above will apply.

It’s possible to avoid the encapsulation, stripping, multiplexing and error checking of packets by using Xillyp2p. Unlike other protocol cores, this IP core takes care of these tasks, and presents the application logic with error-free and continuous application data channels. The packet-related tasks aren’t avoided, but rather taken care of by the IP core instead of the application logic.

This is comparable with using raw Ethernet frames vs. TCP/IP: There is no way around using packets for getting information across a network. Choosing raw Ethernet frames requires the application to chop up the data into frames and ensure that they arrive correctly. If TCP/IP is chosen, all this is taken care of.

One way or another, there will be packets on the wire.

Messy jots on AppSheets

Introduction

These are the jots I wrote down as I learned my way through AppSheet. So this is a messy and not so coherent post. Info is scattered here, not necessarily in a sensible order. This is not a tutorial.

My interest comes from the fact that I’m looking for an apartment, and I want to go through a lot of details in each apartment I visit. So I looked for a tool that allows me to organize the information I collect, from the ad, through the thoughts I have about the apartment before visiting it, and most importantly, to efficiently examine the apartment.

This can be done with a plain paper form, of course, but when there are many details and stages, it’s easy to miss one. So the crucial part of my application is to get a quick final view that assures me that I’ve done everything, just before thanking for the visit. This is the most important part, and it’s also the most difficult to implement.

In hindsight, it took 9 days, which is much more time and effort than I intended and wanted to invest in this. I had the “almost done” feeling more or less from the beginning, but every next step turned out much more difficult (and annoying) than I imagined. AppSheet makes you feel like a code monkey that doesn’t know coding. And if AppSheet is used for a long-term project that is maintained over time, be sure to have a QA team that tests everything in every possible combination. The framework invites silly bugs everywhere.

General

  • In a nutshell, AppSheet takes data from a database (preferably represented as a spreadsheet in Google Sheets) and presents this data in different Views, allowing both displaying and changing the data. The other side of AppSheet is called Automation: its ability to generate files, send emails / SMSes, submit HTTP requests etc. with the information in the database, based upon a template. This can be a pdf for an invoice, or a summary report of an inspection. Or an XML or JSON file, pushed to a web server to initiate some action on the information. So Views are primarily intended to feed data into the system, and Automation covers the consumption of the data for a useful purpose.
  • AppSheet at no cost is unlimited in time and covers every feature of the platform with up to 10 test users. One user, myself, is enough. In free mode, the app is limited to Prototype status, which has no significance when used only by myself.
  • AppSheet reminds me of Drupal in many ways, and that’s not necessarily a good thing. Lots of click-click-click around, everything is graphical, and one ends up feeling like a machine doing a silly task a gazillion times, just like when configuring a Windows Server: Every. Single. Detail. Must. Be. Configured. Manually. With a prayer that there will never be a need to change all those settings that have been repeatedly configured. Plus, small bugs creep in everywhere, as there are a whole lot of details that one must keep track of when making every little change.
  • Another thing AppSheet has in common with Drupal is that every little task takes a lot of time, and sometimes the answer is simply “no, that’s impossible”. At least with Drupal, one can hack the PHP code. For example, the one thing that I considered important, but didn’t implement, was to get a list of the items in a checklist that I didn’t check. I didn’t want to implement the checklist as individual Yes/No (boolean) columns, because adding an item would require modifying the table, regenerating the schema, editing the View and then the report. So I did that with an EnumList, but the only information it emits is what was checked. So all in all, there is no way to implement this rather important feature without turning the checklist into a nightmare to maintain in the long run.
  • There are certain limitations on databases and tables, in particular 1000 rows per database in the Free plan, and, for all plans, 20 tables per database and 100 columns per table. Don’t worry, you’ll lose your sanity before getting even near these limits.
  • Press the “Save” button to the upper right to make changes visible (e.g. when checking with mobile app). This also clears existing warnings, whether they were tended to or not.
  • Offline use works great: No problem using the mobile phone even with no Internet coverage. The update to the cloud occurs later (opening the app may be required for that). Actually, the only reason I use AppSheet instead of a home-cooked JavaScript application is the ability to work offline.
  • Backing up an app: Impossible. There’s no way to download the app’s own settings, and deleting an app accidentally is easy. The best solution I found for this is to share the app (or better, a copy of it) with another user (who is also yourself) and make a copy of the app as the other user (note to self: It’s the “obscura” account). And never log in as the other user again. This way, the copy at the other user is safe from accidental tampering. I speculate that the reason for not being able to download the app itself is that AppSheet’s business model is pay-per-user, and it’s free for fewer than 10 users. So had it been possible to download and restore the app from a file, a lot of no-cost single-user accounts would have been created and deployed.

Random insights

  • Each time “Save” is clicked, a new version is saved. It’s possible to go back to older versions. Version history is saved 7 days back by default.
  • It’s possible to run a preview of the app by right-clicking the three dots to the right of the app’s description in the list of apps. Doing that, the specific version of the app is previewed, ignoring subsequent new saved versions of the app. There’s a misleading “refresh” icon on the app preview. It probably only updates data, not the app itself.
  • If a “Map” view is generated automatically, delete it. Otherwise it will show up as the default view for everything (push marketing, it’s called), and it’s as annoying as it’s pointless.

Creating a new app

This describes how to create an app based upon a spreadsheet. The spreadsheet’s first row consists of the description of each column, and the following rows contain data. Multiple sheets are treated as separate tables, each sheet’s name being the AppSheet table name (so don’t rename the sheets in the spreadsheet).

Upload the .ods or .xlsx file to Google Drive, or start a new Google Sheet.

If you’re already registered with AppSheet, right-click the file in Google Drive, pick Open With and choose AppSheet. An app is generated automatically from the data in the file. The name of the app will be the same as the originating spreadsheet’s file name. The column types (text, number, price etc.) are automatically assigned, but can be changed later on in the Data section. It’s even possible to change the type to Image, in which case pictures that are uploaded or taken with the mobile phone are stored in a subdirectory of where the Google Sheets item is saved, with a name like appname_Images/. The value in the spreadsheet is the path to the image file, e.g. appname_Images/J1.Silly.140258.jpg.

It’s best to delete the .ods or .xlsx file from Google Drive at this stage to avoid confusion: A Google Sheets entry is created in the same directory. Changes made with AppSheet go to this entry, not to the file used to create the app.

Doing the same from within AppSheet (not clear if this is better):

  • Navigate to the page listing apps (AppSheet’s main page), click “+ Create”, pick “App” and then “Start with existing data”.
  • Give the app a name. For category, I choose “Inspection & Surveys”, and then click “Choose your data”.
  • Select an existing Google Sheets entry in your Google Drive account as the database (Google Sheets).
  • Trying to use an .ods file as the data source caused AppSheet to hang (wait indefinitely to complete creating the app).

It’s also possible to use an AppSheet database for this purpose, but that seems to be a leftover from the days before AppSheet became part of Google. There’s no reason I’m aware of to prefer an AppSheet database, and there’s no obvious way to make a backup of it, nor to restore one. Possibly by exporting to CSV with Automation, I suppose, but I never tried it.

A directory named “appsheet” is created in Google Drive’s home directory. It contains a directory structure with an empty file called “empty.txt”. A subdirectory is created for each AppSheet app, and files generated by the app go into that subdirectory. It’s possible to select a different target for output files, however (see below).

When an app is created, you get an email with an invitation to use it or develop it.

To do after a while

After playing around with an app for a while, it’s a good idea to make a few fixes. Go to the Views section (click on the third icon from the top) and then on the settings icon (a flywheel).

  • In Views > General, choose the starting View.
  • Also in Views > General, set the Image Upload size to the desired level (High for 1600x900 with my phone, or Full).
  • Be sure to have enabled “Allow drawing on images” on all Image columns (unless you don’t like that option).
  • In Offline mode, enable “Store content for offline use”.
  • Go to the Data view, choose the main table, and opt out of deleting rows (the flywheel icon to the top right for a table’s settings). Delete rows directly on the spreadsheet if needed.
  • Change the Default app folder to /theprojectname/appsheet-data or something like that, which should be a valid path in Google Drive. So that the files land somewhere sensible. This is done in the Setting section (flywheel icon to the left), under “Information”.

Views

  • The View displays (or allows editing) a subset of column values of a row of a table. In some cases, several rows are listed in one View. Which columns are displayed, how they are displayed, and if they are editable, is what the View’s configuration is about. But no matter how you twist and turn it, a View shows values of columns (possibly along with the column names). If you want to display anything, turn it into a column (possibly a Virtual column, defined only inside AppSheet, see below).
  • The AppSheet table, which is used in Views and expressions, doesn’t relate 1:1 to the table in the spreadsheet: Virtual columns are added, either by AppSheet itself (e.g. _RowNumber and Row ID), or by the user. In the latter case, the value of the Virtual Column is an expression that the user supplies. This can be a constant, for example if the purpose is to create a menu button in a View. The expression can also be a formula, which depends on the value of other columns of the same or other tables. In these expressions, [X] is the value of column X of the same row.
  • Clicking / tapping on a displayed item in a View initiates an Action, which often consists of opening a view of a specific row in a table for read-only access or editing. These are system-generated views with names like “main_detail”, “main_form” and “main_inline” for a table named “main”. These Views can be modified like any other view. And a whole lot of other actions can be added and used as the response.
  • It’s often difficult to figure out which View appears in response to clicks, as it’s often a system-generated one. Be sure to have the Edit option on (top-right on the development screen). Hover over the relevant area, wait for the pencil icon to appear, click it and pick e.g. “Edit View”.
  • If a system-generated view is deleted, it’s created anew after saving, apparently with default settings.
  • If a column is added to the spreadsheet’s table, it’s not available for use in the AppSheet immediately. The AppSheet’s table schema needs to be regenerated for this to happen (click the Data icon, click the table, three dots, and then “Regenerate Schema”). Virtual columns are not deleted, despite the scary warning one gets.
  • When clicking / tapping an item can result in a detailed view or the possibility to edit, these are system-generated views, which appear at the bottom left in the Views subsection. These Views can be modified.
  • Views listed under “Primary Navigation” are accessible directly at the bottom part of the screen. Those under “Menu Navigation” are accessible through the menu. “Reference Views” are invisible to the user, except for when requested from other views, for example in an Action (i.e. a response to clicking / tapping an item).
  • There are Format Rules that allow changing the colors of items etc. depending on boolean expressions. In most cases, they apply to columns and how they are displayed (with a different color, or with a colored icon added). Unfortunately, it’s impossible (as far as I know) to write catch-all rules for several columns, as the expression for activating the rule doesn’t play ball with the [_THIS] expression, which means “this column”. So for example, if you want a rule that marks unfilled columns with a red dot, add (or duplicate) one rule for each and every column, and be sure that the rule for column X doesn’t by mistake change the display format of column Y. It’s just an invitation to make silly bugs.
  • The column’s name is displayed above the column’s value in forms and detail Views by default. This can be changed by opening the column’s settings in the Data section, and change the “Display name”. So use concise names in the spreadsheet.
  • A column of type Show is useful for text, URLs, images etc, that appear in forms for instructions, setting page breaks etc. These are best added as virtual columns with literal values in the expression. Don’t forget to turn off the “Editable” attribute of this column. It’s quite unfortunate that instruction text appears as an extra column along with the spreadsheet’s data, but that’s the way it’s done.
  • To navigate to another view as a result of clicking on a column (possibly a Virtual Column acting as a menu entry):
    • First, an Action needs to be created. Click on the Action icon (looks like an electric zap), click on the “+” at the top left, and add an action.
    • Pick the table related to the view that this action shall work on, and pick “App: go to another view within this app”.
    • As for the Target, choose the expression LINKTOVIEW([targetview]) (given that “targetview” is the name of the column containing the name of the view to navigate to, as a plain text column).
    • Go to the relevant item in the View, and choose the newly created action under the Behavior section.
    • If the item is a Virtual Column just for the purpose of being a menu item, change its Display Name to ” ” (a normal white space inside quotes) so that the column’s name doesn’t appear on the menu, which is both ugly and consumes space. Choosing an empty string, “”, has no effect.

There are several types of Views, but the most important for a plain spreadsheet-based app are:

  • Form: All columns listed in the “Column order” are shown and editable. The title of each input (text box, drop-down menu etc.) is the name of the column in the database or spreadsheet, unless overridden in the column’s configuration (in the Data section). Unfortunately, a back-swipe on a mobile phone is interpreted as “Cancel”, and there’s no way around this. One must explicitly tap on Save, or else all the info is lost.
  • Details: Like Form, but read-only, with an icon at the bottom right for editing the content, which switches to a Form with the same set of fields. Unfortunately, blank fields are not shown unless their Show property is set to the expression TRUE (just checking the checkbox isn’t good enough, it has to be done with an expression).
  • Table: As the name implies, like Details, but with the data shown as a table (with rows being rows, columns being columns). Selecting a row brings up a Details view of it. It’s possible to configure which columns appear in the table, possibly only one. So this can be a concise way to display a list of rows. It appears that the column marked as Label won’t appear in a Table View even if chosen to do so. Why is unclear.
  • Card and Deck: Each row gets a small pane with two or a few selected column values shown. Selecting one brings up a Details view.
  • Gallery: Shows the image associated with each row (if such exists) and the value of the column marked as Label.
  • Dashboard: A View containing other Views.

Adding a new table

  • In AppSheet, click the Data icon (the second icon from the top).
  • Click on the “+” (Add new data).
  • Navigate to the relevant Google Sheet.
  • Navigate to the related table, and add it.

Settings for each column in a table

In the Data section, there are several attributes one should look at in a table’s configuration:

  • Name: Must match the name in the spreadsheet’s header (or the column name of a database’s table). The app fails to load otherwise.
  • Type: Number, text, image, reference, price, there are a lot of options. The automatic choice is usually correct for the simple textual columns. For more sophisticated input (drop-down menus, images, references etc.) this is the place to configure that.
  • Key: This checkbox is checked for one column only, selecting the key column in the database sense. Relevant in particular when using references.
  • Label: This checkbox is checked for one column only (plus, possibly, an image), selecting which column appears in several views as the row’s main description.
  • Formula: Left blank for columns taken from databases, must contain something for a Virtual Column. When set to something, the value of the column is the expression in this formula.
  • SHOW?: As its name implies, it controls if the column should be shown in views. This checkbox should usually be checked: If a column isn’t desired in a view, it should be removed from there. If the column should be displayed in Details Views even when blank, checking this isn’t enough: The Show property must be set to TRUE as an expression.
  • EDITABLE?: Can the value of the column change?
  • REQUIRE?: Must the column have a value to allow finishing an edit session containing this column?
  • Initial value: The column’s initial value when a new row is created.
  • Display name: If left blank, the Name field from above appears above the value of this column in forms etc. Otherwise, the Display name overrides it.
  • Description: For internal documentation, and is also used instead of the column name if “Display Name” is left blank.
  • Search: Is this column involved in searches? This is a tricky one: Enabling this means that the row will appear in searches even if the column isn’t displayed in the View, and vice versa: Even if a column is displayed, the search doesn’t take it into consideration unless the Search option is enabled for the column. It would, of course, have made more sense to define this option per View, but that’s not the way it works, unfortunately.
  • Scan, NFC, PII: Can this column accept data from these input methods? (Scan means barcode scanning.)

Modifying the spreadsheet

It’s possible to modify the spreadsheet even after the app has been created, including adding a column in the middle. The immediate effect is an error (“unable to fetch app definition” on the mobile app, or just an error on the development console). To resolve this, pick the “Data” icon (second from the top) on the development console, and then click the round arrow (as in “reload”) to regenerate the structure. A scary warning pops up in response to that, but just go ahead.

Note that the connection between AppSheet and the spreadsheet is through the names of the columns as given in the first row, as well as the name of the sheet. These should not be modified (unless the manipulation is intentional).

References etc.

AppSheet definitely supports relational databases. In fact, by choosing names of columns where one is the plural form of the other (an “owner” column and an “owners” table) the relation is possibly set up automatically. This is however not necessarily a good idea, because the tool chooses which column’s value identifies the entire row — and getting this wrong could mess up things later, and it’s difficult to fix afterwards.

Rather, if a spreadsheet (or virtual) column (in “table A”) is assigned the type “Ref”, it’s also required to assign the table referred to (“table B”). The value of the column selects the row that has a key column equal to it. In the View showing this column, the label column of the referred table is shown. Clicking / tapping on the label leads to a View of that row: its columns’ values as well as other rows referencing it. In order to access a specific column, add a virtual column to Table A with a formula like [the_refcol_in_table_A].[the_col_in_table_B]. It’s not as difficult as it sounds, as the GUI starts suggesting completions when it sees the brackets and the dot.

Note that in AppSheet, a reference actually means the whole row. In many practical situations, only the label column is displayed, but it’s important to remember that conceptually, a Ref column represents the entire row it refers to.

And this brings me to back-reference. When a table B is referenced by a column somewhere in table A, a virtual column is automatically added to table B. This column has the type List and its value is something like REF_ROWS(“table A”, “referring_column”). The arguments are the name of table A and the name of the column with type Ref in that table. So this column lists all rows in table A that refer to the row in table B. Consider deselecting its “SHOW” checkbox in the Data view, if this back and forth is undesired.

Also, note that the back-reference is shown as a list of inline elements on the View of Table B. In order to determine how it appears, configure the system-generated view with a name like tableA_Inline (for example, if it’s a table or a deck). Even more important, this view allows choosing what happens when one of the elements is clicked: View details (e.g. the tableA_Detail view), edit (e.g. tableA_Form) or do something completely different with an Action.

And when references exist, it’s important to get the keys right. It’s possible to choose anything that is guaranteed to be unique, but because references relate to the key, there might be weird side-effects if the value of the key column changes. For example, if we choose a person’s email address as the key, all references to that row are lost if this person changes this address.

What we really want is a solid reference to the row itself. This is best done by adding a dedicated column for that in the table (i.e. in the spreadsheet), and assigning it a unique value when a new row is created: Use UNIQUEID() as this column’s initial value. When AppSheet adds a new row, it will calculate a unique ID and put the value there. Make this column non-editable (possibly turn off “SHOW” too). In fact, AppSheet does this automatically when I give a column the name “key_id”.

Note that calculating UNIQUEID() in a virtual column is pointless. That’s not a solution for creating a unique ID for a row.

Other insights:

  • Allowing upload of multiple images (or adding other items): Create a table consisting of “key_id” (used as key), “ref”, “image” and “weight” columns. The “ref” column is declared as a Ref type, pointing at the main table, i.e. the table to which the images belong. Enable “is part of”, which makes it possible to add new items from the main view. So all in all, there is a list of images in the main view (of table B). Each row in the list points to a row in table A. This makes it possible to add an arbitrary number of items from a view of a row in the main table. Each item becomes a new row in table A, with a reference to table B. “weight” is a number, allowing to control the order when displaying the images. Using “key_id” as the key, and not “image” (which is the name of the file containing the image) makes it possible to show the same image in different Views, even from different tables.
  • Drop-down menus with text taken from somewhere else: Prepare a table (a sheet, imported as a table) with keys as numbers and text (or anything else) as values. Then, on the table where we want the drop-down, set the column’s type to Ref, and point at the said table as the reference. Set the input type to Buttons or Drop-down. In order to allow adding new possible values, set the type to Enum (or EnumList for multiple values), and the Base type to Ref. Choose the said table as the target for the reference. Note however that in the latter method, all already existing keys must be added manually to the column’s definitions, exactly like with any Enum type. The only difference is that the keys (possibly numbers) are fed manually, not the text. So if adding new possibilities is required, plain Enum is probably the best way.
  • It’s impossible to get a list of the options not chosen in an EnumList. It’s possible to write an expression that subtracts the elements selected by an EnumList from a given list (see the sketch just below), but that requires keeping this expression and the EnumList’s options in sync. And if one forgets to update the expression after adding an item to the EnumList, it’s a nasty bug.
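
For what it’s worth, the kind of expression I mean relies on AppSheet’s list subtraction, in a Virtual Column along these lines (the column name and the items are made up, and the literal list must mirror the EnumList’s options by hand):

  LIST("Kitchen", "Bathroom", "Balcony") - [checked_items]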

One long row or relational database?

As there are a lot of pieces of information about an apartment, which can be divided into categories and will be handled by different Views, my database manager self was inclined towards a relational database approach. One row with a few columns for each apartment in the “main table”, and then several tables, one for each information category (i.e. View), referenced by the “main table”.

The alternative approach is one table for all information (except for images and other utilities), making one long row with all the info. It’s up to the Views to select which columns to show: Not all columns in the original table (or spreadsheet sheet) have to be visible. It’s possible to make several views of the same table. But this means a lot of columns to handle (add virtual columns to that), making it difficult to keep track of the whole thing.

I went for the one long row approach. The main reason was that this approach makes it easier to reorganize the Views if needed. For example, if fields are added over time, and a View becomes too crowded, it’s not a big deal to split it into two Views, if they both refer to the same table anyhow. Or to move a field to another View.

That said, the separate table approach would definitely have worked: When a new item is added, that means a row in the main table. A new row is created on each separate table by virtue of a menu button (a virtual column with a specified Action). LINKTOFORM() allows opening a form with specific initial values, so the key column in the child table can be set to the parent table’s key column value, ensuring a 1:1 relationship (and preventing duplicate rows).
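
For example, the menu button’s Action could open the form with an expression along these lines (the view and column names are invented; LINKTOFORM() takes the form View’s name, followed by column-value pairs for the initial values):

  LINKTOFORM("room_Form", "parent_ref", [key_id])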

For reports, the child table is shown as a back-reference to the main tables with a Start/End pair of template tags (maybe there’s a simpler way to do this).

In hindsight, I should have taken the separate table approach, but in a way I didn’t think about at first: There should have been a table with one row for each room, regardless of its type. This would have reduced the number of columns in the main table, and the amount of work and headache grows linearly with the number of columns. The trick would have been not to show all columns of a room’s table in the form View, but only those suitable for the specific type of room. The rules for which types of rooms exist, and which columns should be shown in the View, would be listed in a separate table. Hence the “Show if” rule would look up the column in this rule table, in relation to the room’s type.

Use slices?

The question is how to display a subset of the columns in different Views.

The seemingly natural (and less preferred) way is by using slices. A slice is a virtual table, calculated on the fly, consisting of a subset of rows and columns as configured. This allows displaying only a subset of rows, columns and actions, filtered with a condition on the data (or just not showing all columns). Slices are configured with the “+” icon on each data set (inside the “Data” section), as they are considered some form of table. Alternatively, slices are hinted at in the View’s configuration as well.

This would have been the preferred way, had it not been for an annoying caveat: If a new column is added to the database (i.e. sheet), it appears in all slices. This makes slices a no-go for the purpose of selecting columns for a view.

Instead, the columns to display should be selected in the “Column order” part, which can be found in the “View Options” section for each View configuration. Note that “Column order” isn’t present for Views that don’t display columns, e.g. Deck and Gallery.

It makes sense to start by generating an empty view, and to copy it each time a new view is created.

Creating a pdf file from the data of a row

…or for that matter, a CSV / JSON / XML / HTML / XLSX file.

The important point is that this is done with Automation, and that the execution is triggered by an Event, and not as an Action. In other words, clicking something on a View won’t trigger the generation of the report directly. Rather, clicking the entry in the View causes an Action that modifies a row in a table, or adds a row to a table, and that, in turn, triggers the Event.

There is no direct connection between the Action and the task it requests — the generation of a file, in this case. It can take 20-30 seconds from the triggering event until the file is created.

One option to request the generation of a file from an app: First, add two columns to the table for which the report is desired: “do_report” and “last_report”. In AppSheet, regenerate the schema if necessary, and set the columns as follows:

  • do_report: Type is “Yes/No” (sweet talk for Boolean), turn off “show”, initial value “FALSE”
  • last_report: Type is “Text”, and set Initial Value to “Never”. “Editable” must remain enabled, even though it shouldn’t be accessible by forms: It’s intended to be set by the Bot, and this is possible only if the column is editable.
  • Add a virtual column named “generate_report”, “Text” type, with expression as follows:
    CONCATENATE("Generate report
    Last report: ", [last_report])

    Note that there is a new line in the middle of the string.

  • Create a new Action (Bzzt icon): For “Do this” pick “Data: set the values of some columns in this row”. Pick the “do_report” column and set the expression to TRUE. I’m going to use the Virtual Column defined above as a menu item, so for Position, pick “Inline” and at “Attach to column”, pick “generate_report”. In Behavior, set “Only if this condition is true” to NOT([do_report]), and optionally also enable the request for confirmation.
  • Add the “generate_report” column to the View (or Views) from which the report should be requested.
  • Optionally (and recommended), add a Format rule for generate_report. The condition is [do_report] and if so, the text should be grey (color #888888) and in italics. For “Format these columns and actions” pick generate_report, and possibly also the action created just above. Without this, there is no immediate visual feedback to “pushing the button”.

Now to the part that actually generates the report.

  • Go to the Automation section (robot icon) and create a new Bot.
  • Click “Configure Event” and give it a name.
  • In the pane to the right, the Event source should be “App”. Pick the table from which the Event is triggered.
  • Select only “Updates” for Data change type.
  • Condition: [do_report]
  • Next, define what should be executed — the Process. Add Steps, each of which is more or less like a line in a computer program: Each of them can perform something, check a condition and branch, or wait for a condition to become true. Or call a process. These are the steps to define (give each some name, as usual) and configure on the pane to the right.
    • Type: “Run a data action”. In the right pane, pick “Set row values”, and set “do_report” to FALSE.
    • Type: “Run a task”. Set up the task as follows in the right pane:
      • Pick Create a new file
      • HTTP Content Type: pdf
      • Template: Click Create (or pick an existing template, if you have one from previous sessions). A “View” button appears after this is completed. More on templates in a separate section below.
      • Optional: Change File Folder Path to “/” (with the quotes), File Name Prefix to “report” (once again, with the quotes) and opt-in Disable Timestamp. This way, report.pdf is created at the project’s root directory (not Google Drive’s root directory in most cases). Expressions can be used for these, of course.
    • Type: “Run a data action”. In the right pane, pick “Set row values”, and set last_report to TEXT(NOW(), “DD/MM/YYYY HH:MM:SS”) for British time format.

Note that it’s possible to create a pdf file even though the template file is an HTML file. This makes it much easier to handle the template, and it’s always possible to go back to the HTML file to see what went wrong. The disadvantage with an HTML template is that the file generation fails on the tiniest HTML syntax error. For example, a missing </tr> tag. Or even more ridiculous, a missing </p> tag.

In order to view the generated file by clicking a menu item in a View, create a Virtual Column for this purpose, and assign it with an Action when selected. This action should be “External: Open a file”. The “File” entry is a URL to where the file is stored on Google Drive. Obtain this link with the Share feature, so the expression becomes something like “https://drive.google.com/file/d/1d8AWHjLJdK9cyhYGBPnuVJJ0asmLmZVw/view?usp=sharing” (the quotes included). This is not a security risk, as one chooses to share only with oneself (when obtaining the link from Google Drive).

If the execution fails due to a malformed template file (something inside the markups didn’t work fine), the app’s View doesn’t always get updated until a restart of the app (or reload the entire page on a web browser).

Actually, the way the whole thing behaves is that the menu button gets greyed out immediately after being pressed, and then gets back to normal after the file has been generated. But the color is determined by do_report’s status, which is turned back to FALSE before the attempt to generate a file. So if the file generation takes time, it may return to normal before the file is updated — but this has never had any practical consequence for me. But why doesn’t it go back to normal when the file generation fails? do_report is FALSE either way.

To view error messages: Go to the “Monitor” section (icon at the left side, one up from the bottom), and pick “Launch Log Analyzer” in the “Audit History” part. After some graphs, there’s a table with the last tasks initiated. Where there’s an error, click the binocular icon for details. A JSON text entry appears; the “Errors” key is where the error message is shown. It can take a few minutes before the entry with the error is shown, and before then, there may be entries showing success even though an error has occurred. So wait a few minutes before ruling out a failed run.

Templates

A template is a file in Doc or HTML format. Everything is passed through to the output document, except for markups of the <<expression>> form (or, for an HTML template, &lt;&lt;expression&gt;&gt;). The expression is evaluated in exactly the same way as the value of a Virtual Column, and the markup is substituted with the expression’s result. Hence, to obtain a substitution with the value of a column named “thecol”, just insert <<[thecol]>>.

Regardless of whether the output is pdf or HTML, when using an HTML template, be sure that the HTML file has all the classic wrapper tags: <html>, <head>, <body> etc. Otherwise, AppSheet ignores all <<something>> (or actually, &lt;&lt;something&gt;&gt;) markups altogether, and the output is the template file itself.

There are special kinds of markups:

  • Start/End for looping on a list of keys to a table
  • If/EndIf markups, allowing a segment to be included or not. Apparently these are not available with HTML templates: They’re not listed explicitly, and I failed to use them (so I used the IF() function instead).

The automatically created template is simple and not necessarily very helpful: It encompasses all real columns (but not the virtual ones), showing each column’s name (not its display name) and its plain value. Its most valuable part is where it shows the use of Start/End (when relevant, i.e. when there are back-references to the displayed table). Note that, as shown in the template, the Start and End tags must be inside a <p> block. They can’t be just a floating piece of code, or within another pair of tags (<div> doesn’t work, for example). Otherwise, one gets a misleading error message saying that a table with the name “Start:” can’t be found, plus a lot of error messages for everything between the two tags.
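
For illustration, iterating on the rows of a related table could look like this (the [Related items] reference and the column names here are made up; in the raw HTML file, the double angle brackets appear in their &lt;&lt; ... &gt;&gt; form, as mentioned above):

<p><<Start: [Related items]>></p>
<p>Item: <<[name]>>, quantity: <<[quantity]>></p>
<p><<End>></p>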

The template is annoying in particular in relation to Ref types: The actual value, which is the key itself, is shown for references, and not the referenced value, as seen in the app’s Views. It would have been sweet of AppSheet to look up which of the referenced table’s columns is displayed in the View, and show that. But there’s another reason for this behavior: Say that the Ref column is “choice” and the column to display on the referenced table is “label”. The expression for the value to display is [choice].[label]. But if “choice” happens to be blank, and this expression appears anywhere in the template, the file generation fails. So be sure to assign an initial value to all Ref columns when creating a new row. Plus, possibly add a simple sanity check on the column, just in case:

<<IF(ISBLANK([choice]), "Error!", [choice].[label])>>

Not sure if this is bullet-proof, but it solves the most obvious problem.

I’ve also got the impression that if the expression in an AppSheet markup returns anything containing HTML tags (e.g. <b> and </b>), the file generation fails completely.

Notes to self

These are cryptic notes to myself, for my own apps’ housekeeping tasks.

First of all, the account I’ve shared stuff with is “obscura”.

Adding/removing a new item to a checklist

  • Update the related checkcnt expression with the new number of items, so that the report says all is done when it really is.

Adding a new column

  • Be sure to regenerate the schema of the related table in AppSheet.
  • Possibly add (or duplicate) an “Unfilled X” Format rule (e.g. ISBLANK([streetaddress]) ), if it matters that the field is empty. Mark with Cyan dot, and don’t forget to set the affected column correctly.
  • Set the SHOW property of the column to the expression TRUE if it’s supposed to appear in the detailed view, even if it’s blank. Just clicking on the “SHOW” checkbox isn’t good enough.
  • Set the Display property to something friendly

The struck-out bullets were relevant when I considered using the Detail view as the overall summary. I use a pdf report now, so they’re no longer relevant.

Adding a new View Form

  • There should be a status_* column in relation to this in the spreadsheet. Its setting inside AppSheet: Ref to const_group_status with Buttons (not “part of”), enable Show, Editable, Require, Initial Value is “0” (with quotes), Display name is “Status”. The four checkboxes to the right are left unchecked.
  • In the main table, add a Virtual Column with the same name as the Form View it relates to, e.g. view_base for status_base (so it’s easier to match between them later). Its type is Text, and its value is the relevant menu item’s description.
  • Only now, create a new Form view. In Column order, pick Manual, select all, unselect those not required. Be sure to keep the relevant status_* column.
  • Go to the “Actions” section (Bzzzt icon) and add a new action (actually, duplicate a similar one). Name it e.g. “Jump to base form”. For a record of “main”, Do this is “App: go to another view within this app”. Set Target to LINKTOROW([key_id], “view_base”), where “view_base” is the name of the View. Set Position to Inline, and “Attach to column” to the related Virtual Column. Pick a suitable icon too. Unfortunately, there is no catch-all action for navigating.
  • Duplicate a couple of “Mark incomplete” format rules, and adapt them to the new View: Change the expression to match the correct status_* column, and also the “Format these columns” to the relevant “view_*” with the same name. And test it. There’s no way around a couple of rules for each View.

Un-ignore /usr/lib/systemd/ in .gitignore with git repo on root filesystem

Actually, this is about un-ignoring any subdirectory that is grandchild to an ignored directory.

Running Linux Mint 22.2 (based upon Ubuntu 24.04), and having a git repository on the root filesystem to keep track of the computer’s configuration, the vast majority of directories are ignored. One of them is /lib; however, /lib/systemd/ should not be ignored, as it contains crucial files for the system’s configuration.

On other distributions, the relevant part in .gitignore usually goes:

[ ... ]
bin/
boot/
dev/
home/
lib/*
!lib/systemd/
lib64/
lib32/
libx32/
lost+found/
media/
mnt/
opt/
proc/
root/
run/
sbin/
[ ... ]

So lib/ isn’t ignored as a directory, but all of its content, including subdirectories, is. That allows un-ignoring lib/systemd/ on the following row: Had lib/ been ignore-listed like the other directories, git wouldn’t have looked inside it at all, and the un-ignore row would have had no effect.

But on Linux Mint 22.2, /lib is a symbolic link to /usr/lib. And since git treats a symbolic link just like a file, /lib/systemd/ is really /usr/lib/systemd/. Ignoring /lib as a directory has no effect, and un-ignoring /lib/systemd has no effect either, because as far as git is concerned, this directory doesn’t even exist.
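
To see this plainly, readlink shows what the symlink points at (this is the typical merged-/usr arrangement):

$ readlink /lib
usr/lib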

So go

$ man gitignore

and try to figure out what to do. It’s quite difficult actually, but it boils down to this:

usr/*
!usr/lib/
usr/lib/*
!usr/lib/systemd/

It’s a bit tangled, but the point is that /usr/lib is un-ignored, then all its files are ignored, and then /usr/lib/systemd is un-ignored.
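
To verify that the rules behave as intended, git check-ignore comes in handy. With the -v flag, it reports the .gitignore rule that determined each path’s status (the file names below are arbitrary examples, and the files don’t even have to exist):

$ git check-ignore -v usr/lib/systemd/somefile
$ git check-ignore -v usr/lib/somefile

The first should be matched by the !usr/lib/systemd/ rule (hence not ignored), and the second by usr/lib/*.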

The only good part about this solution is that it works.

Footnote and whole-page layout with wkhtmltopdf

This HTML code makes wkhtmltopdf create a single page with a footnote. If the outer <div> is duplicated, separate pages are generated.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<div style="height: 1350px; display: flex; flex-direction: column; break-inside: avoid; border:1px solid #668;">
This is just a test.
<div style="margin-top: auto;">
This is a footnote
</div>
</div>
</body>
</html>
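
For reference, the pdf is generated from a file like this with a plain invocation (the file names are arbitrary):

$ wkhtmltopdf test.html test.pdf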

So how does it work? The important part is the “style” attribute of the outer <div> tag:

  • height: 1350px: This sets the <div> block’s height to a full A4 page. Why 1350 pixels? I don’t know. I just tweaked this figure until it came out right. It’s possible that another figure is needed on a different version of wkhtmltopdf. I tried to set this with cm as well as pt units, but neither corresponded to the standard figures for an A4 page. So I went with pixels, which makes it clear that this is a wild guess.
  • display: flex; flex-direction: column: This turns this <div> block into a Flexbox container, with vertical packing. This is needed to push the footnote’s block to the bottom.
  • break-inside: avoid: This tells wkhtmltopdf to avoid page breaks in the middle of the block. This makes no difference for a single page, but if this <div> block is repeated, this style attribute ensures that each block gets a separate page (unless any of the pages exceeds a page’s height).
  • border:1px solid #668: This generates a border around the <div> block’s occupied area. It’s used only for finding the correct height attribute, and should be removed afterwards (unless this border is desired on every page).

The footnote is pushed to the bottom of the page by virtue of the margin-top: auto style attribute and the fact that the <div> block having this attribute is inside a vertically packed Flexbox container.

Notes:

  • This was done with wkhtmltopdf 0.12.4, compiled without the “wkhtmltopdf patches” (according to the man page).
  • If the height is too large on any page, all break-inside are ignored. In other words, the whole pdf document gets garbled, not just around the page that went wrong.
  • I tried changing the resolution on my X11 display, and it didn’t make any difference. This might sound like a silly thing to check, but wkhtmltopdf depends on the X11 server.

Notes on installing Linux Mint 22.2 (full disk encryption)

Introduction

These are my notes to self for the next time I install a Linux system. As if I read my previous posts before attempting.

So I installed Linux Mint 22.2 (kernel 6.14.0-29) on a Lenovo V14 G4 IRU laptop. With Cinnamon, of course, not that it’s relevant.

All I wanted was full-disk encryption, but with the liberty to choose the setup of the partitions explicitly, rather than letting the installation wizard make the choices for me. In particular, I wanted a swap partition with the size of my choice, and even more important: Almost all disk space in a /storage mount, so that one can fill the hard disk with junk without risking a system failure because the root partition is full.

Cutting the cake

/boot/efi is where the BIOS reads from. It’s natural to put it as the first partition, and it can be very small (even 1 MB can be enough in some cases, but let’s not push it like I eventually did). But if you make it really small, it’s a FAT16 partition, not FAT32, and that’s OK. Don’t force it into FAT32, because the system won’t boot with it if it has fewer clusters than required.

So it goes like this:

  • Create three partitions:
    • First one for /boot/efi (e.g. nvme0n1p1), 10 MB. This must be FAT32 or FAT16 (the latter for 10MB). Note that 10MB is a bit too small, because the BIOS won’t have room for its own backup this way.
    • Second one for /boot, will contain initramfs images, so ~500 MB. Any filesystem that GRUB can read (so ext4 is definitely OK)
    • Third partition for LUKS
  • In the LUKS partition, create an LVM volume group with logical volumes for / (100G) and swap. The rest goes to /storage (see the sketch below).
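
For reference, here’s roughly how the LUKS + LVM part looks with command-line tools (I actually used GUI utilities; the device name follows the partition scheme above, and the volume group’s name and the sizes are arbitrary):

# cryptsetup luksFormat /dev/nvme0n1p3
# cryptsetup open /dev/nvme0n1p3 cryptroot
# pvcreate /dev/mapper/cryptroot
# vgcreate vg0 /dev/mapper/cryptroot
# lvcreate -L 100G -n root vg0
# lvcreate -L 16G -n swap vg0
# lvcreate -l 100%FREE -n storage vg0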

It’s somewhat confusing that /boot/efi is a subdirectory of /boot, but that’s the way it is.

Running the installation wizard

  • Unlock the encrypted partition, if it’s not already (e.g. with the “Disks” GUI utility). This requires giving the passphrase, of course.
  • Double-click the “Install Linux Mint” icon on the desktop.
  • When reaching the “Installation type”, pick “Something else”.
  • Set the following mount points:
    • Set the small FAT partition (nvme0n1p1 in my case) as “EFI System Partition”
    • /boot on the partition allocated for that (non-encrypted partition, possibly ext4).
    • / on the relevant LVM partition inside the encrypted block
    • Set the swap partition
  • Set the “Device for boot loader installation” to the partition allocated for “EFI System Partition” (nvme0n1p1 in my case). One may wonder why this isn’t done automatically. Note that it’s the first partition (/dev/nvme0n1p1) and not the entire disk (/dev/nvme0n1).
  • Don’t do anything with the planned /storage partition. As I didn’t want to assign a mount point to it, I handled it after the installation was done, as sketched below.
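
After the installation and first boot, setting up /storage goes along these lines (the volume names follow the sketch above, so adapt as necessary):

# mkfs.ext4 /dev/vg0/storage
# mkdir /storage
# echo '/dev/vg0/storage /storage ext4 defaults 0 2' >> /etc/fstab
# mount /storage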

If the installation ends with a failure to install GRUB, run “journalctl” on a terminal window and look for error messages from the grub installer. Don’t ask ChatGPT to help you with solving any issues, and don’t ask me why I know it’s a bad idea.

When I insisted on FAT32

Sometimes I’m too much of a control freak, and when the Disks utility formatted the EFI partition into FAT16, I thought, oh no, it should be FAT32, what if the BIOS won’t play ball?

Well, that was silly of me, and also silly to ignore the warning about a FAT32 filesystem with just 10 MB having too few clusters.

So even though the installer wizard finished successfully, there was no option to boot from the disk. Secure boot was disabled, of course. And yet, there was no suitable option in the BIOS’ boot menu. There was a “UEFI” option there, which is always in black (not possible to select), but that doesn’t seem to be relevant.

Following the warm advice of ChatGPT, I added an entry while in Live USB mode:

# efibootmgr -c -d /dev/nvme0n1p1 -p 1 -L "Linux Mint" -l '\EFI\ubuntu\grubx64.efi' 
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0003,2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC
Boot0003* Linux Mint	HD(1,GPT,12345678-aaaa-bbbb-cccc-dddddddddddd,0x800,0x5000)/File(\EFI\ubuntu\grubx64.efi)

(some identifying numbers replaced trivially)

Cute, heh? But it made no difference. After rebooting with a Live USB again:

# efibootmgr
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC

So the entry was gone.

I changed the EFI partition to FAT16, ran through the installation all the way again. And immediately after the installation was done (before booting to start from disk for the first time):

# efibootmgr
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0003,2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot0003* Ubuntu	HD(1,GPT,12345678-aaaa-bbbb-cccc-dddddddddddd,0x800,0x5000)/File(\EFI\ubuntu\shimx64.efi)
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC

This time, when the laptop went on, the BIOS came up with a “Please don’t power off while completing system update” message. What it actually did was to write its own backup file into the EFI partition, which appears as /boot/efi/BackupSbb.bin. It doesn’t seem like this was successful, though, as the partition ran out of space. So I deleted this file and turned off the “BIOS Self-Healing” option in the BIOS’ configuration (it would be much worse if it attempted to self-heal from a faulty backup file).

At this point, there was an “ubuntu” entry in the list of boot options in BIOS’ boot menu (Not “Ubuntu”, but “ubuntu”, probably referring to the directory and not the name). And the black “UEFI” option remained in the option list, not possible to choose. So this is why I don’t think it’s relevant.

Asking for a passphrase is too much to ask?

Having reached this far, I got a nice Linux Mint logo on the screen, however nothing happened, and then I got thrown into an initramfs rescue shell. In other words, no attempt was made to unlock the encrypted partition.

So I ran the live USB again, unlocked the root partition and mounted it as /mnt/root/.

Then, as root (sudo su), bind-mounted the essential directories into the root filesystem:

# for d in /dev /dev/pts /proc /sys /run; do mount --bind $d /mnt/root/$d ; done

And then chrooted into it.

# chroot /mnt/root

Of course, there was no /etc/crypttab, so no wonder that the installation didn’t take unlocking the encrypted partition into account.

So I followed my own instruction from a previous post. First, mount /boot and /boot/efi with

# mount -a

and then check for the UUID of the encrypted partition:

# cryptsetup luksUUID /dev/nvme0n1p2
11223344-5566-7788-99aa-bbccddeeff00

and then add /etc/crypttab reading

luks-11223344-5566-7788-99aa-bbccddeeff00 UUID=11223344-5566-7788-99aa-bbccddeeff00 none luks

Note that the luks-{UUID} part is the name of the unlocked partition, as it appears in /dev/mapper. In this case, this was the name that the Disks GUI utility chose. Had I done this with the command line, I could have chosen a shorter name. But who cares.

And finally, edit /etc/default/grub for your preferences, update initramfs and GRUB, exactly as already mentioned in that post:

# update-initramfs -u
# update-grub
# grub-install

It was really exactly the same as the previous post. And then reboot, and all was finally fine.

And by the way, the initrd file is 77 MB. Running update-initramfs again didn’t make it smaller. Not a big deal with a flash disk, anyhow.

But GRUB can open LUKS too!

GRUB has cryptodisk and luks modules which can open an encrypted partition, so in principle, it can read the kernel from an encrypted root partition. However, there is no mechanism that I’m aware of for passing the unlocked encrypted partition over to the kernel, so it would be necessary to supply the passphrase twice when booting.

This is why I went for two partitions for booting. I guess this still is the only sane way.

Converting vtt to srt subtitles with a simple Perl script

I tried to use ffmpeg to convert a vtt file to srt, but that didn’t work at all:

$ ffmpeg -i in.vtt out.srt
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)

I tried a whole lot of suggestions from the Internet, and eventually I gave up.

So I wrote a simple Perl script to get the job done. It took about 20 minutes, because I made a whole lot of silly mistakes:

#!/usr/bin/perl

use warnings;
use strict;

my $n = 1;
my $l;

my $timestamp_regex = qr/[0-9]+:[0-9]+:[0-9:\.]+/; # Very permissive

while (defined ($l = <>)) {
  my ($header) = ($l =~ /^($timestamp_regex --> $timestamp_regex)/);
  next unless (defined $header);

  $header =~ s/\./,/g;

  print "$n\n";
  print "$header\n";

  $n++;

  while  (defined ($l = <>)) {
    last unless ($l =~ /[^ \t\n\r]/); # Nothing but possibly whitespaces

    print $l;
  }
  print "\n";
}

$n--;
print STDERR "Converted $n subtitles\n";

Maybe not a piece of art, and it can surely be made more accurate, but it does the job with simply

$ ./vtt2srt.pl in.vtt > out.srt
Converted 572 subtitles

And here’s why Perl is a pearl.

Long list of IP addresses attacking a phpBB forum in May 2025

A good-old phpBB forum that I run had a massive surge of traffic somewhere in May 2025. This had already started a couple of months earlier, but in May it knocked down the web server due to the number of apache instances. Plus the logs reached ~ 3 GB.

This doesn’t seem to be a DDoS attack, mainly because the accesses from each IP address were far sparser than a legit bot’s: There would be several minutes, if not hours, between each request. With the huge number of bots involved, it would have been easy to completely knock out the website with a moderate access pattern from each IP address. Besides, there’s no reason to attack this specific forum. It looks more like a brute-force attempt to hack the forum itself for the purpose of spamming or something.

One could turn to Cloudflare in situations like this, but I tried the DIY approach: Namely, to block the IP addresses of the bots by virtue of the firewall, as I discuss in a separate post of mine. The strategy with these bots was somewhat different: As the phpBB-related bots occasionally did something that no regular user would do (what exactly, I’ll keep to myself), it was enough to detect this event and blacklist the IP address right away.

Which I did. After a couple of days, the list landed on ~78,000 IP addresses. Note that this is larger than ipset’s default number of allowed elements, which is 65536. So when creating the ipset, be sure to use maxelem with a larger number (I used 524288, as I didn’t know how bad the situation would get).
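
For reference, this is roughly what such an ipset and its matching firewall rule look like (the set’s name is arbitrary, and the sample address is from a documentation range):

# ipset create bots hash:ip maxelem 524288
# ipset add bots 203.0.113.1
# iptables -I INPUT -m set --match-set bots src -j DROP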

I checked a few of these addresses, and they appear to originate from Asia (Vietnam, Indonesia etc.).

The nice thing is that 24-48 hours after I enabled the related blacklisting mechanism, the attack virtually stopped. The firewall recorded no more than ~30 dropped packets per minute on the ipset-matching rule. True victory.

The list of IPs can be downloaded here, in case it helps someone understand what’s going on. If you do have an idea, please comment below.

It’s a plain zip of a list of IP addresses, one address per row. It’s possible that a few unrelated IP addresses appear on this list; these could be other kinds of bots that were blacklisted on the same server. Maybe a dozen of these.

Ehm, no

The silence was short-lived. A couple of days later, the bots came again, this time on a much larger scale. Within a few days, I collected ten times as many IP addresses, that is ~840,000 of them. That’s a really large number. This larger list of IPs can be downloaded here. It was obtained exactly the same way as the previous one.

But after this swarm, it got silent again. For a while. In the days that followed, the same traffic pattern came back, IPs were blacklisted (~250k of them at some point) and so it went on. But with the blacklisting mechanism, the load on the server was quite small, so for now I consider the problem solved. Let’s hope it stays this way.

Measuring how much RAM a Linux service eats

Introduction

Motivation: I wanted to move a service to another server that is dedicated only to that service. But how much RAM does this new server need? RAM is $$$, so too much is a waste of money, too little means problems.

The method is to run the service and expose it to a scenario that causes it to consume RAM. And then look at the maximal consumption.

This can be done with “top” and similar programs, but these show the current use. I needed the maximal RAM use. Besides, a service may spread out its RAM consumption across several processes. It’s the cumulative consumption that is interesting.

The appealing solution is to use the fact that systemd creates a cgroup for the service. The answer hence lies in the RAM consumption of the cgroup as a whole. It’s also possible to create a dedicated cgroup and run a program within that one, as shown in another post of mine.

This method is somewhat crude, because this memory consumption includes disk cache as well. In other words, this method shows how much RAM is consumed when there’s plenty of memory, and hence no pressure to reclaim any RAM. Therefore, if the service runs on a server with less RAM (or the service’s RAM consumption is limited in the systemd unit file), it’s more than possible that everything will work just fine. It might run somewhat slower due to disk accesses that were previously saved by the cache.

So using a server with as much memory as measured by the test described below (plus some extra for the OS itself) will result in quick execution, but it might be OK to go for less RAM. A tight RAM limit will cause a lot of disk activity at first, and only afterwards will processes be killed by the OOM killer.

Where the information is

All said in this post relates to Linux kernel v4.15. Things are different with later kernels, not necessarily for the better.

There are in principle two versions of the interface with cgroup’s memory management: First, the one I won’t use, which is cgroup-v2 (or maybe this doc for v2 is better?). The sysfs files for this interface for a service named “theservice” reside in /sys/fs/cgroup/unified/system.slice/theservice.service.

I shall be working with the memory control of cgroup-v1. The sysfs files in question are in /sys/fs/cgroup/memory/system.slice/theservice.service/.

If /sys/fs/cgroup/memory/ doesn’t exist, it might be necessary to mount it explicitly. Also, if system.slice doesn’t exist under /sys/fs/cgroup/memory/ it’s most likely because systemd’s memory accounting is not in action. This can be enabled globally, or by setting MemoryAccounting=true on the service’s systemd unit (or maybe any unit?).

Speaking of which, it might be a good idea to set MemoryMax in the service’s systemd unit in order to see what happens when the RAM is really restricted. Or change the limit dynamically, as shown below.
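
For example, a sketch of the relevant part in the unit file (the 512M figure is arbitrary; note that MemoryLimit= is the cgroup-v1 counterpart of MemoryMax=):

[Service]
MemoryAccounting=true
MemoryMax=512M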

And there’s always the alternative of creating a separate cgroup and running the service in that group. I’ll refer to my own blog post again.

Getting the info

All files mentioned below are in /sys/fs/cgroup/memory/system.slice/theservice.service/ (assuming that the systemd service in question is theservice).

The maximal memory used: memory.max_usage_in_bytes. As its name implies, this is the maximal amount of RAM used, measured in bytes. This includes disk cache, so the number is higher than what appears in “top”.

The memory currently used: memory.usage_in_bytes.
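
To keep an eye on both during a test, something like this does the trick (from within the service’s cgroup directory):

# cd /sys/fs/cgroup/memory/system.slice/theservice.service
# watch -n 10 cat memory.max_usage_in_bytes memory.usage_in_bytes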

For more detailed info about memory use: memory.stat. For example:

$ cat memory.stat 
cache 1138688
rss 4268224512
rss_huge 0
shmem 0
mapped_file 516096
dirty 0
writeback 0
pgpgin 36038063
pgpgout 34995738
pgfault 21217095
pgmajfault 176307
inactive_anon 0
active_anon 4268224512
inactive_file 581632
active_file 401408
unevictable 0
hierarchical_memory_limit 4294967296
total_cache 1138688
total_rss 4268224512
total_rss_huge 0
total_shmem 0
total_mapped_file 516096
total_dirty 0
total_writeback 0
total_pgpgin 36038063
total_pgpgout 34995738
total_pgfault 21217095
total_pgmajfault 176307
total_inactive_anon 0
total_active_anon 4268224512
total_inactive_file 581632
total_active_file 401408
total_unevictable 0

Note the “cache” part at the beginning. It’s no coincidence that it’s first. That’s the most important part: How much can be reclaimed just by flushing the cache.

On a 6.1.0 kernel, I’ve seen memory.peak and memory.current instead of memory.max_usage_in_bytes and memory.usage_in_bytes. memory.peak wasn’t writable, however (not according to its permissions, and attempting to write to it failed), so it wasn’t possible to reset the max level.

Setting memory limits

It’s possible to set memory limits in systemd’s unit file, but it can be more convenient to do this on the fly. In order to set the hard limit of memory use to 40 MiB, go (as root)

# echo 40M > memory.limit_in_bytes

To disable the limit, pick an unreasonably high number, e.g.

# echo 100G > memory.limit_in_bytes

Note that restarting the systemd service has no effect on these parameters (unless a memory limit is set in the unit file). The cgroup directory remains intact.

Resetting between tests

To reset the maximal value that has been recorded for RAM use (as root)

# echo 0 > memory.max_usage_in_bytes

But to really start from fresh, all disk cache needs to be cleared as well. The sledgehammer way is going

# echo 1 > /proc/sys/vm/drop_caches

This frees the page caches system-wide, so everything running on the computer will need to re-read things again from the disk. There’s a slight and temporary global impact on the performance. On a GUI desktop, it gets a bit slow for a while.

A message like this will appear in the kernel log in response:

bash (43262): drop_caches: 1

This is perfectly fine, and indicates no error.

Alternatively, set a low limit for the RAM usage with memory.limit_in_bytes, as shown above. This impacts the cgroup only, forcing a reclaim of disk cache.

Two things that have no effect:

  • Reducing the soft limit (memory.soft_limit_in_bytes). This limit is relevant only when the system is short of RAM overall. Otherwise, it does nothing.
  • Restarting the service with systemd. It wouldn’t make any sense to flush a disk cache when restarting a service.

It’s of course a good idea to get rid of the disk cache before clearing memory.max_usage_in_bytes, so the max value starts without taking the disk cache into account.

A function similar to Perl’s die() in bash

This is maybe a bit silly, but Perl has a die() function that is really handy for quitting a script with an error message (which goes to stderr, as it should). And I kind of miss it in Bash. So it can be defined with this simple one-liner:

function die { echo "$@" >&2 ; exit 1 ; }

And then it can be used with something like:

unzip thefile.zip || die "Unzip returned with error status"

The Perl feeling, in Bash.

Altering the Message-ID header in Thunderbird for non-spam detection

TL;DR

In this post, I suggest manipulating the Message IDs of outgoing mails, so that legit inbound replies to my mails are easily detected as non-spam. I also show how to do this with Thunderbird (Linux version 91.10.0, but it works with practically all versions, I believe).

Briefly about Message-ID

Each email should have a Message-ID header, which uniquely identifies this message. The value of this header should consist of a random string, followed by an ‘@’ and a string that represents the domain name (referred to as FQDN, Fully Qualified Domain Name). This is often the full domain name of the “From” header (e.g. gmail.com).

For example, an email generated by Gmail’s web client had Message-ID: <CAD8P7-R2OuJvGiuQ-0RQqgSSmDguwv1VdjHgQND4jMJxPc628w@mail.gmail.com>. A similar result (same FQDN) was obtained when sending from the phone. However, when using Thunderbird to send an email, only “gmail.com” was set as the FQDN.

Does the Message-ID matter?

Like anything related to email, there are a lot of actors, and each has its own quirks. For example, rspamd raises the spam score by 0.5 (with the MID_RHS_NOT_FQDN rule) if the Message ID’s right-hand side isn’t an FQDN. I’m not sure to what extent it checks that the FQDN matches the email’s From header, but even if it does, it can’t be that picky, given the example I showed above in relation to gmail.com.

It’s quite rare that people care about this header. I’ve seen somewhere that someone sending mails from a work computer didn’t like the name of the internal domain leaking.

All in all, it’s probably a good idea to make sure that the Message-ID header looks legit. Putting the domain from the From header seems to be a good idea to keep spam filters happy.

Why manipulate the Message-ID?

In a reply, the In-Reply-To header gets the value of the Message ID of the message replied to. So if a spam filter can identify that the email is genuinely a reply to something I sent, it’s definitely not spam. It’s also a good idea to scan the References header too, in order to cover more elaborate scenarios where several people correspond.

The rigorous way to implement this spam filtering feature is storing the Message IDs of all sent mails in some small database, and check for a match with the content of In-Reply-To of arriving mails. Possible, however daunting.

A much easier way is to change the FQDN part, so that it’s easily identifiable. This is unnecessary if you happen to send emails with your own domain, as spam senders are very unlikely to add an In-Reply-To with a matching domain (actually, very few spam messages have an In-Reply-To header at all).

But for email sent through gmail, changing the FQDN to something unique is required to make a distinction.

Will this mess up things? I’m not sure any software tries to fully match the FQDN with the sender, but I suppose it’s safe to add a subdomain to the correct domain. I mean, if both “mail.gmail.com” and “gmail.com” are commonly out there, why shouldn’t “secretsauce.gmail.com” seem likewise legit to any spam filter that checks the message?

And by the way, as of August 2024, a DNS query for mail.gmail.com yields no address, neither for A nor MX. In other words, Gmail itself uses an invalid domain in its Message ID, so any other invented subdomain should do as well.

Changing the FQDN on Thunderbird

Click the hamburger icon, choose Preferences, and scroll down all the way (on the General tab) and click on Config Editor.

First, we need to find Thunderbird’s internal ID number for the mail account to manipulate.

To get a list of IDs, write “useremail” in the search text box. This lists entries like mail.identity.id1.useremail and their values. This listing allows making the connection between e.g. “id1” and the email address related to it.

For example, to change the FQDN of the mail account corresponding to “id3”, add a string property (using the Config Editor). The key of this property is “mail.identity.id3.FQDN” and the value is something like “secretsauce.gmail.com”.

There is no need to restart Thunderbird. The change is in effect on the next mail sent, and it remains in the settings across restarts.

The need for this feature has been questioned, as was discussed here. So if any Thunderbird maintainer reads this, please keep this feature up and running.

A possible alternative approach

Instead of playing around with the Message-ID, it would be possible to add an entry to the References header (or add this header if there is none). The advantage of this way is that this can also be done by the MTA further down the delivery path, and it doesn’t alter anything that is already in place.

And since it’s an added entry, it can also be crafted arbitrarily. For example, it may contain a timestamp (epoch time in hex) and the SHA1 sum of a string that is composed by this timestamp and a secret string. This way, this proof of genuine correspondence is impossible to forge and may expire with time.
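
As a sketch in Bash, such an entry could be generated along these lines (the secret string and the domain are placeholders, of course):

#!/bin/bash

secret="my-secret-string"                          # Placeholder; keep it truly secret
ts=$(printf '%x' "$(date +%s)")                    # Epoch time in hex
sig=$(printf '%s' "$ts$secret" | sha1sum | cut -d' ' -f1)  # SHA1 of timestamp + secret
echo "References: <$ts.$sig@secretsauce.gmail.com>"

The receiving side recomputes the SHA1 sum from the timestamp and the secret, compares it with the one in the header, and possibly rejects entries whose timestamp is too old.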

I haven’t looked into how to implement this in Thunderbird. Right now I’m good with the Message-ID solution.