Messy jots on AppSheet

Introduction

These are the jots I wrote down as I learned my way through AppSheet. So this is a messy and not-so-coherent post. Info is scattered here, not necessarily in a sensible order. This is not a tutorial.

My interest comes from the fact that I’m looking for an apartment, and I want to go through a lot of details in each apartment I visit. So I looked for a tool that allows me to organize the information I collect, from the ad, through the thoughts I have about the apartment before visiting it, and most importantly, to efficiently examine the apartment.

This can be done with a plain paper form, of course, but when there are many details and stages, it’s easy to miss one. So the crucial part of my application is to get a quick final view that assures me I’ve done everything, just before thanking for the visit. This is the most important part, and it’s also the most difficult to implement.

In hindsight, it took 9 days, which is much more time and effort than I intended and wanted to invest in this. I had the “almost done” feeling more or less from the beginning, but each next step turned out much more difficult (and annoying) than I imagined. AppSheet makes you feel like a code monkey that doesn’t know coding. And if AppSheet is used for a long-term project that is maintained over time, be sure to have a QA team that tests everything in every possible combination. The framework invites silly bugs everywhere.

General

  • In a nutshell, AppSheet takes data from a database (preferably represented as a spreadsheet in Google Sheets) and presents this data in different Views, allowing both displaying and changing the data. The other side of AppSheet is Automation: its ability to generate files, send emails / SMSes, submit HTTP requests etc. with the information in the database, based upon a template. This can be a pdf for an invoice, or a summary report of an inspection. Or an XML or JSON file, pushed to a web server to initiate some action on the information. So Views are primarily intended to feed data into the system, and Automation covers the consumption of the data for a useful purpose.
  • AppSheet’s free tier is unlimited in time and covers every feature of the platform, with up to 10 test users. One user, myself, is enough. In free mode, the app is limited to Prototype status, which has no significance when it’s used only by myself.
  • AppSheet reminds of Drupal in many ways, and that’s not necessarily a good thing. Lots of click-click-click around, everything is graphical, and one ends up feeling like a machine doing a silly task a gazillion times, much like configuring a Windows Server. Every. Single. Detail. Must. Be. Configured. Manually. With a prayer that there will never be a need to change all those settings that have been repeatedly configured. Plus, small bugs creep in everywhere, as there are a whole lot of details that one must keep track of when making every little change.
  • Another thing common with Drupal is that every little task takes a lot of time, and sometimes the answer is simply “no, that’s impossible”. At least with Drupal, one can hack the PHP code. For example, the one thing that I considered important, but didn’t implement, was to get a list of the items in a checklist that I didn’t check. I didn’t want to implement the checklist as individual Yes/No (boolean) columns, because adding an item would require modifying the table, regenerating the schema, editing the View and then the report. So I did that with an EnumList, but the only information it emits is what was checked. So all in all, there is no way to implement this rather important feature without turning the checklist into a nightmare to maintain in the long run.
  • There are certain limitations on databases and tables: in particular, 1000 rows per database in the Free plan, and for all plans, 20 tables per database and 100 columns per table. Don’t worry, you’ll lose your sanity before getting even near these limits.
  • Press the “Save” button at the upper right to make changes visible (e.g. when checking with the mobile app). This also clears existing warnings, whether they were tended to or not.
  • Offline use works great: No problem using the mobile phone even with no Internet coverage. The update to the cloud occurs later (it may be necessary to open the app for that). Actually, the only reason I use AppSheet instead of a home-cooked JavaScript application is the ability to work offline.
  • Backing up an app: Impossible. There’s no way to download the app’s own settings, and deleting an app accidentally is easy. The best solution I found for this is to share the app (or better, a copy of it) with another user (also yourself) and make a copy of the app as that other user. And never log in as the other user again. This way, the copy at the other user is safe from accidental tampering. I speculate that the reason for not being able to download the app itself is that AppSheet’s business model is pay-per-user, and it’s free for less than 10 users. So had it been possible to download and restore the app from a file, a lot of no-cost single-user accounts would have been created and deployed.

Random insights

  • Each time “Save” is clicked, a new version is saved. It’s possible to go back to older versions. Version history is saved 7 days back by default.
  • It’s possible to run a preview of the app by clicking the three dots to the right of the app’s description in the list of apps. Doing so previews that specific version of the app, ignoring subsequently saved versions of the app. There’s a misleading “refresh” icon on the app preview. It probably only updates data, not the app itself.
  • If a “Map” view is generated automatically, delete it. Otherwise it will show up as the default view for everything (push marketing, it’s called), and it’s as annoying as it’s pointless.

Sources of info

Creating a new app

This describes how to create an app based upon a spreadsheet. The spreadsheet’s first row consists of the description of each column, and the following rows contain data. Multiple sheets are treated as separate tables, each sheet’s name being the AppSheet table name (so don’t rename the sheets in the spreadsheet).

Upload the .ods or .xlsx file to Google Drive, or start a new Google Sheet.

If you’re already registered with AppSheet, right-click the file in Google Drive, pick Open With and choose AppSheet. An app is generated automatically from the data in the file. The name of the app will be the same as the originating spreadsheet’s file name. The column types (text, number, price etc.) are automatically assigned, but can be changed later on in the Data section. It’s even possible to change the type to Image, in which case pictures that are uploaded or taken with the mobile phone are stored in a subdirectory, named like appname_Images/, next to where the Google Sheets item is saved. The value in the spreadsheet is the path to the image file, e.g. appname_Images/J1.Silly.140258.jpg.

It’s best to delete the .ods or .xlsx from Google Drive at this stage to avoid confusion: A Google Sheets entry is created in the same directory. Changes made with AppSheet go to this entry, not to the file used to create the app.

Doing the same from within AppSheet (not clear if this is better):

  • Navigate to the page listing apps (AppSheet’s main page), click “+ Create”, pick “App” and then “Start with existing data”.
  • Give the app a name. For category, I chose “Inspection & Surveys”, and then clicked “Choose your data”.
  • Select an existing Google Sheets entry in your Google Drive account as the database (Google Sheets).
  • Trying to use an .ods file as the data source caused AppSheet to hang (wait indefinitely to complete creating the app).

It’s also possible to use an AppSheet database for this purpose, but that seems to be a leftover from the days before AppSheet became part of Google. There’s no reason I’m aware of to prefer an AppSheet database, and there’s no obvious way to make a backup of it, nor to restore one. Possibly doable by exporting to CSV with Automation, I suppose, but I never tried it.

A directory named “appsheet” is created in Google Drive’s home directory. It contains a directory structure with an empty file called “empty.txt”. A subdirectory is created for each AppSheet app, and files generated by the app go into that subdirectory. It’s possible to select a different target for output files, however (see below).

When an app is created, you get an email with an invitation to use it or develop it.

To do after a while

After playing around with an app for a while, it’s a good idea to make a few fixes. Go to the Views section (click on the third icon from the top) and then on the settings icon (a gear).

  • In Views > General, choose the starting View.
  • Also in Views > General, set the Image Upload size to the desired level (High for 1600x900 with my phone, or Full).
  • In Offline mode, enable “Store content for offline use”.
  • Go to the Data section, choose the main table, and opt out of deleting rows (the gear icon at the top right for a table’s settings). Delete rows directly on the spreadsheet if needed.
  • Change the Default app folder to /theprojectname/appsheet-data or something like that, which should be a valid path in Google Drive, so that the files land somewhere sensible. This is done in the Settings section (gear icon to the left), under “Information”.

Views

  • The View displays (or allows editing) a subset of column values of a row of a table. In some cases, several rows are listed in one View. Which columns are displayed, how they are displayed, and if they are editable, is what the View’s configuration is about. But no matter how you twist and turn it, a View shows values of columns (possibly along with the column names). If you want to display anything, turn it into a column (possibly a Virtual column, defined only inside AppSheet, see below).
  • The relation between an AppSheet table, which is used in Views and expressions, and the table in the spreadsheet is not 1:1: Virtual columns are added, either by AppSheet itself (e.g. _RowNumber and Row ID), or by the user. In the latter case, the value of the Virtual Column is an expression that the user supplies. This can be a constant, for example if the purpose is to create a menu button in a View. The expression can also be a formula, which depends on the value of other columns of the same or other tables. In these expressions, [X] is the value of column X of the same row.
  • Clicking / tapping on a displayed item in a View initiates an Action, which often consists of opening a view of a specific row in a table for read-only access or editing. These are system-generated views with names like “main_detail”, “main_form” and “main_inline” for a table named “main”. These Views can be modified like any other view. And a whole lot of other actions can be added and used as the response.
  • It’s often difficult to figure out which View appears in response to clicks, as it’s often a system-generated one. Be sure to have the Edit option on (top-right on the development screen). Hover over the relevant area, wait for the pencil icon to appear, click it and pick e.g. “Edit View”.
  • If a system-generated view is deleted, it’s created anew after saving, apparently with default settings.
  • If a column is added to the spreadsheet’s table, it’s not available for use in the AppSheet immediately. The AppSheet’s table schema needs to be regenerated for this to happen (click the Data icon, click the table, three dots, and then “Regenerate Schema”). Virtual columns are not deleted, despite the scary warning one gets.
  • When clicking / tapping an item can result in a detailed view or the possibility to edit, these are system-generated views, which appear at the bottom left in the Views subsection. These Views can be modified.
  • Views listed under “Primary Navigation” are accessible directly at the bottom part of the screen. Those under “Menu Navigation” are accessible through the menu. “Reference Views” are invisible to the user, except for when requested from other views, for example in an Action (i.e. a response to clicking / tapping an item).
  • There are Format Rules, which allow changing colors of items etc. depending on boolean expressions. In most cases, they apply to columns and how they are displayed (with a different color, or with a colored icon added). Unfortunately, it’s impossible (as far as I know) to write catch-all rules for several columns, as the expression for activating the rule doesn’t play ball with the [_THIS] expression, which means “this column”. So for example, if you want a rule that marks unfilled columns with a red dot, add (or duplicate) one rule for each and every column, and be sure that the rule for column X doesn’t by mistake change the display format of column Y. It’s just an invitation to make silly bugs.
  • The column’s name is displayed above the column’s value in forms and detail Views by default. This can be changed by opening the column’s settings in the Data section, and change the “Display name”. So use concise names in the spreadsheet.
  • A column of type Show is useful for text, URLs, images etc, that appear in forms for instructions, setting page breaks etc. These are best added as virtual columns with literal values in the expression. Don’t forget to turn off the “Editable” attribute of this column. It’s quite unfortunate that instruction text appears as an extra column along with the spreadsheet’s data, but that’s the way it’s done.
  • To navigate to another view as a result of clicking on a column (possibly a Virtual Column acting as a menu entry):
    • First, an Action needs to be created. Click on the Action icon (looks like an electric zap), click on the “+” at the top left, and add an action.
    • Pick the table related to the view that this action shall work on, and pick “App: go to another view within this app”.
    • As for the Target, choose the expression LINKTOVIEW([targetview]) (given that “targetview” is the name of the column containing the name of the view to navigate to, as a plain text column).
    • Go to the relevant item in the View, and choose the newly created action under the Behavior section.

There are several types of Views, but the most important for a plain spreadsheet-based app are:

  • Form: All columns listed in the “Column order” are shown and editable. The title of each input (text box, drop-down menu etc.) is the name of the column in the database or spreadsheet, unless overridden in the column’s configuration (in the Data section). Unfortunately, a back-swipe on a mobile phone is interpreted as “Cancel”, and there’s no way around this. One must explicitly tap on Save, or else all info is lost.
  • Details: Like Form, but read-only, with an icon at the bottom right for editing the content, which switches to a Form with the same set of fields. Unfortunately, blank fields are not shown unless their Show property is set to the expression TRUE (just checking the checkbox isn’t good enough, it has to be done with an expression).
  • Table: As the name implies, like Details, but with the data shown as a table (with rows being rows, columns being columns). Selecting a row brings up a Details view of it. It’s possible to configure which columns appear in the table, possibly only one. So this can be a concise way to display a list of rows.
  • Card and Deck: Each row gets a small pane with two or a few selected column values shown. Selecting one brings up a Details view.
  • Gallery: Shows the image associated with each row (if such exists) and the value of the column marked as Label.
  • Dashboard: A View containing other Views.

Adding a new table

  • In Appsheet, click the Data icon (the second icon from the top).
  • Click on the “+” (Add new data)
  • Navigate to the relevant Google Sheet.
  • Navigate to the related table, and add it.

Settings for each column in a table

In the Data section, there are several attributes one should look at in each table’s configuration:

  • Name: Must match the name in the spreadsheet’s header (or the column name of a database’s table). The app fails to load otherwise.
  • Type: Number, text, image, reference, price, there are a lot of options. The automatic choice is usually correct for the simple textual columns. For more sophisticated input (drop-down menus, images, references etc.) this is the place to configure that.
  • Key: This checkbox is checked for one column only, selecting the key column in the database sense. Relevant in particular when using references.
  • Label: This checkbox is checked for one column only (plus, possibly, an image), selecting which column appears in several views as the row’s main description.
  • Formula: Left blank for columns taken from databases, must contain something for a Virtual Column. When set to something, the value of the column is the expression in this formula.
  • SHOW?: As its name implies, it controls if the column should be shown in views. This checkbox should usually be checked: If a column isn’t desired in a view, it should be removed from there. If the column should be displayed in Details Views even when blank, checking this isn’t enough: The Show property must be set to TRUE as an expression.
  • EDITABLE?: Can the value of the column change?
  • REQUIRE?: Must the column have a value to allow finishing an edit session containing this column?
  • Initial value: The column’s initial value when a new row is created.
  • Display name: If left blank, the Name field from above appears above the value of this column in forms etc. Otherwise, the Display name overrides it.
  • Description: For internal documentation, and is also used instead of the column name if “Display Name” is left blank.
  • Search: Is this column involved in searches?
  • Scan, NFC: Can this column accept data from these input methods? (Scan means barcode scanning.) PII: Marks the column as containing personally identifiable information.

Modifying the spreadsheet

It’s possible to modify the spreadsheet even after the app has been created, including adding a column in the middle. The immediate effect is an error (“unable to fetch app definition” on the mobile app, or just an error on the development console). To resolve this, pick the “Data” icon (second from the top) on the development console, and then click the round arrow (as in “reload”) to regenerate the structure. A scary warning pops up in response to that, but just go ahead.

Note that the connection between AppSheet and the spreadsheet is through the names of the columns as given in the first row, as well as the name of the sheet. These should not be modified (unless the manipulation is intentional).

References etc.

AppSheet definitely supports relational databases. In fact, by choosing names of columns where one is the plural form of the other (an “owner” column and an “owners” table) the relation is possibly set up automatically. This is however not necessarily a good idea, because the tool chooses which column’s value identifies the entire row, and getting this wrong could mess up things later, and it’s difficult to fix afterwards.

Rather, if a spreadsheet (or virtual) column (in “table A”) is assigned the type “Ref”, it’s also required to assign the table referred to (“table B”). The value of the column selects the row that has a key column equal to it. In the View showing this column, the label column of the referred table is shown. Clicking / tapping on the label leads to a View of that row: its columns’ values as well as other rows referencing it. In order to access a specific column, add a virtual column to table A with a formula like [the_refcol_in_table_A].[the_col_in_table_B]. It’s not as difficult as it sounds, as the GUI starts suggesting completions when it sees the brackets and the dot.
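
For example, a sketch under assumed names: if an “apartments” table has a Ref column “owner” pointing at an “owners” table that has a “phone” column, a Virtual Column in “apartments” showing the owner’s phone number would have this formula:

[owner].[phone]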

Note that in AppSheet, a reference actually means the whole row. In many practical situations, only the label column is displayed, but it’s important to remember that conceptually, a Ref column represents the entire row it refers to.

And this brings me to back-reference. When a table B is referenced by a column somewhere in table A, a virtual column is automatically added to table B. This column has the type List and its value is something like REF_ROWS(“table A”, “referring_column”). The arguments are the name of table A and the name of the column with type Ref in that table. So this column lists all rows in table A that refer to the row in table B. Consider deselecting its “SHOW” checkbox in the Data view, if this back and forth is undesired.

Also, note that the back-reference is shown as a list of inline elements on the View of Table B. In order to determine how it appears, configure the system-generated view with name e.g. tableA_Inline (for example, if it’s a table or a deck). Even more important, it allows choosing what happens when one of the elements is clicked: View details (e.g. tableA_Detail view), edit (e.g. tableA_Form) or do something completely different with an Action?

And when references exist, it’s important to get the keys right. It’s possible to choose anything that is guaranteed to be unique, but because references relate to the key, there might be weird side-effects if the value of the key column changes. For example, if we choose a person’s email address as the key, all references to that row are lost if this person changes this address.

What we really want is a solid reference to the row itself. This is best done by adding a dedicated column for that in the table (i.e. in the spreadsheet), and assigning it a unique value when a new row is created: Use UNIQUEID() as this column’s initial value. When AppSheet adds a new row, it will calculate a unique ID and put the value there. Make this column non-editable (possibly turn off “SHOW” too). In fact, AppSheet does this automatically when I give a column the name “key_id”.

Note that calculating UNIQUEID() in a virtual column is pointless. That’s not a solution for creating a unique ID for a row.

Other insights:

  • Allowing upload of multiple images (or adding other items): Create a table (“table A”) consisting of “key_id” (used as key), “ref”, “image” and “weight” columns. The “ref” column is declared as a Ref type pointing to the main table (“table B”), to whose rows the images belong. Enable “is part of”, which makes it possible to add new items from the main view. So all in all, there is a list of images in the main view (of table B). Each row in the view points to a row in table A. This makes it possible to add an arbitrary number of items from a view of a row in the main table. Each item becomes a new row in table A, with a reference to table B. “weight” is a number, allowing to control the order when displaying the images. Using “key_id” as the key, and not “image” (which is the name of the file containing the image) makes it possible to show the same image in different Views, even from different tables.
  • Drop-down menus with text taken from somewhere else: Prepare a table (a sheet, and import it as a table) with keys as numbers and text (or anything else) as values. Then, on the table where we want the drop-down, set the column’s type to Ref, and point to the said table as the reference. Set the input type to Buttons or Drop-down. In order to allow adding new possible values, set it to Enum (or EnumList for multiple values), and the Base type to Ref. Choose the said table as the target for the reference. Note however that in the latter method, all already existing keys must be added manually to the column’s definitions, exactly like any Enum type. The only difference is that the keys (possibly numbers) are fed manually, not the text. So if adding new possibilities is required, plain Enum is probably the best way.
  • It’s impossible to get a list of the options not chosen in an EnumList. It’s possible to write an expression that subtracts the elements selected by the EnumList from a given list (see the sketch below), but that requires keeping this expression and the EnumList’s definition in sync. And if one forgets to update the expression after adding an item to the EnumList, it’s a nasty bug.
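
For the record, a sketch of that expression, with the item list and the column name being examples of mine: given an EnumList column “checked_items”, the unchecked items could be computed in a Virtual Column with

LIST("Kitchen", "Bathroom", "Balcony") - [checked_items]

as AppSheet’s list subtraction removes the selected elements from the hard-coded list. It’s exactly this hard-coded list that has to be kept in sync with the EnumList’s definition, which is why I dropped the idea.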

One long row or relational database?

As there are a lot of pieces of information about an apartment, which can be divided into categories and will be handled by different Views, my database manager self was inclined towards a relational database approach. One row with a few columns for each apartment in the “main table”, and then several tables, one for each information category (i.e. View), referenced by the “main table”.

The alternative approach is one table for all information (except for images and other utilities), making one long row with all the info. It’s up to the Views to select which columns to show: Not all columns in the original table (or spreadsheet sheet) have to be visible. It’s possible to make several views of the same table. But this means a lot of columns to handle (add virtual columns to that), making it difficult to keep track of the whole thing.

I went for the one long row approach. The main reason was that this approach makes it easier to reorganize the Views if needed. For example, if fields are added over time, and a View becomes too crowded, it’s not a big deal to split it into two Views, if they both refer to the same table anyhow. Or to move a field to another View.

That said, the separate-table approach would definitely have worked: When a new item is added, that means a row in the main table. A new row is created on each separate table by virtue of a menu button (a virtual column with a specified Action). LINKTOFORM() allows opening a form with specific initial values, so the key column in the child table can be set to the parent table’s key column value, ensuring a 1:1 relationship (and preventing duplicate rows).

For reports, the child table is shown as a back-reference to the main tables with a Start/End pair of template tags (maybe there’s a simpler way to do this).

In hindsight, I should have taken the separate-table approach, but in a way I didn’t think about at first: There should have been a table with one row for each room, regardless of its type. This would have reduced the number of columns in the main table, and the amount of work and headache is linear in the number of columns. The trick would have been not to show all columns of the rooms’ table in the form View, but only those suitable for the specific type of room. The rules for which types of rooms exist, and which columns should be shown in the View, would be listed in a separate table. Hence the “Show if” rule would look up the column in this rule table, in relation to the room’s type.

Use slices?

The question is how to display a subset of the columns in different Views.

The seemingly natural (and less preferred) way is by using slices. A slice is a virtual table, calculated on the fly, consisting of a subset of rows and columns as configured. This allows displaying only a subset of rows, columns and actions, filtered with a condition on the data (or just not showing all columns). Slices are configured with the “+” icon on each data set (inside the “Data” section), as they are considered some form of table. Alternatively, slices are hinted at in the View’s configuration as well.

This would have been the preferred way, had it not been for an annoying caveat: If a new column is added to the database (i.e. sheet), it appears in all slices. This makes slices a no-go for the purpose of selecting columns for a view.

Instead, the columns to display should be selected in the “Column order” part, which can be found in the “View Options” section for each View configuration. Note that “Column order” isn’t present for Views that don’t display columns, e.g. Deck and Gallery.

It makes sense to start by generating an empty view, and to copy it each time a new view is created.

Creating a pdf file from the data of a row

…or for that matter, a CSV / JSON / XML / HTML / XLSX file.

The important point is that this is done with Automation, and that the execution is triggered by an Event, and not as an Action. In other words, clicking something on a View won’t trigger the generation of the report directly. Rather, clicking the entry in the View causes an Action that modifies a row in a table, or adds a row to a table, and that, in turn, triggers the Event.

There is no direct connection between the Action and the task it requests — the generation of a file, in this case. It can take 20-30 seconds from the triggering event until the file is created.

One option to request the generation of a file from an app: First, add two columns to the table for which the report is desired: “do_report” and “last_report”. In AppSheet, regenerate the schema if necessary, and set the columns as follows:

  • do_report: Type is “Yes/No” (sweet talk for Boolean), turn off “show”, initial value “FALSE”
  • last_report: Type is “Text”, and set Initial Value to “Never”. “Editable” must remain enabled, even though it shouldn’t be accessible by forms: Its intention is to be set by the Bot, and this is possible only if the column is editable.
  • Add a virtual column named “generate_report”, “Text” type, with expression as follows:
    CONCATENATE("Generate report
    Last report: ", [last_report])

    Note that there is a new line in the middle of the string.

  • Create a new Action (Bzzt icon): For “Do this” pick “Data: set the values of some columns in this row”. Pick the “do_report” column and set the expression to TRUE. I’m going to use the Virtual Column defined above as a menu item, so for Position, pick “Inline” and at “Attach to column”, pick “generate_report”. In Behavior, set “Only if this condition is true” to NOT([do_report]), and optionally also enable the request for confirmation.
  • Add the “generate_report” column to the View (or Views) from which the report should be requested.
  • Optionally (and recommended), add a Format rule for generate_report. The condition is [do_report] and if so, the text should be grey (color #888888) and in italics. For “Format these columns and actions” pick generate_report, and possibly also the action created just above. Without this, there is no immediate visual feedback to “pushing the button”.

Now to the part that actually generates the report.

  • Go to the Automation section (robot icon) and create a new Bot.
  • Click “Configure Event” and give it a name.
  • In the pane to the right, the Event source should be “App”. Pick the table from which the Event is triggered.
  • Select only “Updates” for Data change type.
  • Condition: [do_report]
  • Next, define what should be executed: the Process. Add Steps, each of which is more or less like a line in a computer program: Each of them can perform something, check a condition and branch, or wait for a condition to become true. Or call a process. Define these steps (give each some name, as usual) and configure them in the pane to the right.
    • Type: “Run a data action”. In the right pane, pick “Set row values”, and set “do_report” to FALSE.
    • Type: “Run a task”. Set up the task as follows in the right pane:
      • Pick Create a new file
      • HTTP Content Type: pdf
      • Template: Click Create (or pick an existing template, if you have one from previous sessions). A “View” button appears after this is completed. More on templates in a separate section below.
      • Optional: Change File Folder Path to “/” (with the quotes), File Name Prefix to “report” (once again, with the quotes) and opt in to Disable Timestamp. This way, report.pdf is created at the project’s root directory (not Google Drive’s root directory in most cases). Expressions can be used for these, of course.
    • Type: “Run a data action”. In the right pane, pick “Set row values”, and set last_report to TEXT(NOW(), “DD/MM/YYYY HH:MM:SS”) for British time format.

Note that it’s possible to create a pdf file, but the template file is an HTML file. This makes it much easier to handle the template, and it’s always possible to go back to the HTML file to see what went wrong. The disadvantage with an HTML template is that the file generation fails on the tiniest HTML syntax error. For example, a missing </tr> tag. Or even more ridiculous, a missing </p> tag.

In order to view the generated file by clicking a menu item in a View, create a Virtual Column for this purpose, and assign it an Action when selected. This action should be “External: Open a file”. The “File” entry is a URL to where the file is stored on Google Drive. Obtain this link with the Share feature, so the expression becomes something like “https://drive.google.com/file/d/1d8AWHjLJdK9cyhYGBPnuVJJ0asmLmZVw/view?usp=sharing” (the quotes included). This is not a security risk, as one chooses to share only with oneself (when obtaining the link from Google Drive).

If the execution fails due to a malformed template file (something inside the markups didn’t work fine), the app’s View doesn’t always get updated until a restart of the app (or reload the entire page on a web browser).

Actually, the way the whole thing behaves is that the menu button gets greyed out immediately after being pressed, and then gets back to normal after the file has been generated. But the color is determined by do_report’s status, which is turned back to FALSE before the attempt to generate a file. So if the file generation takes time, it may return to normal before the file is updated — but this has never had any practical consequence for me. But why doesn’t it go back to normal when the file generation fails? do_report is FALSE either way.

To view error messages: Go to the “Monitor” section (icon on the left side, one from the bottom), and pick “Launch Log Analyzer” in the “Audit History” part. After some graphs, there’s a table with the last tasks initiated. Where there is an error, click the binocular icon for details. A JSON text entry appears. The “Errors” key is where the error message is shown. It can take a few minutes before the entry with the error is shown, and before that, there may be entries showing success even if an error has occurred. So wait a few minutes before ruling out a failed run.

Templates

A template is a file in Doc or HTML format. Everything is passed through to the output document, except for markups of the <<expression>> form (or, for an HTML template, &lt;&lt;expression&gt;&gt;). The expression is evaluated exactly in the same way as the value of a Virtual Column, and the markup is substituted with the expression’s result. Hence to obtain a substitution with the value of a column named “thecol”, just insert <<[thecol]>>.

No matter if the output is pdf or HTML, when using HTML templates, be sure that the HTML file has all the classic wrapper tags: <html>, <head>, <body> etc. or else AppSheet ignores all <<something>> (or actually, &lt;&lt;something&gt;&gt;) markups altogether and the output is the template file itself.

There are special kinds of markups:

  • Start/End for looping on a list of keys to a table
  • If/EndIf markups, allowing a segment to be included or not. Apparently not available with HTML templates, as it’s not listed explicitly, and I failed to use it (so I used the IF() function instead).

The automatically created template is simple and not necessarily very helpful: This template encompasses all real columns (but not the virtual ones), showing each column’s name (not its display name) and its plain value. Its most valuable part is where it shows the use of Start/End (when relevant, i.e. when there are back-references to the displayed table). Note that as shown in the template, the Start and End tags must be inside a <p> block. They can’t be just a floating piece of code, or within another couple of tags (<div> doesn’t work, for example). Otherwise one gets a misleading error message saying the table with name “Start:” can’t be found, plus a lot of error messages for everything between the two tags.
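
For illustration, a sketch of the looping part in an HTML template, with the back-reference and column names being assumptions of mine (recall that in an HTML template, the angle brackets are HTML-escaped):

<p>&lt;&lt;Start: [Related images]&gt;&gt;</p>
<p>&lt;&lt;[image]&gt;&gt; (weight: &lt;&lt;[weight]&gt;&gt;)</p>
<p>&lt;&lt;End&gt;&gt;</p>

Everything between the Start and End markups is repeated once for each row in the back-referenced list.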

The template is annoying in particular in relation to Ref types: The actual value, which is the key itself, is shown for references, and not the referenced value, as seen on the app’s Views. It would have been sweet of AppSheet to look up which of the referenced table’s columns is displayed in the View, and show that. But there’s another reason for this: Say that the Ref column is “choice” and the column to display on the referenced table is “label”. The expression for the value to display is [choice].[label]. But if “choice” happens to be blank, and this expression appears anywhere in the template, the file generation fails. So be sure to assign an initial value to all Ref columns when creating a new row. Plus, possibly make a simple sanity check on the column, just in case:

<<IF(ISBLANK([choice]), "Error!", [choice].[label])>>

Not sure if this is bullet-proof, but it solves the most obvious problem.

I’ve also got the impression that if the expression of an AppSheet markup returns anything containing HTML markups (e.g. <b> and </b>), the file generation fails completely.

Notes to self

These are cryptic notes to myself, for my own apps’ housekeeping tasks.

Adding/removing a new item to a checklist

  • Update the related checkcnt expression with the new number of items, so that the report says all is done when it really is.

Adding a new column

  • Be sure to regenerate the schema of the related table in AppSheet.
  • Possibly add (or duplicate) an “Unfilled X” Format rule (e.g. ISBLANK([streetaddress]) ), if it matters that the field is empty. Mark with Cyan dot, and don’t forget to set the affected column correctly.
  • Set the SHOW property of the column to the expression TRUE if it’s supposed to appear in the detailed view, even if it’s blank. Just clicking on the “SHOW” checkbox isn’t good enough.
  • Set the Display property to something friendly

The struck-out bullets were relevant when I thought about using a Detail view as the summary of it all. I use a pdf report now, so they’re not relevant.

Adding a new View Form

  • There should be a status_* column in relation to this in the spreadsheet. Its setting inside AppSheet: Ref to const_group_status with Buttons (not “part of”), enable Show, Editable, Require, Initial Value is “0” (with quotes), Display name is “Status”. The four checkboxes to the right are left unchecked.
  • In the main table, add a Virtual Column with the same name as the Form View it relates to, e.g. view_base for status_base (so it’s easier to match between them later). Its type is Text, and its value is the relevant menu item’s description.
  • Only now, create a new Form view. In Column order, pick Manual, select all, unselect those not required. Be sure to keep the relevant status_* column.
  • Go to the “Actions” section (Bzzzt icon) and add a new action (actually, duplicate a similar one). Name it e.g. “Jump to base form”. For a record of “main”, Do this is “App: go to another view within this app”. Set Target to LINKTOROW([key_id], “view_base”) where “view_base” is the name of the View. Set Position to Inline, and Attach to column as the related Virtual Column. Pick a suitable icon too. Unfortunately, there is no catch-all action for navigating.
  • Duplicate a couple of “Mark incomplete” format rules, and adapt them to the new View: Change the expression to match the correct status_* column, and also the “Format these columns” to the relevant “view_*” with the same name. And test it. There’s no way around a couple of rules for each View.

Un-ignore /usr/lib/systemd/ in .gitignore with git repo on root filesystem

Actually, this is about un-ignoring any subdirectory that is grandchild to an ignored directory.

Running Linux Mint 22.2 (based upon Ubuntu 24.04), and having a git repository on the root filesystem to keep track of the computer’s configuration, the vast majority of directories are ignored. One of them is /lib; however, /lib/systemd/ should not be ignored, as it contains crucial files for the system’s configuration.

On other distributions, the relevant part in .gitignore usually goes:

[ ... ]
bin/
boot/
dev/
home/
lib/*
!lib/systemd/
lib64/
lib32/
libx32/
lost+found/
media/
mnt/
opt/
proc/
root/
run/
sbin/
[ ... ]

So lib/ isn’t ignored as a directory, but all its content, including subdirectories, is. That allows for un-ignoring lib/systemd/ on the following row. That’s why lib/ isn’t ignore-listed like the other ones.

But on Linux Mint 22.2, /lib is a symbolic link to /usr/lib. And since git treats a symbolic link just like a file, /lib/systemd/ is treated as /usr/lib/systemd. Ignoring /lib as a directory has no effect, and un-ignoring /lib/systemd has no effect, because to git, this directory doesn’t even exist.

So go

$ man gitignore

and try to figure out what to do. It’s quite difficult actually, but it boils down to this:

usr/*
!usr/lib/
usr/lib/*
!usr/lib/systemd/

It’s a bit tangled, but the point is that /usr/lib is un-ignored, then all its files are ignored, and then /usr/lib/systemd is un-ignored.
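
A quick way to verify the result (the path is just an example):

$ git check-ignore -v usr/lib/systemd/system/ssh.service

If the path is ignored, git prints the .gitignore line that matched; if it isn’t, the command prints nothing and exits with a nonzero status.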

The only good part about this solution is that it works.

Footnote and whole-page layout with wkhtmltopdf

This HTML code makes wkhtmltopdf create a single page with a footnote. If the outer <div> is duplicated, separate pages are generated.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<div style="height: 1350px; display: flex; flex-direction: column; break-inside: avoid; border:1px solid #668;">
This is just a test.
<div style="margin-top: auto;">
This is a footnote
</div>
</div>
</body>
</html>

So how does it work? The important part is the “style” attribute of the outer <div> tag:

  • height: 1350px: This sets the <div> block’s height to a full A4 page. Why 1350 pixels? I don’t know. I just tweaked this figure until it came out right. It’s possible another figure is needed on a different version of wkhtmltopdf. I’ve tried to set this with cm as well as pt units, but none corresponded to the standard figures for an A4 page. So I went with pixels, which clarifies that it’s a wild guess.
  • display: flex; flex-direction: column: This turns this <div> block into a Flexbox container, with vertical packing. This is needed to push the footnote’s block to the bottom.
  • break-inside: avoid: This tells wkhtmltopdf to avoid page breaks in the middle of the block. This makes no difference for a single page, but if this <div> block is repeated, this style attribute ensures that each block gets a separate page (unless any of the pages exceeds a page’s height).
  • border:1px solid #668: This generates a border around the <div> block’s occupied area. Used only for finding the correct height attribute, and should be removed afterwards (unless this border is desired on every page).

The footnote is pushed to the bottom of the page by virtue of the margin-top: auto style attribute and the fact that the <div> block having this attribute is within a vertical packed Flexbox container.
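
For reference, the pdf itself was generated with a plain invocation along these lines (the file names are mine):

$ wkhtmltopdf test.html test.pdf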

Notes:

  • This was done with wkhtmltopdf 0.12.4, without the “wkhtmltopdf patches” according to the man page.
  • If the height is too large on any page, all break-inside are ignored. In other words, the whole pdf document gets garbled, not just around the page that went wrong.
  • I tried changing the resolution on my X11 display, and it didn’t make any difference. This might sound like a silly thing to check, but wkhtmltopdf depends on the X11 server.

Notes on installing Linux Mint 22.2 (full disk encryption)

Introduction

These are my notes to self for the next time I install a Linux system. As if I read my previous posts before attempting.

So I installed Linux Mint 22.2 (kernel 6.14.0-29) on a Lenovo V14 G4 IRU laptop. With Cinnamon, of course, not that it’s relevant.

All that I wanted was full-disk encryption, while still being allowed to choose the setup of the partitions explicitly, and not let the installation wizard make the choices for me. In particular, I wanted a swap partition with the size I choose, and even more important: Almost all disk space in a /storage mount, so that one can fill the hard disk with junk without risking a system failure because the root partition is full.

Cutting the cake

/boot/efi is where the BIOS reads from. It’s natural to put it as the first partition, and it can be very small (even 1 MB can be enough in some cases, but let’s not push it like I eventually did). But if you make it really small, it’s a FAT16 partition, not FAT32, and that’s OK. Don’t force it into FAT32, because the system won’t boot with it if it has fewer clusters than required.

So it goes like this:

  • Create three partitions:
    • First one for /boot/efi (e.g. nvme0n1p1), 10 MB. This must be FAT32 or FAT16 (the latter for 10MB). Note that 10MB is a bit too small, because the BIOS won’t have room for its own backup this way.
    • Second one for /boot, will contain initramfs images, so ~500 MB. Any filesystem that GRUB can read (so ext4 is definitely OK)
    • Third partition for LUKS
  • In the LUKS partition, create an LVM with partitions for / (100G) and swap. The rest is for /storage. (A command-line sketch follows below.)
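
As a sketch, this is roughly how the third partition is set up from the command line. The device name, the volume group’s name and the sizes are my own example; the “Disks” GUI utility achieves the same:

# cryptsetup luksFormat /dev/nvme0n1p3
# cryptsetup open /dev/nvme0n1p3 cryptdisk
# pvcreate /dev/mapper/cryptdisk
# vgcreate vg0 /dev/mapper/cryptdisk
# lvcreate -L 100G -n root vg0
# lvcreate -L 16G -n swap vg0
# lvcreate -l 100%FREE -n storage vg0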

It’s somewhat confusing that /boot/efi is a subdirectory of /boot, but that’s the way it is.

Running the installation wizard

  • Unlock the encrypted partition, if it’s not already (e.g. with the “Disks” GUI utility). This requires giving the passphrase, of course.
  • Double-click the “Install Linux Mint” icon on the desktop.
  • When reaching the “Installation type”, pick “Something else”.
  • Set the following mount points:
    • Set the small FAT partition (nvme0n1p1 in my case) as “EFI System Partition”
    • /boot on the partition allocated for that (non-encrypted partition, possibly ext4).
    • / on the relevant LVM partition inside the encrypted block
    • Set the swap partition
  • Set the “Device for boot loader installation” to the one allocated for “EFI System Partition” (nvme0n1p1 in my case). One may wonder why this isn’t done automatically. Note that it’s the first partition (/dev/nvme0n1p1) and not the entire disk (/dev/nvme0n1).
  • Don’t do anything with the planned /storage partition. As I didn’t want to assign it a mount point, I handled it after the installation was done.

If the installation ends with a failure to install GRUB, run “journalctl” on a terminal window and look for error messages from the grub installer. Don’t ask ChatGPT to help you with solving any issues, and don’t ask me why I know it’s a bad idea.

When I insisted on FAT32

Sometimes I’m too much of a control freak, and when the Disks utility formatted the EFI partition into FAT16, I thought, oh no, it should be FAT32, what if the BIOS won’t play ball?

Well, that was silly of me, and also silly to ignore the warning about a FAT32 filesystem with just 10 MB having too few clusters.

So even though the installer wizard finished successfully, there was no option to boot from the disk. Secure boot was disabled, of course. And yet, there was no suitable option in the BIOS’ boot menu. There was a “UEFI” option there, which is always in black (not possible to select), but that doesn’t seem to be relevant.

Following the warm advice of ChatGPT, I added an entry while in Live USB mode:

# efibootmgr -c -d /dev/nvme0n1p1 -p 1 -L "Linux Mint" -l '\EFI\ubuntu\grubx64.efi' 
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0003,2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC
Boot0003* Linux Mint	HD(1,GPT,12345678-aaaa-bbbb-cccc-dddddddddddd,0x800,0x5000)/File(\EFI\ubuntu\grubx64.efi)

(some identifying numbers replaced trivially)

Cute, heh? But it made no difference. After rebooting with a Live USB again:

# efibootmgr
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC

So the entry was gone.

I changed the EFI partition to FAT16, ran through the installation all the way again. And immediately after the installation was done (before booting to start from disk for the first time):

# efibootmgr
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0003,2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot0003* Ubuntu	HD(1,GPT,12345678-aaaa-bbbb-cccc-dddddddddddd,0x800,0x5000)/File(\EFI\ubuntu\shimx64.efi)
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC

This time, when the laptop went on, the BIOS came up with a “Please don’t power off while completing system update”. What it actually did was to write its own backup file into the EFI partition, which appears as /boot/efi/BackupSbb.bin. Actually, it doesn’t seem like it was successful, as the space in the partition ran out. So I deleted this file and turned off the “BIOS Self-Healing” option in the BIOS’ configuration (it will be much worse if it attempts to self-heal on a faulty backup file).

At this point, there was an “ubuntu” entry in the list of boot options in BIOS’ boot menu (Not “Ubuntu”, but “ubuntu”, probably referring to the directory and not the name). And the black “UEFI” option remained in the option list, not possible to choose. So this is why I don’t think it’s relevant.

Asking for a passphrase is too much to ask?

Having reached this far, I got a nice Linux Mint logo on the screen, however nothing happened, and then I got thrown into an initramfs rescue shell. In other words, no attempt was made to unlock the encrypted partition.

So I ran the live USB again, unlocked the root partition and mounted it as /mnt/root/.

Then, as root (sudo su), bind-mounted the essential directories into the root filesystem:

# for d in /dev /dev/pts /proc /sys /run; do mount --bind $d /mnt/root/$d ; done

And then chrooted into it.

# chroot /mnt/root

Of course, there was no /etc/crypttab, so no wonder that the installation didn’t take unlocking the encrypted partition into account.

So I followed my own instruction from a previous post. First, mount /boot and /boot/efi with

# mount -a

and then check for the UUID of the encrypted partition:

# cryptsetup luksUUID /dev/nvme0n1p2
11223344-5566-7788-99aa-bbccddeeff00

and then add /etc/crypttab reading

luks-11223344-5566-7788-99aa-bbccddeeff00 UUID=11223344-5566-7788-99aa-bbccddeeff00 none luks

Note that the luks-{UUID} part is the name of the partition as it appears in /dev/mapper. In this case, this was what the Disks GUI utility chose. Had I done this from the command line, I could have chosen a shorter name. But who cares.

And finally, edit /etc/default/grub for your preferences, update initramfs and GRUB, exactly as already mentioned in that post:

# update-initramfs -u
# update-grub
# grub-install

It was really exactly the same as the previous post. And then reboot, and all was finally fine.

And by the way, the initrd file is 77 MB. Running update-initramfs again didn’t make it smaller. Not a big deal with a flash disk, anyhow.

But GRUB can open LUKS too!

GRUB has cryptodisk and luks modules which can open an encrypted partition, so in principle it can read the kernel from an encrypted root partition. However there is no mechanism I’m aware of to pass over the unlocked encrypted partition to the kernel, so it would be necessary to supply the passphrase twice when booting.

This is why I went for two partitions for booting. I guess this still is the only sane way.

Converting vtt to srt subtitles with a simple Perl script

I tried to use ffmpeg to convert a vtt file to srt, but that didn’t work at all:

$ ffmpeg -i in.vtt out.srt
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)

I tried a whole lot of suggestions from the Internet, and eventually I gave up.

So I wrote a simple Perl script to get the job done. It took about 20 minutes, because I made a whole lot of silly mistakes:

#!/usr/bin/perl

use warnings;
use strict;

my $n = 1;
my $l;

my $timestamp_regex = qr/[0-9]+:[0-9]+:[0-9:\.]+/; # Very permissive

while (defined ($l = <>)) {
  my ($header) = ($l =~ /^($timestamp_regex --> $timestamp_regex)/);
  next unless (defined $header);

  $header =~ s/\./,/g;

  print "$n\n";
  print "$header\n";

  $n++;

  while  (defined ($l = <>)) {
    last unless ($l =~ /[^ \t\n\r]/); # Nothing but possibly whitespaces

    print $l;
  }
  print "\n";
}

$n--;
print STDERR "Converted $n subtitles\n";

Maybe not a piece of art, and it can surely be made more accurate, but it does the job with simply

$ ./vtt2srt.pl in.vtt > out.srt
Converted 572 subtitles

And here’s why Perl is a pearl.

Long list of IP addresses attacking a phpBB forum in May 2025

A good-old phpBB forum that I run had a massive surge of traffic somewhere in May 2025. This had already started a couple of months earlier, but in May it knocked down the web server due to the number of apache instances. Plus the logs reached ~ 3 GB.

This doesn’t seem to be a DDoS attack, mainly because the accesses from each IP address were far scarcer than a legit bot’s: There would be several minutes, if not hours, between requests. With the huge number of bots involved, it would be easy to completely knock out the website with a moderate access pattern from each IP address. Besides, there’s no reason to attack this specific forum. It looks more like a very brute-force attempt to hack the forum itself for the purpose of spamming or something.

One could turn to Cloudflare in situations like this, but I tried the DIY approach. Namely, to block the IP addresses of the bots by virtue of the firewall, as I discuss in a separate post of mine. The strategy with these bots was somewhat different: As the phpBB-related bots occasionally did something that no regular user would do (what exactly I’ll keep to myself), it was enough to detect this event and blacklist the IP address right away.

Which I did. After a couple of days, the list landed at ~78,000 IP addresses. Note that this is larger than ipset’s default number of allowed elements, which is 65536. So when creating the ipset, be sure to use maxelem with a larger number (I used 524288 when I didn’t know how bad the situation was).
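
For reference, the ipset plus iptables side goes something like this, with the set’s name and the address being examples of mine:

# ipset create forum-blacklist hash:ip maxelem 524288
# ipset add forum-blacklist 203.0.113.7
# iptables -I INPUT -m set --match-set forum-blacklist src -j DROP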

I checked a few of these addresses, and they appear to originate from Asia (Vietnam, Indonesia etc.).

The nice thing is that 24-48 hours after I enabled the related blacklisting mechanism, the attack virtually stopped. The firewall recorded no more than ~30 dropped packets per minute on the ipset-matching rule. True victory.

The list of IPs can be downloaded here, in case it helps someone understand what’s going on. If you do have an idea, please comment below.

It’s a plain zip of a list of IP addresses, one address per row. It’s possible that a few unrelated IP addresses appear on this list, which could be other kinds of bots that were blacklisted on the same server. Maybe a dozen of these.

Ehm, no

The silence was short-lived. A couple of days later, the bots came again, this time on a much larger scale. Within a few days, I collected roughly ten times as many IP addresses, that is ~840,000 of them. That’s a really large number. This larger list of IPs can be downloaded here. It was obtained exactly the same way as the previous one.

But after this swarm, it got silent again. For a while. In the days that followed, the same traffic pattern came back, IPs were blacklisted (~250k of them at some point) and so it went on. But with the blacklisting mechanism, the load on the server was quite small, so for now I consider the problem solved. Let’s hope it stays this way.

Measuring how much RAM a Linux service eats

Introduction

Motivation: I wanted to move a service to another server that is dedicated only to that service. But how much RAM does this new server need? RAM is $$$, so too much is a waste of money, too little means problems.

The method is to run the service and expose it to a scenario that causes it to consume RAM. And then look at the maximal consumption.

This can be done with “top” and similar programs, but these show the current use, and I needed the maximal RAM use. Besides, a service may spread out its RAM consumption across several processes, and it’s the cumulative consumption that is interesting.

The appealing solution is to use the fact that systemd creates a cgroup for the service. The answer hence lies in the RAM consumption of the cgroup as a whole. It’s also possible to create a dedicated cgroup and run a program within that one, as shown in another post of mine.

This method is somewhat crude, because this memory consumption includes the disk cache as well. In other words, this method shows how much RAM is consumed when there’s plenty of memory, and hence when there’s no pressure to reclaim any RAM. Therefore, if the service runs on a server with less RAM (or the service’s RAM consumption is limited in the systemd unit file), it’s more than possible that everything will work just fine. It might just run somewhat slower, due to disk accesses that were previously covered by the cache.

So using a server with as much memory as measured by the test described below (plus some extra for the OS itself) will result in quick execution, but it might be OK to go for less RAM. A tight RAM limit will cause a lot of disk activity at first, and only afterwards will processes be killed by the OOM killer.

Where the information is

All said in this post relates to Linux kernel v4.15. Things are different with later kernels, not necessarily for the better.

There are in principle two versions of the interface to cgroup’s memory management: First, the one I won’t use, which is cgroup-v2 (or maybe this doc for v2 is better?). The sysfs files of this interface, for a service named “theservice”, reside in /sys/fs/cgroup/unified/system.slice/theservice.service.

I shall be working with the memory control of cgroup-v1. The sysfs files in question are in /sys/fs/cgroup/memory/system.slice/theservice.service/.

If /sys/fs/cgroup/memory/ doesn’t exist, it might be necessary to mount it explicitly. Also, if system.slice doesn’t exist under /sys/fs/cgroup/memory/ it’s most likely because systemd’s memory accounting is not in action. This can be enabled globally, or by setting MemoryAccounting=true on the service’s systemd unit (or maybe any unit?).
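If it comes to that, mounting the v1 memory controller goes something like this (as root; a sketch, and on a systemd-managed system the hierarchy is usually already set up):

# mkdir -p /sys/fs/cgroup/memory
# mount -t cgroup -o memory cgroup /sys/fs/cgroup/memory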

While at it, it might be a good idea to set MemoryMax in the service’s systemd unit in order to see what happens when the RAM is really restricted. Or change the limit dynamically, as shown below.
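For example, both settings can probably be flipped on at runtime with systemctl (a sketch; on a cgroup-v1 system, MemoryLimit= is the property that maps directly to memory.limit_in_bytes):

# systemctl set-property theservice.service MemoryAccounting=true
# systemctl set-property theservice.service MemoryMax=512M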

And there’s always the alternative of creating a separate cgroup and running the service in that group. I’ll refer to my own blog post again.

Getting the info

All files mentioned below are in /sys/fs/cgroup/memory/system.slice/theservice.service/ (assuming that the systemd service in question is theservice).

The maximal memory used: memory.max_usage_in_bytes. As its name implies, this is the maximal amount of RAM used, measured in bytes. This includes the disk cache, so the number is higher than what appears in “top”.

The memory currently used: memory.usage_in_bytes.
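So a quick, human-readable peek can go something like this (assuming the cgroup-v1 path mentioned above):

$ cd /sys/fs/cgroup/memory/system.slice/theservice.service
$ echo $(( $(cat memory.max_usage_in_bytes) / 1048576 )) MiB peak
$ echo $(( $(cat memory.usage_in_bytes) / 1048576 )) MiB now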

For more detailed info about memory use: memory.stat. For example:

$ cat memory.stat 
cache 1138688
rss 4268224512
rss_huge 0
shmem 0
mapped_file 516096
dirty 0
writeback 0
pgpgin 36038063
pgpgout 34995738
pgfault 21217095
pgmajfault 176307
inactive_anon 0
active_anon 4268224512
inactive_file 581632
active_file 401408
unevictable 0
hierarchical_memory_limit 4294967296
total_cache 1138688
total_rss 4268224512
total_rss_huge 0
total_shmem 0
total_mapped_file 516096
total_dirty 0
total_writeback 0
total_pgpgin 36038063
total_pgpgout 34995738
total_pgfault 21217095
total_pgmajfault 176307
total_inactive_anon 0
total_active_anon 4268224512
total_inactive_file 581632
total_active_file 401408
total_unevictable 0

Note the “cache” part at the beginning. It’s no coincidence that it’s first. That’s the most important part: How much can be reclaimed just by flushing the cache.

On a 6.1.0 kernel, I’ve seen memory.peak and memory.current instead of memory.max_usage_in_bytes and memory.usage_in_bytes. memory.peak wasn’t writable, however (the file permissions don’t allow it, and attempting to write to it fails anyhow), so it wasn’t possible to reset the max level.

Setting memory limits

It’s possible to set memory limits in systemd’s unit file, but it can be more convenient to do this on the fly. In order to set the hard limit of memory use to 40 MiB, go (as root)

# echo 40M > memory.limit_in_bytes

To disable the limit, pick an unreasonably high number, e.g.

# echo 100G > memory.limit_in_bytes

Note that restarting the systemd service has no effect on these parameters (unless a memory limit is set in the unit file). The cgroup directory remains intact.

Resetting between tests

To reset the maximal value that has been recorded for RAM use (as root)

# echo 0 > memory.max_usage_in_bytes

But to really start from fresh, all disk cache needs to be cleared as well. The sledge-hammer way is to go

# echo 1 > /proc/sys/vm/drop_caches

This frees the page caches system-wide, so everything running on the computer will need to re-read things from the disk. There’s a slight and temporary global impact on performance. On a GUI desktop, it gets a bit slow for a while.

A message like this will appear in the kernel log in response:

bash (43262): drop_caches: 1

This is perfectly fine, and indicates no error.

Alternatively, set a low limit for the RAM usage with memory.limit_in_bytes, as shown above. This impacts the cgroup only, forcing a reclaim of disk cache.

Two things that have no effect:

  • Reducing the soft limit (memory.soft_limit_in_bytes). This limit is relevant only when the system is short of RAM overall. Otherwise, it does nothing.
  • Restarting the service with systemd. It wouldn’t make any sense to flush a disk cache when restarting a service.

It’s of course a good idea to get rid of the disk cache before clearing memory.max_usage_in_bytes, so the max value starts without taking the disk cache into account.
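So a plausible reset sequence between test runs boils down to something like this (as root, in the service’s cgroup directory; the temporary low limit is made up, and must stay above the service’s actual RSS, or the kernel starts killing processes):

# echo 200M > memory.limit_in_bytes
# echo 100G > memory.limit_in_bytes
# echo 0 > memory.max_usage_in_bytes

The first write forces the cgroup to reclaim its disk cache, the second lifts the limit again, and only then is the recorded maximum reset.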

A function similar to Perl’s die() in bash

This is maybe a bit silly, but Perl has a die() function that is really handy for quitting a script with an error message. And I kind of miss it in Bash. So it can be defined with this simple one-liner:

function die { echo "$1" ; exit 1 ; }

And then it can be used with something like:

unzip thefile.zip || die "Unzip returned with error status"
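A possible variant (my own twist, nothing above requires it) writes to standard error, like Perl’s die() does, and takes an optional exit status as a second argument:

function die { echo "$1" >&2 ; exit "${2:-1}" ; }

cp a.txt b.txt || die "Copy failed" 2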

The Perl feeling, in Bash.

Altering the Message-ID header in Thunderbird for non-spam detection

TL;DR

In this post, I suggest manipulating the Message IDs of outgoing mails, so that legit inbound replies to my mails are easily detected as non-spam. I also show how to do this with Thunderbird (Linux version 91.10.0, but it works with practically all versions, I believe).

Briefly about Message-ID

Each email should have a Message-ID header, which uniquely identifies this message. The value of this header should consist of a random string, followed by an ‘@’ and a string that represents the domain name (referred to as FQDN, Fully Qualified Domain Name). This is often the full domain name of the “From” header (e.g. gmail.com).

For example, an email generated by Gmail’s web client had Message-ID: <CAD8P7-R2OuJvGiuQ-0RQqgSSmDguwv1VdjHgQND4jMJxPc628w@mail.gmail.com>. A similar result (same FQDN) was obtained when sending from the phone. However, when using Thunderbird to send an email, only “gmail.com” was set as the FQDN.

Does the Message-ID matter?

Like anything related to email, there are a lot of actors, and each has its own quirks. For example, rspamd raises the spam score by 0.5, by virtue of the MID_RHS_NOT_FQDN rule, if the Message ID’s right-hand side isn’t an FQDN. I’m not sure to what extent it checks that the FQDN matches the email’s From, but even if it does, it can’t be that picky, given the example I showed above in relation to gmail.com.

It’s quite rare that people care about this header. I’ve seen somewhere that someone who sent mails from a work computer didn’t like the name of the internal domain leaking.

All in all, it’s probably a good idea to make sure that the Message-ID header looks legit. Putting the domain from the From header seems to be a good idea to keep spam filters happy.

Why manipulate the Message-ID?

In a reply, the In-Reply-To header gets the value of the Message ID of the message replied to. So if a spam filter can identify that the email is genuinely a reply to something I sent, it’s definitely not spam. It’s a good idea to scan the References header too, in order to cover more elaborate scenarios where several people correspond.

The rigorous way to implement this spam filtering feature is to store the Message IDs of all sent mails in some small database, and check for a match with the content of the In-Reply-To of arriving mails. Possible, however daunting.
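Just to illustrate the idea, a rough sketch with made-up file names (formail comes with procmail, and real header handling needs more care than this):

$ formail -x Message-ID: < sent.eml >> ~/sent-msgids.txt
$ irt=$(formail -x In-Reply-To: < incoming.eml)
$ [ -n "$irt" ] && grep -qF "$irt" ~/sent-msgids.txt && echo "Genuine reply, not spam"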

A much easier way is to change the FQDN part, so that it’s easily identifiable. This is unnecessary if you happen to send emails with your own domain, as spam senders are very unlikely to add an In-Reply-To with a matching domain (actually, very few spam messages have an In-Reply-To header at all).

But for email sent through gmail, changing the FQDN to something unique is required to make a distinction.

Will this mess up things? I’m not sure any software tries to fully match the FQDN with the sender, but I suppose it’s safe to add a subdomain to the correct domain. I mean, if both “mail.gmail.com” and “gmail.com” are commonly out there, why shouldn’t “secretsauce.gmail.com” seem likewise legit to any spam filter that checks the message?

And by the way, as of August 2024, a DNS query for mail.gmail.com yields no address, neither for A nor MX. In other words, Gmail itself uses an invalid domain in its Message ID, so any other invented subdomain should do as well.
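This is easy to check with a couple of queries (the output is omitted here, and may of course change over time):

$ host -t A mail.gmail.com
$ host -t MX mail.gmail.com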

Changing the FQDN on Thunderbird

Click the hamburger icon, choose Preferences, and scroll down all the way (on the General tab) and click on Config Editor.

First, we need to find Thunderbird’s internal ID number for the mail account to manipulate.

To get a list of IDs, write “useremail” in the search text box. This lists entries like mail.identity.id1.useremail and their values. This listing allows making the connection between e.g. “id1” and the email address related to it.

For example, to change the FQDN of the mail account corresponding to “id3”, add a string property (using the Config Editor). The key of this property is “mail.identity.id3.FQDN” and the value is something like “secretsauce.gmail.com”.

There is no need to restart Thunderbird. The change is in effect on the next mail sent, and it remains in the settings across restarts.

The need for this feature has been questioned, as was discussed here. So if any Thunderbird maintainer reads this, please keep this feature up and running.

A possible alternative approach

Instead of playing around with the Message-ID, it would be possible to add an entry to the References header (or add this header if there is none). The advantage of this approach is that it can also be done by the MTA further down the delivery path, and it doesn’t alter anything that is already in place.

And since it’s an added entry, it can also be crafted arbitrarily. For example, it may contain a timestamp (epoch time in hex) and the SHA1 sum of a string that is composed by this timestamp and a secret string. This way, this proof of genuine correspondence is impossible to forge and may expire with time.
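A shell sketch of generating such an entry (the secret, the domain and the exact format are all made up for the sake of the example):

$ secret=my-secret-string
$ ts=$(printf '%x' "$(date +%s)")
$ sig=$(printf '%s%s' "$ts" "$secret" | sha1sum | cut -d' ' -f1)
$ echo "<$ts.$sig@secretsauce.gmail.com>"

The receiving side recomputes the SHA1 sum from the timestamp part and the same secret, compares it with the one in the header, and can also reject entries whose timestamp is too old.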

I haven’t looked into how to implement this in Thunderbird. Right now I’m good with the Message-ID solution.

Linux kernel workqueues: Is it OK for the worker function to kfree its own work item?

Freeing yourself

Working with the Linux kernel’s workqueues, I incremented a kref reference count before queuing a work item, in order to make sure that the data structure it operated on would still be in memory while the work item ran. Just before returning, the work item’s function decremented this reference count, and as a result, the data structure’s memory could be freed at that very moment.

The thing was that this data structure also included the work item’s own struct work_struct. In other words, the work item’s function could potentially free the very entry that was pushed into the workqueue on its behalf. Could this possibly be allowed?

The short answer is yes. It’s OK to call kfree() on the memory of the struct work_struct of the currently running work item. No risk for use-after-free (UAF).

It’s also OK to requeue the work item on the same workqueue (or on a different one). All in all, the work item’s struct is just a piece of unused memory as soon as the work item’s function is called.

On the other hand, don’t think about calling destroy_workqueue() on the workqueue on which the running work item is queued: destroy_workqueue() waits for all work items to finish before destroying the queue, which will never happen if the request to destroy the queue came from one of its own work items.

From the horse’s mouth

I didn’t find any documentation on this topic, but there are a couple of comments in the source code, namely in the process_one_work() function in kernel/workqueue.c: First, this one by Tejun Heo from June 2010:

/*
 * It is permissible to free the struct work_struct from
 * inside the function that is called from it, this we need to
 * take into account for lockdep too.  To avoid bogus "held
 * lock freed" warnings as well as problems when looking into
 * work->lockdep_map, make a copy and use that here.
 */

And this one comes right after the call to the work item’s function, worker->current_func(work). It was written by Arjan van de Ven in August 2010:

/*
 * While we must be careful to not use "work" after this, the trace
 * point will only record its address.
 */
trace_workqueue_execute_end(work, worker->current_func);

The point of this comment is that the value of @work will be used by the call to trace_workqueue_execute_end(), but it won’t be used as a pointer. This emphasizes the commitment of not touching what @work points at, i.e. the memory segment may have been freed.

How it’s done

process_one_work(), which is the only function that calls the work item’s function, is clearly written in a way that ignores the work item’s struct after calling the work item’s function.

The first thing is that it copies the address of the work function into the worker struct:

worker->current_func = work->func;

It then removes the work item from the workqueue:

list_del_init(&work->entry);

And later on, it calls the function, using the copy of the pointer (even though it could also have used the original at this point):

worker->current_func(work);

After this, the @work variable isn’t used anymore as a pointer.