Introduction
These are my notes as I programmed an Atmel AT25128 EEPROM, attached to a PEX 8606 PCIe switch, using PCIe configuration-space writes only (that is, no I2C / SMBus cable). This is frankly quite redundant, as Avago supplies software tools for doing this.
In fact, in order to get their tools, register at Avago’s site, then make the extra registration at PLX Tech’s site. Neither registration requires signing an NDA. At PLX Tech’s site, pick SDK -> PEX at the bottom of the list of devices to get documentation for, and download the PLX SDK. Among others, this suite includes the PEX Device Editor, which is quite a useful tool regardless of switches, as it gives a convenient tree view of the bus. The Device Editor, as well as other tools, allows programming the EEPROM from the host, with or without an I2C cable.
There are also other tools in the SDK that do the same thing, PLXMon in particular. If you have an Aardvark I2C to USB cable, the PLXMon tool allows reading and writing to the EEPROM through I2C. And there’s a command line interface, probably for all functionality. So really, this is for those who want to get down to the gory details.
All said below will probably work with the entire PEX 86xx family, and possibly with other Avago devices as well. The Data Book is your friend.
The EEPROM format
The organization of data in the EEPROM is outlined in the Data Book, but to keep it short: It’s a sequence of bytes, consisting of a concatenation of the following words, all represented in Little Endian format:
- The signature, always 0x5a, occupying one byte
- A zero (0x00), occupying one byte
- The number of bytes of payload data to come, given as a 16-bit word (two bytes). Or equivalently, the number of registers to be written to, multiplied by 6.
- The address of the register to be written to, divided by 4, and ORed with the port number, left shifted by 10 bits. See the Data Book for how NT ports are addressed. This field occupies 16 bits (two bytes). Or to put it in C’ish:
unsigned short addr_field = (reg_addr >> 2) | (port << 10)
- The data to be written: 32 bits (four bytes)
Items #4 and #5 are repeated for each register write. There is no alignment, so when this stream is organized in 32-bit words, it becomes somewhat inconvenient.
And as the Data Book keeps saying all over the place: If the Debug Control register (at 0x1dc) is written to, it has to be the first entry (occupying bytes 4 to 9 in the stream). Its address representation in the byte stream is 0x0077, for example (or more precisely, the byte 0x77 followed by 0x00).
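To make the format concrete, here is a minimal shell sketch of how the address field and the six bytes of one register entry could be computed. The function names eeprom_addr_field and eeprom_entry_bytes are made up for illustration, and the arithmetic merely follows the description above, so check it against the Data Book before relying on it.

```bash
# Hypothetical helpers (names are not from the Data Book or the PLX SDK).

# Print the 16-bit address field as four hex digits.
# $1 = register address (byte offset), $2 = port number
eeprom_addr_field() {
    printf '%04x\n' $(( ($1 >> 2) | ($2 << 10) ))
}

# Print the six bytes of one register entry in byte stream order:
# address field (little endian), then the 32-bit data (little endian).
# $1 = register address, $2 = port number, $3 = data to write
eeprom_entry_bytes() {
    field=$(( ($1 >> 2) | ($2 << 10) ))
    data=$(( $3 ))
    printf '%02x %02x %02x %02x %02x %02x\n' \
        $(( field & 0xff )) $(( (field >> 8) & 0xff )) \
        $(( data & 0xff )) $(( (data >> 8) & 0xff )) \
        $(( (data >> 16) & 0xff )) $(( (data >> 24) & 0xff ))
}
```

For example, "eeprom_addr_field 0x1dc 0" prints 0077, which matches the Debug Control register's representation mentioned above.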
Accessing configuration space registers
Given the following PCI bus setting:
02:00.0 PCI bridge: PLX Technology, Inc. Unknown device 8606 (rev ba)
03:01.0 PCI bridge: PLX Technology, Inc. Unknown device 8606 (rev ba)
03:05.0 PCI bridge: PLX Technology, Inc. Unknown device 8606 (rev ba)
03:07.0 PCI bridge: PLX Technology, Inc. Unknown device 8606 (rev ba)
03:09.0 PCI bridge: PLX Technology, Inc. Unknown device 8606 (rev ba)
In particular, note that the switch’s upstream port 0 is at 02:00.0.
Reading from the Serial EEPROM Buffer register at 264h (as root, of course):
# setpci -s 02:00.0 264.l
00000000
The -s 02:00.0 part selects the device by its bus position (see above).
Note that all arguments as well as return values are given in hexadecimal. A 0x prefix is allowed, but it’s redundant.
Making a dry-run of writing to this register, and verifying nothing happened:
# setpci -Dv -s 02:00.0 264.l=12345678
02:00.0:264 12345678
# setpci -s 02:00.0 0x264.l
00000000
Now let’s write for real:
# setpci -s 02:00.0 264.l=12345678
# setpci -s 02:00.0 264.l
12345678
(Yey, it worked)
Reading from the EEPROM
Reading four bytes from the EEPROM at address 0:
# setpci -s 02:00.0 260.l=00a06000
# setpci -s 02:00.0 264.l
0012005a
The “a0” part above sets the address width explicitly to 2 bytes on each operation. There may be some confusion otherwise, in particular if the device wasn’t detected properly at bringup. The “60” part means “read”.
Just checking the value of the status register after this:
# setpci -s 02:00.0 260.l
00816000
Same, but read from EEPROM address 4. The 13 LSBs of the written word are used as bits [14:2] of the EEPROM address, so their value is the byte address divided by 4. It’s also possible to access higher addresses (see the respective Data Book).
# setpci -s 02:00.0 260.l=00a06001
# setpci -s 02:00.0 264.l
0008c03a
Or, to put it in a simple Bash script (this one reads the first 16 DWords, i.e. 64 bytes, from the EEPROM of the switch located at the bus address given as the argument to the script, see example below):
#!/bin/bash
# Usage: readeeprom.sh <bus address>, e.g. 02:00.0
DEVICE=$1
for ((i=0; i<16; i++)); do
  # Issue a read command for DWord number i
  setpci -s "$DEVICE" 260.l=$(printf '%08x' $((i+0xa06000)))
  usleep 100000 # Wait 100 ms for the read to complete
  setpci -s "$DEVICE" 264.l
done
Rather than checking the status bit for the read to be finished, the script waits 100 ms. Quick and dirty solution, but works.
Note: usleep is deprecated as a command-line utility. Instead, odds are that “sleep 0.1” replaces “usleep 100000”. Yes, sleep takes non-integer arguments in non-ancient UNIXes.
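The command word written to register 260h in the examples above can be derived from the EEPROM's byte address. Here's a small sketch of that calculation. The function name eeprom_read_cmd is hypothetical, and the 00a06000 base value is simply the one used above (2-byte address width plus the read command).

```bash
# Hypothetical helper: print the value to write to register 260h in order
# to read the DWord at a given EEPROM byte address.
# $1 = EEPROM byte address (must be a multiple of 4)
eeprom_read_cmd() {
    if [ $(( $1 % 4 )) -ne 0 ]; then
        echo "Address must be a multiple of 4" >&2
        return 1
    fi
    # The low bits of the command word hold the address in DWords
    printf '%08x\n' $(( 0xa06000 + ($1 >> 2) ))
}
```

With this, something like "setpci -s 02:00.0 260.l=$(eeprom_read_cmd 4)" would be the same as writing 00a06001 as shown earlier.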
Writing to the EEPROM
Important: Writing to the EEPROM, in particular the first word, can make the switch ignore the EEPROM or load faulty data into the registers. On some boards, the EEPROM is essential for the detection of the switch by the host and its enumeration. Consequently, writing junk to the EEPROM can make it impossible to rectify this through the PCIe interface. This can render the PCIe switch useless, unless this is fixed with I2C access.
Before starting to write, the EEPROM’s write enable latch needs to be set. This is done once for each write as follows, regardless of the desired target address:
# setpci -s 02:00.0 260.l=00a0c000
Now we’ll write 0xdeadbeef to the first 4 bytes of the EEPROM.
# setpci -s 02:00.0 264.l=deadbeef
# setpci -s 02:00.0 260.l=00a04000
If another address is desired, add the address in bytes, divided by 4, to the 00a04000 above. The write enable command remains the same (no change in its lower bits is required).
Here’s an example of the sequence for writing to bytes 4-7 of the EEPROM (all three lines are always required):
# setpci -s 02:00.0 260.l=00a0c000
# setpci -s 02:00.0 264.l=010d0077 # Just any value goes
# setpci -s 02:00.0 260.l=00a04001
Or making a script of this, which writes the arguments from address 0 and on (for those who like to make big mistakes…):
#!/bin/bash
# Usage: writeeeprom.sh <bus address> <DWord> [<DWord> ...]
numargs=$#
DEVICE=$1
shift
for ((i=0; i<(numargs-1); i++)); do
  setpci -s "$DEVICE" 260.l=00a0c000 # Set the write enable latch
  setpci -s "$DEVICE" 264.l=$1 # Load the data to write
  setpci -s "$DEVICE" 260.l=$(printf '%08x' $((i+0xa04000)))
  usleep 100000 # Wait 100 ms for the write to complete
  shift
done
Again, usleep can be replaced with a plain sleep with a non-integer argument. See above.
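Given how easy it is to ruin the EEPROM, it may be worth looking at the exact commands before running them. The following sketch only prints the setpci commands that the write script above would issue, without touching the hardware. The function name eeprom_write_plan is made up for this example, and the 100 ms wait of the real script is left out, since nothing is executed.

```bash
# Hypothetical helper: print (but don't run) the setpci commands for writing
# the given DWords to the EEPROM, starting at address 0.
# $1 = bus address of the switch, remaining arguments = DWords in hex
eeprom_write_plan() {
    device=$1
    shift
    i=0
    for dword in "$@"; do
        echo "setpci -s $device 260.l=00a0c000"   # Set write enable latch
        echo "setpci -s $device 264.l=$dword"     # Load the data buffer
        printf 'setpci -s %s 260.l=%08x\n' "$device" $(( 0xa04000 + i ))
        i=$(( i + 1 ))
    done
}
```

Running "eeprom_write_plan 02:00.0 0006005a 00ff0081" prints six lines, three for each DWord. The dry-run flag of setpci itself (-D, as used earlier) is another way to stay on the safe side.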
Example of using these scripts
# ./writeeeprom.sh 02:00.0 0006005a 00ff0081 ffff0001
# ./readeeprom.sh 02:00.0
0006005a
00ff0081
ffff0001
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
ffffffff
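Since setpci prints each DWord as a 32-bit number, the bytes appear in reverse order compared with the byte stream in the EEPROM. Here's a small sketch for turning the output back into bytes (the function name dword_to_bytes is hypothetical):

```bash
# Hypothetical helper: print the four bytes of a DWord, as shown by setpci,
# in the order they are stored in the EEPROM (little endian).
# $1 = DWord as hex digits, without a 0x prefix
dword_to_bytes() {
    v=$(( 0x$1 ))
    printf '%02x %02x %02x %02x\n' \
        $(( v & 0xff )) $(( (v >> 8) & 0xff )) \
        $(( (v >> 16) & 0xff )) $(( (v >> 24) & 0xff ))
}
```

For the readback above, 0006005a becomes 5a 00 06 00: The signature, a zero, and a payload length of 6 bytes.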
When the EEPROM gets messed up
It’s more than possible that the switch becomes unreachable to the host as a result of messing up the EEPROM’s content, for example by changing the upstream port setting. A simple way out, if a blank EEPROM is good enough for talking with the switch, is to force the EEPROM to go undetected, e.g. by short-circuiting the EEPROM’s SO pin (pin number 2 on AT25128) to ground with a 33 Ohm resistor or so. This prevents the data from being loaded, but the commands above will nevertheless work, so the content can be altered. Yet another “dirty, but works” solution.
This has been documented elsewhere, but it’s important enough to have a note about here.
In short, before moving a Windows installation to new hardware, it’s essential to prepare it, or a 0x0000007b blue screen will occur on the new hardware.
The trick is to run sysprep.exe (under windows\system32\sysprep\) before the transition. Have “Generalize” checked, and choose “shutdown” at the end of the operation (“Shutdown Options”).
Once the computer shuts down, move the hard disk to the new computer. Windows should boot smoothly, and start a series of installation stages, including feeding the license key and language settings. Also, an account needs to be created. This account can be deleted afterwards, as the old account is kept. Quite silly, as a matter of fact.
While working on a project involving a custom PCIe interface, Linux’ message log became flooded with messages like
pcieport 0000:00:1c.6: device [8086:a116] error status/mask=00001081/00002000
pcieport 0000:00:1c.6: [ 0] Receiver Error
pcieport 0000:00:1c.6: [ 7] Bad DLLP
pcieport 0000:00:1c.6: [12] Replay Timer Timeout
pcieport 0000:00:1c.6: Error of this Agent(00e6) is reported first
pcieport 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0200(Transmitter ID)
pcieport 0000:02:00.0: device [10b5:8606] error status/mask=00003000/00002000
pcieport 0000:02:00.0: [12] Replay Timer Timeout
pcieport 0000:00:1c.6: AER: Corrected error received: id=00e6
pcieport 0000:00:1c.6: can't find device of ID00e6
pcieport 0000:00:1c.6: AER: Corrected error received: id=00e6
pcieport 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0200(Transmitter ID)
And before long, some 400 MB of log messages accumulated in /var/log/messages. In this context, they are merely informative AER (Advanced Error Reporting) messages, telling me that errors have occurred in the link between the computer’s PCIe controller and the PCIe switch on the custom board. But all of these errors were correctable (presumably with retransmits) so from a functional standpoint, the hardware worked.
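To get an idea of which devices report the errors, and how often, the log can be summarized. This is a rough sketch that assumes the messages look like those shown above. The function name is made up, and the sed pattern may need adjustment for other kernel versions.

```bash
# Hypothetical helper: read kernel log lines from standard input, and print
# each device that reported a corrected error, followed by the number of
# such reports.
aer_corrected_by_device() {
    sed -n 's/.*pcieport \([0-9a-f:.]*\): PCIe Bus Error: severity=Corrected.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

It would typically be used as "aer_corrected_by_device < /var/log/messages".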
Advanced Error Reporting and its Linux driver were explained at OLS 2007 (pdf).
Had it not been for these messages, I could have been misled to think that all was fine, even though there’s a method to tell, which I’ve dedicated an earlier post to. So they’re precious, but they flood the system logs, and even worse, the system is so busy handling them that the boot is slowed down, and sometimes the boot process gets stuck in the middle.
At first I thought that it would be enough to just turn off the logging of these messages, but it seems like the flood of interrupts was the problem.
So one way out is to disable the handler of AER altogether: Use the pci=noaer kernel parameter on boot, or disable the CONFIG_PCIEAER kernel configuration flag, and recompile the kernel. This removes the piece of code that configures the computer’s root port to send interrupts if and when an AER message arrives, but that way I won’t be alerted that a problem exists.
So I went for hacking the kernel code. In an early attempt, I didn’t produce an error message for each event, but kept it down to no more than 5 per second. It worked in the sense that the log wasn’t flooded, but it didn’t solve the problem of a slow or impossible boot. As mentioned earlier, the core problem seems to be a bombardment of interrupts.
So the hack that eventually did the job for me tells the root port to stop generating interrupts after 100 kernel messages have been produced. That’s enough to inform me that there’s a problem, and give me an idea of where it is, but it stops soon enough to let the system live.
The only file I modified was drivers/pci/pcie/aer/aerdrv_errprint.c on a 4.2.0 Linux kernel. In retrospect, I could have done it more elegantly. But hey, now that it works, why should I care…?
It goes like this: I defined a static variable, countdown, and initialized it to 100. Before a message is produced, a piece of code like this runs:
if (!countdown--)
aer_enough_is_enough(dev);
aer_enough_is_enough() is merely a copy of aerdrv.c’s aer_disable_rootport(), which is defined as static there, and requires an uncomfortable argument. It would have made more sense to make aer_disable_rootport() a wrapper of another function, which could have been used both by aerdrv.c and my little hack. That would have been much more elegant.
Instead, I copied two additional static functions that are required by aer_disable_rootport() into aerdrv_errprint.c, and ended up with an ugly hack that solves the problem.
With all due shame, here are the changes in patch format. It’s not intended to apply on your kernel as is. It’s more intended to be a guideline to how to get it done. And by all means, take a look at aerdrv.c’s relevant functions, and see if they’re different, by any chance.
From b007850486167288ea4c6c6a1bf30ddd1a299f24 Mon Sep 17 00:00:00 2001
From: Eli Billauer <my-mail@gmail.com>
Date: Sat, 17 Oct 2015 07:37:19 +0300
Subject: [PATCH] PCIe AER handler: Turn off interrupts from root port after 100 messages
---
drivers/pci/pcie/aer/aerdrv_errprint.c | 78 ++++++++++++++++++++++++++++++++
1 files changed, 78 insertions(+), 0 deletions(-)
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 167fe41..31a8572 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -20,6 +20,7 @@
#include <linux/pm.h>
#include <linux/suspend.h>
#include <linux/cper.h>
+#include <linux/pcieport_if.h>
#include "aerdrv.h"
#include <ras/ras_event.h>
@@ -129,6 +130,74 @@ static const char *aer_agent_string[] = {
"Transmitter ID"
};
+/* Two functions copied from aerdrv.c, to prevent name space pollution */
+
+static int set_device_error_reporting(struct pci_dev *dev, void *data)
+{
+ bool enable = *((bool *)data);
+ int type = pci_pcie_type(dev);
+
+ if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
+ (type == PCI_EXP_TYPE_UPSTREAM) ||
+ (type == PCI_EXP_TYPE_DOWNSTREAM)) {
+ if (enable)
+ pci_enable_pcie_error_reporting(dev);
+ else
+ pci_disable_pcie_error_reporting(dev);
+ }
+
+ if (enable)
+ pcie_set_ecrc_checking(dev);
+
+ return 0;
+}
+
+/**
+ * set_downstream_devices_error_reporting - enable/disable the error reporting bits on the root port and its downstream ports.
+ * @dev: pointer to root port's pci_dev data structure
+ * @enable: true = enable error reporting, false = disable error reporting.
+ */
+static void set_downstream_devices_error_reporting(struct pci_dev *dev,
+ bool enable)
+{
+ set_device_error_reporting(dev, &enable);
+
+ if (!dev->subordinate)
+ return;
+ pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
+}
+
+/* Allow 100 messages, and then stop it. Since the print functions are called
+ from a work queue, it's safe to call anything, aer_disable_rootport()
+ included. */
+
+static int countdown = 100;
+
+/* aer_enough_is_enough() is a copy of aer_disable_rootport(), only the
+ latter requires to get the aer_rpc structure from the pci_dev structure,
+ and then uses it to get the pci_dev structure. So enough with that too.
+*/
+
+static void aer_enough_is_enough(struct pci_dev *pdev)
+{
+ u32 reg32;
+ int pos;
+
+ dev_err(&pdev->dev, "Exceeded limit of AER errors to report. Turning off Root Port interrupts.\n");
+
+ set_downstream_devices_error_reporting(pdev, false);
+
+ pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
+ /* Disable Root's interrupt in response to error messages */
+ pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, &reg32);
+ reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
+ pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, reg32);
+
+ /* Clear Root's error status reg */
+ pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, &reg32);
+ pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, reg32);
+}
+
static void __print_tlp_header(struct pci_dev *dev,
struct aer_header_log_regs *t)
{
@@ -168,6 +237,9 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
int layer, agent;
int id = ((dev->bus->number << 8) | dev->devfn);
+ if (!countdown--)
+ aer_enough_is_enough(dev);
+
if (!info->status) {
dev_err(&dev->dev, "PCIe Bus Error: severity=%s, type=Unaccessible, id=%04x(Unregistered Agent ID)\n",
aer_error_severity_string[info->severity], id);
@@ -200,6 +272,9 @@ out:
void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
{
+ if (!countdown--)
+ aer_enough_is_enough(dev);
+
dev_info(&dev->dev, "AER: %s%s error received: id=%04x\n",
info->multi_error_valid ? "Multiple " : "",
aer_error_severity_string[info->severity], info->id);
@@ -226,6 +301,9 @@ void cper_print_aer(struct pci_dev *dev, int cper_severity,
u32 status, mask;
const char **status_strs;
+ if (!countdown--)
+ aer_enough_is_enough(dev);
+
aer_severity = cper_severity_to_aer(cper_severity);
if (aer_severity == AER_CORRECTABLE) {
--
1.7.2.3
And again — it’s given as a patch, but really, it’s not intended for application as is. If you need to do this yourself, read through the patch, understand what it does, and make the changes with respect to your own kernel. Or your system may just hang.
A few jots on playing with the system logger (the one that writes to /var/log/messages) on an ancient CentOS 5.5.
First, check the version: It says
Oct 6 15:12:06 diskless syslogd 1.4.1: restart.
So it’s a quite old revision of syslogd, unfortunately. There are no filter conditions to rely on.
The relevant configuration file is /etc/syslog.conf. First, one may divert the kernel’s log messages from /var/log/messages to /var/log/kernel-junk by changing
*.info;mail.none;authpriv.none;cron.none /var/log/messages
to
*.info;mail.none;authpriv.none;cron.none;kern.none /var/log/messages
kern.* /var/log/kernel-junk
Or, alternatively, keep kernel warnings and up in /var/log/messages, so that only the less severe kernel messages are diverted (with lazy flushing of kernel-junk):
*.info;mail.none;authpriv.none;cron.none;kern.none;kern.warn /var/log/messages
kern.* -/var/log/kernel-junk
The trick is that kern.none disables all kernel messages to /var/log/messages. The following kern.warn turns warnings and up back on. kernel-junk gets everything.
General notes
For plain byte-per-byte hex dump,
$ hexdump -C
To dump a limited number of bytes, use the -n flag:
$ hexdump -C -n 64 /dev/urandom
00000000 9c 72 b0 43 da 6e 27 2f f9 f1 34 06 60 d5 71 ad |.r.C.n'/..4.`.q.|
00000010 cc 07 89 02 f7 f9 5f 85 f6 ba a5 24 cc 9f 2d d5 |......_....$..-.|
00000020 6d da 5b 91 a6 23 d4 94 51 1d 96 a7 5c 34 1a 48 |m.[..#..Q...\4.H|
00000030 6e 13 d4 3a 54 5d c5 c4 7b 1e f3 7b 6f 84 af 8b |n..:T]..{..{o...|
00000040
And possibly add the -v flag so that repeated lines are printed out explicitly
$ hexdump -C -n 64 /dev/zero
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000040
$ hexdump -C -v -n 64 /dev/zero
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040
Hexdump scripting
Hexdump has a somewhat weird one-liner scripting syntax. It consists of the -e flag(s) followed by a string, which must be enclosed in single quotes. Within this string, there may be several double-quoted parts containing formatting info. Probably, the only way to really figure this out is trying some examples.
- Everything in the expression runs as a loop.
- n/m (n and m are integers) means: apply the format that follows immediately n times, consuming m bytes each time.
- If there is more than one -e, each of them consumes the same data
- %08_ax is the data offset in hex. Also try “%10_ad: ” for decimal position.
- Anything not interpreted is printed (a bit like printf). That includes, of course, “\n”.
- For editing hex data, ghex can be handy
Scripting examples
Print out the input as 32-bit hex integers, one per line:
$ hexdump -v -e '1/4 "%08x " "\n"'
Same, but as 32-bit decimal numbers:
$ hexdump -v -e '1/4 "%08d " "\n"'
Dump mouse raw motion data, three bytes per line, each as a hex number:
$ hexdump -v -e '3/1 "%02x " "\n"' /dev/input/mice
Like “hexdump -C”, only explicitly:
$ hexdump -e '"%08_ax " 16/1 "%02x "' -e '" |" 16/1 "%_p" "|\n"'
The manpage offers a lot more detail on this.
General
These are a few random notes to self regarding kernel compilation.
The preferred vanilla kernel repo to use is Linux Stable:
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/
It’s often a good idea to pick a kernel version that was released a while ago, but with a high sub-subversion number. So it has been tested properly and it also has several bug fixes that were discovered down the road.
See another post of mine for avoiding “+” being added to the kernel’s version number.
Targeting i386
I compiled a kernel on a x86_64 machine, targeting an i386. Kind-of cross-compilation, but with no need for a cross compiler.
Remember to update the (extra) version number in the Makefile, but only with [-+.a-z0-9]+ characters, or else there will be trouble with creating .deb packages (see below). Don’t forget adding a dash (“-”) at the beginning of EXTRAVERSION, or else it will be glued to the SUBLEVEL.
Also remember that there’s always
$ make help
and it’s very useful.
After copying a known config file to .config:
$ make ARCH=i386 oldconfig
$ time make ARCH=i386 -j 12 bzImage modules && echo Success
A lazier version is to use the “olddefconfig” and then the “bindeb-pkg” make targets instead of those above (see below). The bonus is that everything gets neatly packaged, and most of what follows becomes unnecessary.
And as root (hey, the ARCH parameter wasn’t required!):
# make modules_install INSTALL_MOD_PATH=/path/to/
(this installs into /path/to/lib/modules/{version number}, so don’t write the “/lib/modules” part)
Remember to update the symbolic links to the source directory if necessary.
Note that it’s possible to set the kernel version directly from the make command, overriding the one given in the Makefile. For example, to match the currently running version:
$ make KERNELVERSION=`uname -r` ARCH=i386 -j 8 bzImage modules && echo Success
Be sure to check in include/generated/utsrelease.h, possibly while the kernel is compiling, that you got it right. In particular, there may be a “+” sign added.
A depmod was required on the running machine as follows (after booting with the kernel, without modules loaded), even though a depmod ran on modules_install:
# depmod -a
When hacking on the kernel sources, it can be useful to go something like
$ make ARCH=i386 SUBDIRS=drivers/pci/pcie/
in order to compile just a certain subdirectory (like “I didn’t do anything stupid, did I?”).
But no: SUBDIRS is deprecated. Use the “M=” alternative for modules, even though SUBDIRS catches the built-in objects as well (in case they were played with too).
And it’s also possible to add the known targets, such as
$ make ARCH=i386 SUBDIRS=drivers/pci/pcie/ clean
for cleaning up before compiling etc.
Installing on a Debian-based distribution
That includes Ubuntu, Linux Mint and all distributions where “apt” and “dpkg” are used to manage packages.
Instead of fiddling with all the files, just create .deb packages and install them. It’s so easy, and the files get the right names and locations without any hassle.
The command is simply
$ time make bindeb-pkg && echo Success
after a successful kernel compilation. I’ve tried to do this along with the compilation (of v6.8.12), either by adding bindeb-pkg to the targets or by requesting this target only. In both cases, the build failed after a few minutes, and I had little motivation to figure out why. The only downside is that the build number of the resulting kernel becomes #2 instead of #1, but that’s really petty.
Note that if the kernel is assigned an EXTRAVERSION in the Makefile, it must not contain any uppercase characters nor underscores. In fact, it has to match the regular expression [-+.a-z0-9]* or else an illegal Debian package name will be created, and the finale of the build will fail.
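As the build takes a while, it can be worth checking the EXTRAVERSION string before starting. This is a minimal sketch of such a check, based only on the regular expression mentioned above. The function name extraversion_ok is made up.

```bash
# Hypothetical helper: succeed only if the given EXTRAVERSION string
# consists solely of the characters allowed by the regular expression above.
# $1 = the EXTRAVERSION candidate
extraversion_ok() {
    # LC_ALL=C makes sure that a-z doesn't match uppercase letters
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[-+.a-z0-9]*$'
}
```

So "extraversion_ok -myserver" succeeds, but "extraversion_ok -My_Server" fails, because of both the uppercase letters and the underscore.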
It takes about 12-20 minutes, and it appears to be stuck on the way, but in the end the following files are created in the Linux kernel tree’s parent directory. For a 6.8.12-myserver kernel targeting amd64, these are the files created:
- linux-image-6.8.12-myserver_6.8.12-myserver-2_amd64.deb: The related files in /boot + modules in /lib/modules/
- linux-headers-6.8.12-myserver_6.8.12-myserver-2_amd64.deb: The headers for compiling modules
- linux-image-6.8.12-myserver-dbg_6.8.12-myserver-2_amd64.deb: Files apparently for debugging, a lot of them in /usr/lib/debug/lib/modules/
- linux-libc-dev_6.8.12-myserver-2_amd64.deb: Header files for compiling user-space interface with the kernel, under /usr/include/
Installing these first two packages with “dpkg -i” does what I consider having the kernel installed on the machine: The kernel image in /boot, the kernel modules and the headers. It’s really that simple.
Creating headers for module compilation (non-Debian machine)
This is probably a somewhat off-beat way to create the files for /usr/src/ so that kernel modules can be compiled against the running kernel. The idea is to create a .deb file for the binary of the kernel, which necessarily includes the headers, and then fetch the desired parts from that. Maybe there’s a more straightforward way, but I don’t do this often enough to look for it.
Ah, and “make headers_install” is not the answer. It installs the headers used by user-space programs, not those for compiling modules. Neither is “make modules_prepare”, which allows compiling the C sources against the kernel tree, but then the MODPOST stage in the compilation fails because Module.symvers is missing.
So start with creating the Debian package files with bindeb-pkg, as mentioned above.
To extract the .deb file that contains the compilation headers:
$ ar x linux-headers-5.15.0_5.15.0-1_amd64.deb
Yes, that’s “ar”, not “tar”. That produces three files, among others the data.tar.xz file. First, verify what it contains:
$ tar -tJf data.tar.xz | less
and once you’ve convinced yourself that it’s OK to untar this in the target’s root directory, become root, and go
# tar -C / -xJvf data.tar.xz
The -C flag makes tar change directory to the root directory before extracting. Also note that this will update the “build” symlink in /lib/modules as well.
For a simple installation, do this for the linux-headers and linux-image packages.
Creating an initramfs file (when necessary)
For a non-running kernel, something like (needs to run as root, or it fails)
# update-initramfs -v -c -k 4.14.0-test -b .
update-initramfs: Generating ./initrd.img-4.14.0-test
(-v for verbose, not really necessary)
In theory, this should have worked as well, but it doesn’t enable the prompt for encrypted root filesystem, which is why I need the initramfs to begin with.
$ mkinitramfs -o initrd.img-4.14.0-test 4.14.0-test
Anyway, update-initramfs got me a 285 MB file, which doesn’t fit into the boot/ directory. The one that came with the distro was 28 MB.
On the other hand, if I do the same thing with the currently running kernel, the output gets small and neat. So maybe it’s because the new kernel is much newer, or maybe it’s because initramfs always copies a lot of modules, and not just those in use, when it’s not made for the running kernel.
It didn’t help bluffing mkinitramfs by renaming the directories in /lib/modules/ and running mkinitramfs as if it was on the current kernel. Exactly the same file size resulted.
So I opened the initramfs image manually (copying from myself), from within an empty directory with
$ zcat ../initrd.img-4.14.0-test | cpio -i -d -H newc --no-absolute-filenames
and looked for the large files. The directories lib/modules/4.14.0-test/kernel/drivers/{gpu,net,scsi} took ~620 MB together. So removing these three, navigating to the root of the initram filesystem and compressing it back again:
$ find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../smaller-initramfs.img
which shrank the image to 94 MB, which is small enough. The missing modules will load once the real root filesystem is mounted, so modules that aren’t necessary for boot can be deleted this way.
Trying to obtain a smaller initramfs image with
# update-initramfs -u -b .
when the new kernel is running brought me back a 285 MB image. It’s probably a matter of the new kernel’s size. It might be necessary to write a script that removes any module not loaded when the kernel is up from initramfs’ /lib/modules. But it’s not worth the effort at the moment.
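As a starting point for such a script, here is a sketch that only lists the candidates for removal, without deleting anything. The function name is made up, and it assumes that the module files are uncompressed (with a plain .ko suffix) and that a list of loaded modules has been prepared in a file, one name per line, for example with "lsmod | awk 'NR > 1 { print $1 }'".

```bash
# Hypothetical helper: list the module files under a directory tree which
# don't appear in a list of loaded modules.
# $1 = root of the unpacked initramfs, $2 = file with one module name per line
list_unloaded_modules() {
    find "$1" -name '*.ko' | while read -r path; do
        # Loaded modules are listed with underscores, even when the file
        # name has dashes
        name=$(basename "$path" .ko | tr '-' '_')
        grep -qxF "$name" "$2" || echo "$path"
    done
}
```

Whether all modules on the resulting list are really unnecessary for boot is something to verify by hand before deleting them.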
Compiler versions
Since the target computer’s compiler version is really old, I got this when trying to compile a module on it:
$ make
make -C /lib/modules/5.15.0/build M=/home/eli/kernelmodule modules
make[1]: Entering directory '/usr/src/linux-headers-5.15.0'
warning: the compiler differs from the one used to build the kernel
The kernel was built by: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
You are using: gcc (Debian 4.9.2-10+deb8u2) 4.9.2
CC [M] /home/eli/kernelmodule/themodule.o
gcc: error: unrecognized command line option ‘-mrecord-mcount’
So I said, OK, let’s compile the kernel on the target machine (or a root jail with a similar environment), but I got:
***
*** Compiler is too old.
*** Your GCC version: 4.9.2
*** Minimum GCC version: 5.1.0
***
So supporting old systems with new kernels isn’t all that easy. My solution is to compile the modules on the new machine, but nevertheless use the kernel headers just generated. They are still useful in the long run.
The slightly safer alternative is
$ rm --one-file-system -vrf delme-junk/
There are two additional flags:
- The -v flag causes “rm” to display the files as it deletes them. This gives the user a chance to stop the process if something completely wrong happens. Not as good as thinking before making the mistake, but much better than understanding it in hindsight.
- The --one-file-system flag prevents deleting files from possibly mounted sub-filesystems. For example, deleting a directory tree that contains bind mounts (a chroot jail, for example), having forgotten to unmount these beforehand.
This is no substitute for thinking before typing, of course. ;)
Also, renaming the directory to a name that clearly means it should be deleted is helpful too. In particular as the operation is stored in the command history, with the potential of being re-run accidentally. Even though one may exempt the command from being stored with a space as the first character in the command line, if bash is configured accordingly.
“Aw, snap” in Google Chrome happens when a process (or thread?) involved with Chrome dies unexpectedly. I got quite a few of those, and learned to live with them for about a year, as I couldn’t figure out what caused them. It was clear that it had to do with Adobe Flash somehow, and that it happened in certain sites, in certain situations. For example, Facebook’s messenger page always crashed. For those pages, I diverted to using the slower Firefox.
Adobe Flash continued to work fine in many other sites however.
This problem started after upgrading the kernel on Fedora Core 12 to a v3.12.20 I’ve compiled myself. Google Chrome is a 27.0.1453.93. All revisions can be upgraded in theory, but given all kinds of dependencies, the only way was to upgrade Linux completely. And I wasn’t ready to mess up a stable computer that does a lot of other things just to get rid of an annoying issue with Chrome.
For some reason, I couldn’t get a crash report from Chrome. I managed to enable reporting, but no report was ever generated.
The clue was there all along. These log entries kept appearing in /var/log/messages every time I launched Chrome:
Aug 7 13:01:15 kernel: audit_printk_skb: 16 callbacks suppressed
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.119:56738): ses=1 pid=15006 comm="chrome" sig=0 syscall=20 compat=1 ip=0xf2df8430 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.297:56739): ses=1 pid=15030 comm="chrome" sig=0 syscall=5 compat=1 ip=0xf2d80430 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.297:56740): ses=1 pid=15030 comm="chrome" sig=0 syscall=33 compat=1 ip=0xf2d80430 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.297:56741): ses=1 pid=15030 comm="chrome" sig=0 syscall=5 compat=1 ip=0xf2d98044 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.297:56742): ses=1 pid=15030 comm="chrome" sig=0 syscall=85 compat=1 ip=0xf2d80430 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.298:56743): ses=1 pid=15030 comm="chrome" sig=0 syscall=195 compat=1 ip=0xf2d80430 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.298:56744): ses=1 pid=15030 comm="chrome" sig=0 syscall=195 compat=1 ip=0xf2d80430 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.298:56745): ses=1 pid=15030 comm="chrome" sig=0 syscall=195 compat=1 ip=0xf2d80430 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.298:56746): ses=1 pid=15030 comm="chrome" sig=0 syscall=195 compat=1 ip=0xf2d80430 code=0x50000
Aug 7 13:01:15 kernel: type=1326 audit(1438941675.298:56747): ses=1 pid=15030 comm="chrome" sig=0 syscall=195 compat=1 ip=0xf2d80430 code=0x50000
Googling a bit on “audit chrome type=1326”, I found this page making the connection with seccomp, and this page suggesting the solution in the comments.
Now, that made sense. Seccomp is a mechanism in Linux to cut off a process irreversibly from the outer world. In its original, strict mode the process can only read() and write() to already open file descriptors (supposedly going to pipes), use sigreturn(), or terminate gracefully with exit(). In the filter mode, which is what the command-line flag below refers to, the process installs a filter that decides which system calls it may still make. It’s a neat security mechanism for not-so-trusted code that only needs to compute stuff. Codecs, for example. And Google Chrome uses this mechanism with Flash.
Maybe this explains why no crash report was generated: The process that crashed was jailed, so it couldn’t open the crash report file.
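As a rough cross-check of this guess, the audit lines above can be tallied by system call number. The following is only a sketch of mine, not part of any tool: the names I386_SYSCALLS and tally_seccomp_events are made up, the table covers just the five numbers seen in the log, and I'm assuming that compat=1 means the 32-bit x86 (i386) system call numbering applies.

```python
import re
from collections import Counter

# Partial i386 syscall table: only the numbers that appear in the log above.
# Assumption: compat=1 means the 32-bit x86 numbering is the relevant one.
I386_SYSCALLS = {5: "open", 20: "getpid", 33: "access", 85: "readlink", 195: "stat64"}

# Picks the comm= and syscall= fields out of a type=1326 (seccomp) audit record.
SECCOMP_RE = re.compile(r'type=1326 .*comm="([^"]+)".* syscall=(\d+)')

def tally_seccomp_events(lines):
    """Count seccomp audit records per (comm, syscall name)."""
    counts = Counter()
    for line in lines:
        m = SECCOMP_RE.search(line)
        if m is None:
            continue  # e.g. the "callbacks suppressed" line
        comm, nr = m.group(1), int(m.group(2))
        counts[(comm, I386_SYSCALLS.get(nr, "syscall %d" % nr))] += 1
    return counts

# A few of the lines from /var/log/messages quoted above.
SAMPLE = [
    'Aug 7 13:01:15 kernel: audit_printk_skb: 16 callbacks suppressed',
    'Aug 7 13:01:15 kernel: type=1326 audit(1438941675.119:56738): ses=1 pid=15006 comm="chrome" sig=0 syscall=20 compat=1 ip=0xf2df8430 code=0x50000',
    'Aug 7 13:01:15 kernel: type=1326 audit(1438941675.297:56739): ses=1 pid=15030 comm="chrome" sig=0 syscall=5 compat=1 ip=0xf2d80430 code=0x50000',
    'Aug 7 13:01:15 kernel: type=1326 audit(1438941675.298:56743): ses=1 pid=15030 comm="chrome" sig=0 syscall=195 compat=1 ip=0xf2d80430 code=0x50000',
    'Aug 7 13:01:15 kernel: type=1326 audit(1438941675.298:56744): ses=1 pid=15030 comm="chrome" sig=0 syscall=195 compat=1 ip=0xf2d80430 code=0x50000',
]

if __name__ == "__main__":
    for (comm, name), n in sorted(tally_seccomp_events(SAMPLE).items()):
        print(comm, name, n)
```

If open() is indeed among the calls the filter acted upon, as the syscall=5 records seem to say, that fits the guess about the missing crash report.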
To fix this, I invoke Google Chrome with
$ google-chrome --disable-seccomp-filter-sandbox
And no more “Aw snaps”.
As the title implies, this solved my problem on one very specific machine with a very specific setup. There are millions of other possible reasons for this error.
I wanted to create a boot USB stick from an Ubuntu 15.04 desktop that was running in a virtual machine. So I plugged in a clean USB stick, and picked “Startup Disk Creator” from the Launcher. Then I picked the USB stick as the target, and waited for 15 minutes. When the utility was done, it just disappeared, so when I came back to the computer later, there was no positive indication that all was well. But the old UNIX tradition is that all is well if there are no complaints.
Plugging the USB stick into the target computer I got
SYSLINUX 6.03 EDD 20150318 Copyright (C) 1994-2014 H. Peter Anvin et al
WARNING: No configuration file found
boot:
This seems to happen to quite a few people, and the advice differs. Here’s how I solved it.
I followed two pieces of advice from this page:
- I set up a 2 GB FAT16 partition on the USB stick, instead of FAT32. The command for formatting was:
$ sudo mkdosfs -R 1 -F 16 /dev/sda1
Note that /dev/sda1 happened to point at the USB stick on my machine, but on yours it could be the “real” hard disk. So don’t just copy this!
Frankly, I don’t think this was the issue (doing this alone didn’t help).
- I copied the isolinux directory on the USB stick to syslinux (maybe renaming would be enough) and then renamed isolinux.bin to syslinux.bin and isolinux.cfg to syslinux.cfg. And that was it.
And then Ubuntu booted properly.
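For the record, the copy-and-rename step can be sketched in Python as follows. This is just an illustration of what I did by hand: add_syslinux_copy and the mount point in the usage note are made-up names, and only file contents are copied (no permissions or timestamps), which seemed the safe choice for a FAT partition.

```python
import shutil
from pathlib import Path

def copy_tree_plain(src, dst):
    """Recursively copy file contents only, without any metadata."""
    dst.mkdir()  # raises FileExistsError if the target is already there
    for entry in src.iterdir():
        if entry.is_dir():
            copy_tree_plain(entry, dst / entry.name)
        else:
            shutil.copyfile(entry, dst / entry.name)

def add_syslinux_copy(stick_root):
    """Copy isolinux/ to syslinux/ and rename isolinux.bin/.cfg in the copy."""
    root = Path(stick_root)
    dst = root / "syslinux"
    copy_tree_plain(root / "isolinux", dst)
    for ext in ("bin", "cfg"):
        (dst / ("isolinux." + ext)).rename(dst / ("syslinux." + ext))
    return dst
```

With the stick mounted at, say, /media/usbstick (a hypothetical path), this would be called as add_syslinux_copy("/media/usbstick"). The original isolinux directory is left untouched.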
Just some jots as I tried to fix a Windows 8.1 laptop that didn’t boot (not mine, of course, I can’t stand Windows 8). It went “Preparing automatic repair” immediately on powerup, followed by a light blue (not BSOD-blue) screen saying “Automatic Repair” and “Your PC did not start correctly”, offering me to Restart or to go to “Advanced Options”. This is where the saga begins.
Spoiler: Eventually, an accidental attempt brought things back to normal.
What installation is this?
I had some trouble telling if the installation was 32 or 64 bit. “systeminfo” didn’t work on command prompt, so I had to guess based upon the existing files. Be sure to look in C: and not in X: (are they the same?).
The ways I could tell it’s a 64-bit installation:
- The presence of C:\Program Files (x86) as well as C:\Program Files (only the latter is present in a 32-bit installation).
- The presence of C:\Windows\SysWOW64, which is intended for running 32-bit programs under a 64-bit OS.
Note that C:\Windows\System32 is present in both 32- and 64-bit versions.
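The same two checks can be run from outside Windows, for example from a Linux live stick with the Windows partition mounted. A minimal sketch, assuming the partition's root is passed as a path; the function name looks_like_64bit_windows is made up, and the directory names are matched exactly as spelled here, which may matter on a case-sensitive mount.

```python
from pathlib import Path

def looks_like_64bit_windows(system_root):
    """Guess 64-bit Windows from the two telltale directories."""
    root = Path(system_root)
    has_x86_dir = (root / "Program Files (x86)").is_dir()
    has_syswow64 = (root / "Windows" / "SysWOW64").is_dir()
    # Either directory alone is taken as a hint for a 64-bit installation.
    return has_x86_dir or has_syswow64
```

A False result only means that neither directory was found under the given root, so it's worth verifying that the right partition was mounted before drawing conclusions.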
Random ideas
Some things one might want to try out:
- Run Linux on a LiveUSB stick and run Applications > System Tools > Disk Utility (or similar path) to get some S.M.A.R.T. info from the hard disk.
- Run chkdsk /R on Windows’ command prompt to hopefully fix the disk issues.
- Create installation media from Microsoft’s site (Google for it), as the self-fixing tools ask for it. The exact version of Windows is required for that, so “systeminfo” on command prompt should be helpful (it says X86-based PC for 32-bit Windows and X64-based PC for 64-bit). In principle, looking at the computer’s properties is easier, but in recovery mode only command prompt is available.
- Choose “Refresh your PC” under “Troubleshoot” in the set of menus that appear when the computer fails to boot properly. Then plug in the installation media when requested (don’t do it beforehand, as it won’t count). Not that it helped: I got “The media inserted is not valid. Try again.” So much for a descriptive error message, after preparing that silly USB stick for an hour or so.
So I went for booting the computer from USB, and got the “The drive where Windows is installed is locked. Unlock the drive and try again” error when trying to repair the OS. Following this page, I ran diskpart at command prompt and typed “list volume”, which indeed printed out the disk partitions. The point was to make sure that none of them appears as “RAW”, which would mean that nothing should be touched before the disk has been restored to a sane condition.
I also tried
bootrec /fixmbr
bootrec /fixboot
bootrec /rebuildbcd
but in vain. There was no difference.
sfc
It’s supposed to be the savior, isn’t it?
Looking for corrupted system files (System File Checker): “sfc /scannow” completed (verification 100% complete) but said that “Windows Resource Protection could not perform the requested operation”. I found the log file at X:\WINDOWS\LOGS\CBS\CBS.LOG (note the X:, it was put in the boot volume, not C:) and opened it with notepad (don’t forget to select “All Files” and not just *.txt).
The reason it did nothing was that I didn’t run it as an Administrator. So going again, as an Administrator, I got “There is a system repair pending which requires reboot to complete. Restart Windows and run sfc again.” So I rebooted, and got the same message again. Very helpful. This page suggests looking at what is pending in c:\windows\winsxs\pending.xml, and indeed such a file existed. A long XML file, full of info about things that were about to happen.
Following this page I went for
dism.exe /image:C:\ /cleanup-image /revertpendingactions
which is an extremely annoying utility in that it claims not to recognize the /cleanup-image and /revertpendingactions options unless the line is typed exactly as above. Did I say something about helpful error messages?
So eventually it finished, and claimed to have done that successfully, but sfc still said there was a pending system repair. Rebooting didn’t help. The pending.xml file was still there.
Trying dism again, it reported an error reverting the image, and sent me to a log file. Which, as one would expect, contained tons of rubbish and not much to go on. The reported error was 0x800f082f, which seems to be an undocumented error code. This post supplied a hack for working around it, but it wasn’t required in my case: the problem was the pending.xml file.
As it previously complained about not having “sufficient scratch space” I also supplied the /scratchdir:c:\delme option the following time (with c:\delme being just an empty directory).
At this point I decided to rename pending.xml to was-pending.xml using notepad, which wasn’t all that simple: remember to use the C: path and not the X: default, remember to view all files and not just .txt, and note that changes are not updated in the GUI until I leave the directory and view it again. Things are weird when running in rescue mode. It would make more sense to use some File Manager, but “explorer” on command prompt wasn’t recognized.
sfc claimed not to have completed the operation successfully (with a status_not_implemented error in the log). But that didn’t make any difference either.
Boot logging
This can be triggered with F8 in theory, but probably only when things are relatively OK. Instead, at the Automatic Repair opening screen, go Advanced Options > Troubleshoot > Advanced Options > Startup Settings > Restart. So Windows reboots, but this time with a menu. Pick option 2, Enable boot logging. The boot fails again, of course, with the same Automatic Repair screen.
So go Advanced Options > Troubleshoot > Advanced Options > Command Prompt, enter as admin, and type “notepad”. Well, the file was supposed to be there as C:\windows\ntbtlog.txt, but no such file was there. Microsoft has a rather useless possible explanation.
The breakthrough
Based upon the same menu for enabling Boot logging, I picked (8) “Disable early-launch anti-malware protection”. And after a little while, the computer was suddenly up and running!
With Windows up and running, I was offered to run System Restore to a known, recent configuration, and agreed.
The computer started munching and crunching, after which it restarted, and brought me back to the Automatic Repair screen. But now it mentioned a log file on this initial screen: C:\Windows\System32\Logfiles\Srt\SrtTrail.txt. And there it said: Boot critical file C:\Windows\system32\drivers\mfeelamk.sys is corrupt. And also that C:\tbs.sys is corrupt.
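Pulling the offending file names out of such a log is a one-liner with a regular expression. A small sketch, assuming that each complaint has the “Boot critical file ... is corrupt” wording quoted above; the function name corrupt_boot_files is made up, the second sample line is my guess at the wording for tbs.sys, and reading the actual file (whose encoding I didn't check) is left out.

```python
import re

# Assumption: every complaint follows the wording quoted from SrtTrail.txt.
CORRUPT_RE = re.compile(r"Boot critical file (.+?) is corrupt", re.IGNORECASE)

def corrupt_boot_files(log_text):
    """Return the file paths reported as corrupt, in order of appearance."""
    return CORRUPT_RE.findall(log_text)

# Sample text. The first line is as quoted above, the second is hypothetical.
SAMPLE = "\n".join([
    r"Boot critical file C:\Windows\system32\drivers\mfeelamk.sys is corrupt.",
    r"Boot critical file C:\tbs.sys is corrupt.",
])

if __name__ == "__main__":
    for path in corrupt_boot_files(SAMPLE):
        print(path)
```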
So turning off that malware option again, a successful boot was completed again, and again with a message that System Restore had failed.
Indeed there was no C:\tbs.sys file at all, but it was found in C:\Windows\system32\drivers\ with zero length. The mfeelamk.sys file turns out to be McAfee’s anti-malware driver, and let’s believe that it was problematic indeed. That explains why turning off the early-launch anti-malware check solved the issue.
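In hindsight, a quick scan for zero-length driver files would have pointed at tbs.sys much earlier. A minimal sketch of such a scan, with a made-up function name, which can be run on the drivers directory from any system that has Python and access to the disk:

```python
from pathlib import Path

def zero_length_files(directory, pattern="*.sys"):
    """List the names of empty files matching the pattern in one directory."""
    # Note: on a case-sensitive file system, "*.sys" won't match "FOO.SYS".
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )
```

An empty file isn't necessarily a corrupt one, so the output is only a list of suspects to look at.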
So I renamed the tbs.sys so that Windows won’t find it (requires changing ownership and then permissions first) and ran sfc /scannow (from a running system this time, which is much slower). It ended up saying that it found some corrupt files, but was unable to fix some of them.
And then the computer booted up as usual. As simple as that.
So in hindsight, the problem was one or two driver files that failed to load. Instead of saying that, Windows went “I’m sorry, you’re too stupid to be exposed to that information” and left me guessing. And most people don’t complain. They just reinstall everything. Or even better, buy a new computer.
Lessons learned
- Always run chkdsk /r before trying to mess with the computer
- sfc /scannow in rescue mode is worthless
- Don’t try to be clever when fixing a Windows computer. The breakthrough step can’t be figured out logically on a senseless system. Just try things at random.
- Prepare a recovery USB stick, while all is fairly OK