fsck errors after shrinking an unmounted ext4 with resize2fs
Motivation
I use resize2fs a lot when backing up to a USB stick. The procedure is to create an image of an encrypted ext4 file system and write it raw to the USB flash device. To save time writing to the USB stick, the image is shrunk to its minimal size with resize2fs -M.
Uh-oh
This had been working great for years with my oldie resize2fs 1.41.9, but after upgrading my computer (Linux Mint 19) and starting to use 1.44.1, things began to go wrong:
# e2fsck -f /dev/mapper/temporary_18395
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/temporary_18395: 1078201/7815168 files (0.1% non-contiguous), 27434779/31249871 blocks
# resize2fs -M -p /dev/mapper/temporary_18395
resize2fs 1.44.1 (24-Mar-2018)
Resizing the filesystem on /dev/mapper/temporary_18395 to 27999634 (4k) blocks.
Begin pass 2 (max = 1280208)
Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 3 (max = 954)
Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 4 (max = 89142)
Updating inode references XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The filesystem on /dev/mapper/temporary_18395 is now 27999634 (4k) blocks long.
# e2fsck -f /dev/mapper/temporary_18395
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Inode 85354 extent block passes checks, but checksum does not match extent (logical block 237568, physical block 11929600, len 24454)
Fix<y>? yes
Inode 85942 extent block passes checks, but checksum does not match extent (logical block 129024, physical block 12890112, len 7954)
Fix<y>? yes
Inode 117693 extent block passes checks, but checksum does not match extent (logical block 53248, physical block 391168, len 8310)
Fix<y>? yes
Inode 122577 extent block passes checks, but checksum does not match extent (logical block 61440, physical block 399478, len 607)
Fix<y>? yes
Inode 129597 extent block passes checks, but checksum does not match extent (logical block 409600, physical block 14016512, len 12918)
Fix<y>? yes
Inode 129599 extent block passes checks, but checksum does not match extent (logical block 274432, physical block 13640964, len 1570)
Fix<y>? yes
Inode 129600 extent block passes checks, but checksum does not match extent (logical block 120832, physical block 14653440, len 13287)
Fix<y>? yes
Inode 129606 extent block passes checks, but checksum does not match extent (logical block 133120, physical block 14870528, len 16556)
Fix<y>? yes
Inode 129613 extent block passes checks, but checksum does not match extent (logical block 75776, physical block 15054848, len 23962)
Fix<y>? yes
Inode 129617 extent block passes checks, but checksum does not match extent (logical block 284672, physical block 15716352, len 7504)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129622 extent block passes checks, but checksum does not match extent (logical block 86016, physical block 15532032, len 18477)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129626 extent block passes checks, but checksum does not match extent (logical block 145408, physical block 16967680, len 5536)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129630 extent block passes checks, but checksum does not match extent (logical block 165888, physical block 17125376, len 29036)
Fix ('a' enables 'yes' to all) <y>? yes
Inode 129677 extent block passes checks, but checksum does not match extent (logical block 126976, physical block 17100800, len 24239)
Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/temporary_18395: 1078201/7004160 files (0.1% non-contiguous), 27383882/27999634 blocks
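To put the block counts from the output above into perspective, here is a small sketch that converts them to bytes. The numbers are copied from the log; the variable and function names are made up for illustration, and the 4096-byte block size is taken from the "(4k)" that resize2fs reports.

```shell
#!/bin/sh
# Block counts copied from the e2fsck / resize2fs output above.
# block_size assumes the "(4k)" reported by resize2fs means 4096 bytes.
block_size=4096
blocks_before=31249871
blocks_after=27999634

# Hypothetical helper: convert a block count to bytes.
blocks_to_bytes() {
    echo $(( $1 * block_size ))
}

size_after=$(blocks_to_bytes "$blocks_after")
saved=$(blocks_to_bytes $(( blocks_before - blocks_after )))

echo "Filesystem size after shrinking: $size_after bytes"
echo "Saved by shrinking:              $saved bytes"
```

That is about 12.4 GiB that doesn't need to be written to the USB stick in this particular case.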
Not the end of the world
This bug has been reported and fixed. Judging by the change made, it was only about the checksums, so while the bug caused fsck to detect (and properly fix) errors, there’s no loss of data. I encountered the same problem when shrinking a 5.7 TB partition by 40 GB: fsck reported errors, but I checked every single file, a total of ~3 TB, and all was fine.
I beg to differ on the commit message saying it’s a “relatively rare case”, as it happened to me every single time in two completely different settings, neither of which was special in any way. However, we all use journaled filesystems, so fsck checks have become rare, which can explain how this has gone unnoticed: Unless resize2fs officially failed somehow, it leaves the filesystem marked as clean. Only “e2fsck -f” will reveal the problem.
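Since the filesystem is left marked as clean, a script that shrinks filesystems may want to force a check afterwards and look at e2fsck's exit status. The following is only a sketch of that idea: the function names are hypothetical, and the meaning of the exit status bits is taken from the e2fsck(8) man page (1 = errors corrected, 2 = corrected with reboot advised, 4 = errors left uncorrected, 8 and up = e2fsck itself had a problem).

```shell
#!/bin/sh
# Hypothetical helper: map an e2fsck exit status to a short verdict.
# The exit status is a sum of bits, as listed in e2fsck(8).
fsck_verdict() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean"
    elif [ $(( status & (8 | 16 | 32 | 128) )) -ne 0 ]; then
        # Operational error, usage error, cancelled or library error
        echo "e2fsck-failed"
    elif [ $(( status & 4 )) -ne 0 ]; then
        echo "errors-uncorrected"
    else
        # Only bits 1 and/or 2 are set
        echo "errors-corrected"
    fi
}

# Hypothetical wrapper: forced, read-only check of an unmounted device.
# -f checks even if the filesystem is marked clean,
# -n answers "no" to all questions, so nothing is modified.
check_after_resize() {
    e2fsck -f -n "$1" >/dev/null 2>&1
    fsck_verdict $?
}
```

Calling check_after_resize on the device right after resize2fs -M would then presumably have printed "errors-uncorrected" in the situation shown above, because with -n the checksum mismatches are found but not fixed.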
I would speculate that the reason for this bug is this commit (end of 2014), which speeds up the checksum rewrite after moving an inode. It’s somewhat worrying that a program of this sensitive type isn’t tested properly before being released for everyone’s use.
My own remedy was to compile an updated revision (1.44.4) from the repository, commit ID 75da66777937dc16629e4aea0b436e4cffaa866e. Actually, I first tried to revert to resize2fs 1.41.9, but that one failed shrinking a 128 GB filesystem with only 8 GB left, saying it had run out of space.
Conclusion
It’s almost 2019, the word is that shrinking an ext4 filesystem is dangerous, and guess what, it’s probably a bit true. One could wish it weren’t, but unfortunately the utilities don’t seem to be maintained with the level of care that one could hope for, given the damage they can do.