Making a snapshot of a full Ubuntu / Mint repository on the local disk

This post was written by eli on July 27, 2019
Posted Under: Linux,Software

What’s that good for?

This isn’t about maintaining a local repository that mirrors its original, following along with its changes. The idea is to avoid the upgrades of a lot of packages every time I want to install a new one with apt. Maybe I should mention that I don’t allow automatic upgrades on my machine? Exactly like I don’t leave my car keys to the mechanic, so he can make any fixes he considers would make my car better. Every day.

Before I get into the technical details, I’ll have my say on the culture of upgrading everything all the time. Just in case someone with influence on the matter reads this. Or maybe someone ready to maintain a non-updating mirror…?

The way packaging is made today is that each package requires the latest-latest dependencies just because they happened to exist, not because they’re needed. I mean, forcing an unnecessary upgrade of other packages is fine, because how could upgrading be wrong? Or go wrong?

But upgrading is good…?

Most people believe upgrading software is generally good. Personally, I don’t. Every now and then, an upgrade breaks something that worked, and even seemingly harmless upgrades of minor pieces of software can force me into a session of debugging my system. It may very well be that the upgrade rectified something that was wrong before. But one way or another, my computer worked before, and after the upgrade it didn’t. As has already been said:

If it ain’t broke, don’t fix it.

Upgrades sometimes involve security fixes. Staying with old versions is considered neglecting your security. This may be true when the computer is a server or a multiuser machine, and strangers are allowed to do this and that with it. However, when it comes to single-user desktops that are properly set up (plus a firewall), the risk of an upfront security exploit is rather minimal. I always ask people when they last heard about a personal Linux desktop being compromised by virtue of a vulnerability, and nobody can come up with such a case.

I’ve also discussed this issue with several guys who are responsible for major Linux systems, for which a downtime means real damage, and stability is important. The typical conversation contains an apology for using old distributions and old software, and them reassuring me that they understand that upgrading is important. It’s just that in their specific case they have to stick to a certain, old Linux distribution to keep the system running continuously.

So it’s a risk management question: The risk of having the computer messed up by an upgrade (with probability converging to 1 as time increases) vs. the risk of the desktop being attacked (probability not known, as no such event is known to me), given that I fix significant security issues by other means, as they occur.

So my decision is clear, and here’s how to do it.

apt-mirror

All said below relates to Linux Mint 19, but most likely applies to a wide range of Debian-based distros.

apt-mirror is a cleverly written Perl script that mirrors selected Debian repositories into the local disk. It’s one of those utilities that simply do the job with the practical, real-life details taken care of correctly. In the proper Perl spirit, in short.

In essence:

  • Install apt-mirror with a plain apt command.
  • Change the ownership of the directory to which the packages go (given as base_path in the config file) to apt-mirror.
  • Don’t set up the cron job, as we’re not into having it updated (possibly delete /etc/cron.d/apt-mirror).

Set up mirror.list

The default /etc/apt/mirror.list is generally fine, with nthreads set to 20 by default, which is OK.

You may want to set base_path in /etc/apt/mirror.list to something other than the default.

Then copy all repositories listed in /etc/apt/sources.list.d into mirror.list. This is just copying the lines beginning with “deb” as is.

Well, probably not. If you’re running on a 64-bit machine (is there anyone not?), set /etc/apt/mirror.list to download packages for amd64 and i386. This will grow the disk consumption from 140 GB to 193 GB (YMMV), but sometimes these i386 packages are handy.

For this to work, each line must appear twice. So if the original “deb” line said

deb http://packages.linuxmint.com tara main upstream import backport

these two should appear in mirror.list:

deb-i386 http://packages.linuxmint.com tara main upstream import backport
deb-amd64 http://packages.linuxmint.com tara main upstream import backport

Otherwise apt-mirror downloads only the packages for the current arch. One can also add a deb-src for the mirror repository as well, if desired.
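The doubling of the “deb” lines can be scripted rather than done by hand. This is only a sketch: the function name dup_arch_lines is made up for this example, and it assumes plain “deb” lines without an [options] block after the “deb” keyword.

```shell
# Read sources.list-style lines on stdin. For every binary "deb" line,
# print a deb-i386 and a deb-amd64 line suitable for mirror.list.
# Comments, deb-src lines and everything else are dropped.
# Assumption: no [options] block follows the "deb" keyword.
dup_arch_lines() {
    awk '$1 == "deb" {
        rest = $0
        sub(/^[ \t]*deb[ \t]+/, "", rest)   # strip the leading "deb"
        print "deb-i386 " rest
        print "deb-amd64 " rest
    }'
}
```

Something like `cat /etc/apt/sources.list.d/*.list | dup_arch_lines` then prints the lines to paste into mirror.list. Review the output before using it.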

Running apt-mirror

Run apt-mirror as root with

# su - apt-mirror -c apt-mirror

The cool part with running it this way is that if you go CTRL-C (you want to do that, apt-mirror runs forever on the first attempt, downloading ~200 GiB or so), the child processes (a lot of wgets) are killed gracefully as well.

How it works: First it does some crunching of the repositories’ metadata. After a short while, it spawns 20 processes, each for downloading a list of URLs:

wget --no-cache --limit-rate=100m -t 5 -r -N -l inf -o /opt/apt-mirror/var/archive-log.0 -i /opt/apt-mirror/var/archive-urls.0

All of these run in /opt/apt-mirror/mirror, which is the target for the files. Then apt-mirror just waits. The output shown on the console (like “[20]…”) is the number of processes still running.

Configure apt to use local repositories only

First of all, move the “mirror” subdirectory away to some other place, so it’s out of sight for apt-mirror. No more updates. For example, into /var/local-apt-repo. I also suggest changing its owner to root at this stage:

# chown -R root:root local-apt-repo/

Then edit the repository lists in /etc/apt/sources.list.d so that they point at the local copy. So a line saying

deb http://packages.linuxmint.com tara main upstream import backport

changes into (if the repository is kept in /var/local-apt-repo/)

deb file:///var/local-apt-repo/packages.linuxmint.com tara main upstream import backport

and make apt aware of the change:

# apt clean
# apt update

Verify that only local files are accessed (it prints out the paths) and that there are no errors. Those who opted out of downloading the i386 repositories as well will get a lot of error messages at the end, like

E: Failed to fetch file:/var/local-apt-repo/packages.linuxmint.com/dists/tara/main/binary-i386/Packages  File not found - /var/local-apt-repo/packages.linuxmint.com/dists/tara/main/binary-i386/Packages (2: No such file or directory)

I suppose these errors are harmless, the only consequence being that there are no i386 packages to work with, but I don’t really know, as I went for downloading packages for both archs.
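The switch of the “deb” lines from http:// to file:// URLs, as shown earlier, is a mechanical rewrite, so it can be done with sed. A minimal sketch, assuming the mirror was moved to /var/local-apt-repo as in the example above; the names to_local_lines and LOCAL_REPO are invented for this illustration.

```shell
# Assumption: the mirrored tree lives here (override LOCAL_REPO if not).
LOCAL_REPO=${LOCAL_REPO:-/var/local-apt-repo}

# Rewrite the URL of deb / deb-src lines read from stdin so that they point
# at the local copy. Lines that don't match pass through unchanged.
to_local_lines() {
    sed -E "s|^(deb(-src)?[[:space:]]+)https?://|\1file://${LOCAL_REPO}/|"
}
```

Feed it one of the files in /etc/apt/sources.list.d on stdin, check the result, and only then replace the original file (keeping a backup).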

And then comes the last session of upgrades. At a convenient time for tackling possible upgrade side effects, go

# apt list --upgradable

and then try to upgrade packages in small chunks, so that the changes each make can be tracked (in particular if you have a git repo on /etc, like myself), with

# apt install --only-upgrade package-name

For convenience, package-name may include wildcards with * (use single quotes or escape it with backslash, i.e. \*).
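To pick the chunks, it helps to have the bare package names. The sketch below extracts them from the output of apt list --upgradable. It assumes that each package line has the form “name/suite version arch […]”, which is what apt prints here, but apt makes no promise of a stable output format. The function name upgradable_names is made up.

```shell
# Print the package names found in "apt list --upgradable" output (stdin).
# Assumption: package lines look like "name/suite version arch [...]";
# lines without a slash (such as the "Listing..." header) are skipped.
upgradable_names() {
    awk -F/ 'NF > 1 { print $1 }'
}
```

For example, `apt list --upgradable 2>/dev/null | upgradable_names | head -n 10` shows the first ten candidates, which can then be passed to apt install --only-upgrade by hand.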

After all this is done, fix whatever broke because of all the upgrades. In my case I lost graphics acceleration on my NVidia card, solved by manually reinstalling the drivers as originally downloaded from the vendor. Just to remind me why I’m doing all this.

If apt-file is also installed (it’s a good idea to have it), this is also a good time to go

# apt-file update

How and why the local repo is self-contained

It’s not worth much to take a snapshot that can’t be relied upon forever. The fact that the download process typically takes a few days, most likely with several interruptions in the middle, doesn’t contribute to the feeling of reassurance. I ran my downloads on nights only, for example (hey, I want a decent internet connection during the day).

This is solved in a surprisingly simple manner: The Packages files contain the list of files required for constituting a self-contained repository. apt-mirror first downloads these Packages files, then it downloads the files listed in them, and only then updates the Packages files in the mirror. At all times, all required files are in place.

This is why it’s safe to stop apt-mirror in the middle: Even though the running wget processes will leave some files half-downloaded, they will be fixed on the next run: apt-mirror compares the size of the file on disk with the size declared in the respective Packages file (in its need_update() function). So all files in the Packages files must exist and be of the correct size. This is apt-mirror’s view of a file being in place.
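The same kind of size check can be run independently of apt-mirror. What follows is a sketch in that spirit, not apt-mirror’s actual code. It assumes an uncompressed Packages file and a repository root to which the Filename fields are relative (for example, the packages.linuxmint.com directory inside the mirror). The name check_sizes is invented.

```shell
# check_sizes REPO_ROOT PACKAGES_FILE
# Report every file listed in an uncompressed Packages file that is missing
# under REPO_ROOT or whose size differs from its "Size:" field.
check_sizes() {
    awk '/^Filename:/ { file = $2 }
         /^Size:/     { size = $2 }
         /^$/         { if (file != "" && size != "") print size, file
                        file = ""; size = "" }
         END          { if (file != "" && size != "") print size, file }' "$2" |
    while read -r size file; do
        if [ ! -f "$1/$file" ]; then
            echo "missing: $file"
        elif [ "$(wc -c < "$1/$file" | tr -d ' ')" != "$size" ]; then
            echo "wrong size: $file"
        fi
    done
}
```

No output means that every listed file exists and has the declared size.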

One could also compare the SHA sums of each file in the entire repo. I haven’t found such a utility, and I’m not sure it’s worth the effort.
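For what it’s worth, a rough version of such a check fits in a few lines, since the Packages files carry a SHA256 field for each package. This is an untested-at-scale sketch, assuming GNU sha256sum and an uncompressed Packages file; the function name verify_sha256 is hypothetical, and a full run over a couple of hundred GB will take a long time.

```shell
# verify_sha256 REPO_ROOT PACKAGES_FILE
# Turn the Filename/SHA256 pairs of an uncompressed Packages file into a
# checksum list and let sha256sum verify it. Only failures are printed.
# The return status is non-zero if any file is missing or doesn't match.
verify_sha256() {
    awk '/^Filename:/ { file = $2 }
         /^SHA256:/   { sum = $2 }
         /^$/         { if (file != "" && sum != "") print sum "  " file
                        file = ""; sum = "" }
         END          { if (file != "" && sum != "") print sum "  " file }' "$2" |
    (cd "$1" && sha256sum -c --quiet -)
}
```

REPO_ROOT is the directory that the Filename fields are relative to, and the check has to be repeated for each Packages file in the mirror.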

It’s somewhat reassuring to run apt-mirror again after its completion, and see that it downloads nothing. It seems like that doesn’t happen, though. I ended up downloading one archive file of 596 MiB each time (or so apt-mirror said), but then going

$ find /opt/apt-mirror/ -iname \*.deb -cmin -3

found no files. So this was probably only metadata loaded (indeed, dropping the *.deb requirement listed a lot of files).

Reducing wasted disk space

A side effect of the way apt-mirror works is that outdated packages remain in the repository: When apt-mirror is re-run, it makes sure that all files in the current Packages files are downloaded. When a package is updated, a new package file is listed, and the old one just vanishes from the Packages file. But apt-mirror doesn’t delete it, as it’s in the process of updating the repository. The old Packages file is still in effect.

Also, in a real-life mirror scenario, someone could be in the middle of an installation which is based upon several files. So the unnecessary files can only be deleted after the Packages files have been updated (i.e. when apt-mirror finishes) plus the maximal time one could imagine an installation to take. Actually, in a continuously updating web mirror situation, removing a package file will break things for end-users until they run “apt update”. So a real mirror with happy end users should probably not delete files all that often.

Anyhow, apt-mirror creates a clean.sh script in the var/ subdirectory, which deletes all files that aren’t required by the current set of Packages files. It should be executed to get rid of those, when the time is right. Note that the script changes directory to the absolute path to which it downloaded the mirror (so watch out if you’re moving that directory eventually).

# su - apt-mirror -c /opt/apt-mirror/var/clean.sh

For this script to be generated, add “clean” lines in mirror.list, like

clean http://packages.linuxmint.com

If there are several “deb” lines for the same host, one “clean” line like the above covers them all.
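The “clean” lines can be derived from the “deb” lines already in mirror.list. A small sketch, assuming that the URL is the second field of each line (that is, no [options] block); clean_lines is a made-up name.

```shell
# Read mirror.list-style lines on stdin and print one "clean" line for
# each distinct repository URL found on deb, deb-ARCH or deb-src lines.
# Assumption: the URL is the second whitespace-separated field.
clean_lines() {
    awk '$1 ~ /^deb/ && !seen[$2]++ { print "clean", $2 }'
}
```

Running `clean_lines < /etc/apt/mirror.list` prints the lines to append, each URL only once.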

Another waste of disk space is that security.ubuntu.com contains a lot of packages that are already in other repositories. As this entire repo takes 40 GB (30 GB for amd64 alone), it’s unfortunate. One possibility would be to write a script that scans the directories for identical files (based upon SHA sums) and removes one file, replacing it with a symbolic link. Or maybe get this info from the Packages file. Or, like I did, not bother at all.
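For those who do want to bother, the SHA sum approach could look roughly like the sketch below. I haven’t run anything like it on a real mirror, so treat it as an illustration only. It assumes GNU find and sha256sum, package file names without whitespace, and an absolute path as the argument (the symbolic links are absolute, so they break if the tree is moved afterwards). The name dedup_debs is invented.

```shell
# dedup_debs ABSOLUTE_ROOT
# Hash all regular .deb files under ABSOLUTE_ROOT, sort by hash, and
# replace every file whose hash equals the previous one with a symbolic
# link to the first file that had that hash.
dedup_debs() {
    prev_sum=
    keep=
    find "$1" -type f -name '*.deb' -exec sha256sum {} + |
    sort |
    while read -r sum path; do
        if [ "$sum" = "$prev_sum" ]; then
            ln -sf "$keep" "$path"     # duplicate: link to the kept copy
        else
            prev_sum=$sum              # first file with this hash: keep it
            keep=$path
        fi
    done
}
```

Try it on a copy of a small subtree first, and run apt update against the result before trusting it.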
