Using cgroups to force RAM swapping for implementing an Arria 10 design

This post was written by eli on May 31, 2016
Posted Under: FPGA,Intel FPGA (Altera),Linux,Linux kernel

The problem

I needed to implement an FPGA design for an Arria 10 chip with Quartus 15 on a Linux machine. According to Altera’s requirements page (“Memory recommendations” tab), the computer should have 28-48 GB of RAM. Or, as it says on that page, one can fake it with virtual memory. It turns out that the fitter (quartus_fit) is the process that requires this much memory.

Since I have a desktop with 16 GB and a laptop with 8 GB, I set up a large swap partition on the desktop (see below) and fired off the implementation. For a reason I can’t figure out, the memory just ran out, and the computer froze after quartus_fit had eaten up GB after GB, reaching 15.7 GB of used physical RAM. The kernel was still responsive (the computer answered pings), but it seemed like no process was able to run. For example, attempts to connect with ssh got no response whatsoever: the TCP connection was established, but no data went through it. After several minutes of looking at a completely frozen screen, and a hard disk doing almost nothing, I reset the computer.

As for the swap partition, only a few hundred MB of it were used. Why pages weren’t rushed into swap to avoid this freeze is beyond me. This happened on the desktop running kernel v3.12.20 as well as on the laptop running 3.13.0-35 (Ubuntu 14.04.1).

The solution

Since the swapping mechanism didn’t kick in fast enough to prevent quartus_fit from eating up all physical RAM, let cgroups do the job instead. The idea is that one can limit the amount of physical memory that a group of processes may use. Anything beyond that limit goes to swap. Since I didn’t want to mess with my desktop again, I went for a 6 GB limit on my laptop (out of the existing 8 GB). Details follow.

Setting up swap

First things first: set up a large swap partition. I’m using LVM on the machine, so it was quite easy.

In retrospect, 64 GB is much more than needed (10 GB would have been enough), but I was lucky enough to have this much spare room in the volume group.

So to create a new logical volume and format it for swap, it was (vg_main is the volume group):

# lvcreate --size 64G vg_main -n lv_bigswap
# mkswap /dev/mapper/vg_main-lv_bigswap
# lvdisplay

Turn off old swap, and enable the new one only:

# swapoff -a
# swapon /dev/mapper/vg_main-lv_bigswap

And that’s it. The swap is enabled.
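To double-check that the new swap is active, and later to see how much of it is actually in use, the figures in /proc/meminfo can be summarized. This is a minimal sketch: swap_summary is a hypothetical helper name, not part of any package, and it assumes the usual SwapTotal / SwapFree lines given in kB.

```shell
# swap_summary [FILE]: print total and used swap in MiB.
# Hypothetical helper; reads a meminfo-style file, /proc/meminfo by default.
swap_summary() {
    awk '/^SwapTotal:/ { total = $2 }   # value is in kB
         /^SwapFree:/  { free = $2 }    # value is in kB
         END { printf "total=%d MiB used=%d MiB\n", total / 1024, (total - free) / 1024 }' \
        "${1:-/proc/meminfo}"
}
```

Calling swap_summary without arguments reads the live values; the optional file argument is only there to make the helper easy to try on a saved copy.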

Cgroups

Now to the interesting part. First I needed to install the cgroup tools:

# apt-get install cgroup-bin

(contrary to what is suggested elsewhere, there was no need to reboot)

Following this guide, create a group owned by myself (eli). This has to be done as root:

# cgcreate -a eli:eli -g memory:quartus

This creates the /sys/fs/cgroup/memory/quartus/ subdirectory, owned by user “eli” (and everything below too, so I don’t have to be root to control anything related to it).

Note that “quartus” is just a name and has nothing to do with the target executable, which is never “quartus” in my case, because I implement the project by kicking off “make” from “xemacs”.

I could have used cgexec to start a new process, for example (as root, because a plain user isn’t allowed to change a process’s group):

# cgexec -g memory:quartus xemacs

but I went for changing the group of an existing process instead (root required again; 4550 happens to be the PID of xemacs):

# cgclassify -g memory:quartus 4550

Now drop the root privileges. They won’t be required anymore.

And indeed, the process has joined the group (as non-root):

$ cat /sys/fs/cgroup/memory/quartus/cgroup.procs
4550
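The same check can be scripted, which is handy when the PID isn’t known by heart. A minimal sketch: in_cgroup is a hypothetical helper name, and the default path assumes the “quartus” group and the cgroup v1 layout used in this post.

```shell
# in_cgroup PID [PROCS_FILE]: succeed if PID appears on a line of its own
# in the group's process list. Hypothetical helper; the default path
# assumes the "quartus" memory cgroup created above.
in_cgroup() {
    grep -qx -- "$1" "${2:-/sys/fs/cgroup/memory/quartus/cgroup.procs}"
}
```

For example, in_cgroup 4550 && echo "xemacs is in the group" prints the message only if the PID is listed. The -x flag makes grep match whole lines, so 455 does not match 4550.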

It could also make sense to target a shell process, which would limit anything executed from it. For example, to add the running shell to the group:

$ sudo cgclassify -g memory:quartus $$

Set the memory limit to 6 GiB:

$ cat /sys/fs/cgroup/memory/quartus/memory.limit_in_bytes
18446744073709551615
$ echo 6442450944 > /sys/fs/cgroup/memory/quartus/memory.limit_in_bytes
$ cat /sys/fs/cgroup/memory/quartus/memory.limit_in_bytes
6442450944
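The number 6442450944 is simply 6 × 1024 × 1024 × 1024. To avoid doing that arithmetic by hand for other limits, a tiny helper can do the conversion. This is a sketch only: gib_to_bytes is a hypothetical name, and it assumes a shell with 64-bit arithmetic (bash or dash on a 64-bit machine).

```shell
# gib_to_bytes N: print N GiB expressed in bytes.
# Hypothetical helper; relies on the shell's 64-bit integer arithmetic.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

With this defined, the limit above could be set with echo "$(gib_to_bytes 6)" > /sys/fs/cgroup/memory/quartus/memory.limit_in_bytes.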

And now launch the implementation from that xemacs process (the “Compile” button).

For amusement, watch the processes join the group with

$ watch cat /sys/fs/cgroup/memory/quartus/cgroup.procs

Needless to say(?), any process that forks from the originating process joins the group automatically, so the limit applies to all of them together. And indeed, when the memory use reaches 6 GB, the excess goes to swap.
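To see the limit at work, the group’s own counters can be read while the fitter runs. The sketch below assumes the cgroup v1 memory controller files memory.usage_in_bytes and memory.failcnt (the latter counts how many times usage hit the limit); cg_mem_report is a hypothetical helper name, and the default directory is the “quartus” group from this post.

```shell
# cg_mem_report [CGROUP_DIR]: print the group's current memory usage in MiB
# and the number of times it has hit its limit.
# Hypothetical helper; assumes cgroup v1 memory controller file names.
cg_mem_report() {
    cg_dir="${1:-/sys/fs/cgroup/memory/quartus}"
    cg_usage=$(cat "$cg_dir/memory.usage_in_bytes")
    cg_failcnt=$(cat "$cg_dir/memory.failcnt")
    echo "usage=$(( cg_usage / 1048576 )) MiB failcnt=$cg_failcnt"
}
```

Running it under watch (after saving the function in a script) gives a rough live view: usage should level off near the limit while failcnt keeps growing.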

This presumably made the whole process considerably slower (CPU usage dropped to almost zero for some periods while waiting for disk I/O), but a simple implementation finished in some 35 minutes, which is all I needed.
