<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>my tech blog &#187; Linux</title>
	<atom:link href="http://billauer.co.il/blog/category/linux/feed/" rel="self" type="application/rss+xml" />
	<link>http://billauer.co.il/blog</link>
	<description>Anything I found worthy to write down.</description>
	<lastBuildDate>Sun, 19 Sep 2021 10:43:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Using firejail to throttle network bandwidth for wget and such</title>
		<link>http://billauer.co.il/blog/2021/08/firejail-network-bandwidth-limit/</link>
		<comments>http://billauer.co.il/blog/2021/08/firejail-network-bandwidth-limit/#comments</comments>
		<pubDate>Sun, 15 Aug 2021 13:39:17 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6382</guid>
		<description><![CDATA[Introduction Occasionally, I download / upload huge files, and it kills my internet connection for plain browsing. I don&#8217;t want to halt the download or suspend it, but merely calm it down a bit, temporarily, for doing other stuff. And then let it hog as much as it want again. There are many ways to [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>Occasionally, I download / upload huge files, and it kills my internet connection for plain browsing. I don&#8217;t want to halt the download or suspend it, but merely calm it down a bit, temporarily, while doing other stuff. And then let it hog as much as it wants again.</p>
<p>There are many ways to do this, and I went for firejail. I suggest reading <a title="Firejail: Putting a program in its own little container" href="http://billauer.co.il/blog/2020/06/firejail-cgroups/" target="_blank">this post of mine</a> as well on this tool.</p>
<p>Firejail gives you a shell prompt that runs inside a mini-container, like those cheap virtual hosting services. Run wget or youtube-dl as you wish from that shell.</p>
<p>It has access to practically everything on the computer, but the network interface is controlled. Since firejail is based on cgroups, all processes and subprocesses are collectively subject to the network bandwidth limit.</p>
<p>Using firejail requires setting up a bridge network interface. This is a bit of container hocus-pocus, and is necessary to get control over the network data flow. But it&#8217;s simple, and it can be done once (until the next reboot, unless the bridge is configured permanently, something I don&#8217;t bother with).</p>
<h3>Setting up a bridge interface</h3>
<p>Remember: Do this once, and just don&#8217;t remove the interface when done with it.</p>
<p>You might need to</p>
<pre># <strong>apt install bridge-utils</strong></pre>
<p>So first, set up a new bridge device (as root):</p>
<pre># <strong>brctl addbr hog0</strong></pre>
<p>and give it an IP address that doesn&#8217;t collide with anything else on the system. Otherwise, it really doesn&#8217;t matter which:</p>
<pre># <strong>ifconfig hog0 10.22.1.1/24</strong></pre>
<p>What&#8217;s going to happen is that there will be a network interface named eth0 inside the container, which behaves as if it were connected to a real Ethernet card named hog0 on the computer. Hence the container has access to everything that is covered by the routing table (by means of IP forwarding), and is also subject to the firewall rules. With my specific firewall setting, it prevents some access, but ppp0 isn&#8217;t blocked, so who cares.</p>
<p>To remove the bridge (no real reason to do it):</p>
<pre># <strong>brctl delbr hog0</strong></pre>
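<p>For what it&#8217;s worth, the whole one-time setup can be collected into a small script. This is merely a sketch of the commands above; hog0 and 10.22.1.1/24 are the arbitrary choices from this example:</p>
<pre>#!/bin/bash
# Run as root. Sets up the bridge for firejail, once per boot.

BR=hog0
ADDR=10.22.1.1/24

# Create the bridge only if it doesn't already exist
if [ ! -d "/sys/class/net/$BR" ]; then
  brctl addbr "$BR"
fi

# Assign an address that doesn't collide with anything else
ifconfig "$BR" "$ADDR"</pre>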
<h3>Running the container</h3>
<p>Launch a shell with firejail (I called it &#8220;nethog&#8221; in this example):</p>
<pre>$ <strong>firejail --net=hog0 --noprofile --name=nethog</strong></pre>
<p>This starts a new shell, for which the bandwidth limit is applied. Run wget or whatever from here.</p>
<p>Note that despite the --noprofile flag, there are still some directories that are read-only and some that are temporary as well. It&#8217;s done in a sensible way, though, so odds are that it won&#8217;t cause any issues. Running &#8220;df&#8221; inside the container gives an idea of what is mounted where, and it looks scarier than the actual situation.</p>
<p>But <strong>be sure to check that the files that are downloaded are visible outside the container</strong>.</p>
<p>From another shell prompt, <strong>outside the container</strong>, run something like this (<strong>doesn&#8217;t </strong>require root):</p>
<pre>$ <strong>firejail --bandwidth=nethog set hog0 800 75</strong>
Removing bandwith limit
Configuring interface eth0
Download speed  6400kbps
Upload speed  600kbps
cleaning limits
configuring tc ingress
configuring tc egress
</pre>
<p>To drop the bandwidth limit:</p>
<pre>$ <strong>firejail --bandwidth=nethog clear hog0</strong></pre>
<p>And get the status (saying, among others, how many packets have been dropped):</p>
<pre>$ <strong>firejail --bandwidth=nethog status</strong></pre>
<p>Notes:</p>
<ul>
<li>The &#8220;eth0&#8221; mentioned in firejail&#8217;s output blob relates to the interface name <strong>inside</strong> the container. So the &#8220;real&#8221; eth0 remains untouched.</li>
<li>Actual download speed is slightly slower.</li>
<li>New processes can join the existing group with firejail --join, as well as from firetools.</li>
<li>Several containers may use the same bridge (hog0 in the example  above), in which case each has its own independent bandwidth setting.  Note that the commands configuring the bandwidth limits mention both the  container&#8217;s name and the bridge.</li>
</ul>
<h3>Working with browsers</h3>
<p>When starting a browser from within a container, pay attention to  whether it really started a new process. Using firetools can help.</p>
<p>If  Google Chrome says &#8220;Created new window in existing browser session&#8221;, it <strong>didn&#8217;t</strong> start a new process inside the container, in which case the window isn&#8217;t subject to bandwidth limitation.</p>
<p>So close all windows of Chrome before kicking off a new one. Alternatively, this can be worked around by starting the container with:</p>
<pre>$ firejail --net=hog0 --noprofile <strong>--private</strong> --name=nethog</pre>
<p>The --private flag creates, among others, a new <strong>volatile</strong> home directory, so Chrome doesn&#8217;t detect that it&#8217;s already running. Because I use some other disk mounts for the large partitions on my computer, it&#8217;s still possible to download stuff to them from within the container.</p>
<p>But extra care is required with this, and regardless, the new browser doesn&#8217;t remember passwords and such from the private container.</p>
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2021/08/firejail-network-bandwidth-limit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Octave: Creating images from plots for web page</title>
		<link>http://billauer.co.il/blog/2021/08/octave-export-plot-to-png/</link>
		<comments>http://billauer.co.il/blog/2021/08/octave-export-plot-to-png/#comments</comments>
		<pubDate>Thu, 12 Aug 2021 16:07:00 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6379</guid>
		<description><![CDATA[This should have been a trivial task, but it turned out quite difficult. So these are my notes for the next time. Octave 4.2.2 under Linux Mint 19, using qt5ct plugin with GNU plot (or else I get blank plots). So this is the small function I wrote for creating a plot and a thumbnail: [...]]]></description>
			<content:encoded><![CDATA[<p>This should have been a trivial task, but it turned out quite difficult. So these are my notes for the next time. Octave 4.2.2 under Linux Mint 19, using qt5ct plugin with GNU plot (or else I get <a href="http://billauer.co.il/blog/2019/12/octave-linux-plots/" target="_blank">blank plots</a>).</p>
<p>So this is the small function I wrote for creating a plot and a thumbnail:</p>
<pre>function []=toimg(fname, alt)

grid on;

saveas(gcf, sprintf('%s.png', fname), 'png');
print(gcf, sprintf('%s_thumb.png', fname), '-dpng', '-color', '-S280,210');

disp(sprintf('&lt;a href="/media/%s.png" target="_blank"&gt;&lt;img alt="%s" src="/media/%s_thumb.png" style="width: 280px; height: 210px;"&gt;&lt;/a&gt;', fname, alt, fname));</pre>
<p>The @alt argument becomes the image&#8217;s alternative text when shown on the web page.</p>
<p>The call to saveas() creates a 1200x900 image, and the print() call creates a 280x210 one (as specified directly). I take it that print() would also create a 1200x900 image without any specific size argument, but I left both methods in, since this is how I ended up after struggling, and it&#8217;s better to have both possibilities shown.</p>
<p>To add some extra annoyance, toimg() always plots the current figure, which is typically the last figure plotted. Which is not necessarily the figure that has focus. As a matter of fact, even if the current figure is closed by clicking the upper-right X, it remains the current figure. Calling toimg() will make it reappear and get plotted. Which is really weird behavior.</p>
<p>Apparently the only way around this is to use figure() to select the desired current figure before calling toimg(), e.g.</p>
<pre>&gt;&gt; figure(4);</pre>
<p>The good news is that the figure numbers match those appearing on the windows&#8217; titles. This also explains why the numbering doesn&#8217;t reset when closing all figure windows manually. To really clear all figures, go</p>
<pre>&gt;&gt; close all hidden</pre>
<h3>Other oddities</h3>
<ul>
<li>ginput() simply doesn&#8217;t work. The workaround is to double-click any point (with left button) and the coordinates of this point are copied into the clipboard. Paste it anywhere. Odd, but not all that bad.</li>
<li>Zooming in with right-click and then left-click doesn&#8217;t affect axis(). As a result, saving the plot as an image is not affected by this zoom feature. Wonky workaround: Use the double-click trick above to obtain the coordinates of relevant corners, and use axis() to set them properly. Bonus: One gets the chance to adjust the figures for a sleek plot. If anyone knows how to save a plot as it&#8217;s shown by zooming, please comment below.</li>
</ul>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2021/08/octave-export-plot-to-png/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
<title>When dovecot silently stops delivering mail</title>
		<link>http://billauer.co.il/blog/2021/07/dovecot-fetchmail-pop3-stuck/</link>
		<comments>http://billauer.co.il/blog/2021/07/dovecot-fetchmail-pop3-stuck/#comments</comments>
		<pubDate>Fri, 23 Jul 2021 09:30:14 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[email]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Server admin]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6369</guid>
		<description><![CDATA[After a few days being happy with not getting spam, I started to suspect that something is completely wrong with receiving mail. As I&#8217;m using fetchmail to get mail from my own server running dovecot v2.2.13, I&#8217;m used to getting notifications when fetchmail is unhappy. But there was no such. Checking up the server&#8217;s logs, [...]]]></description>
			<content:encoded><![CDATA[<p>After a few days of being happy about not getting spam, I started to suspect that something was completely wrong with receiving mail. As I&#8217;m using fetchmail to get mail from my own server running dovecot v2.2.13, I&#8217;m used to getting notifications when fetchmail is unhappy. But there were none.</p>
<p>Checking the server&#8217;s logs, there were tons of these messages:</p>
<pre>dovecot: master: Warning: service(pop3-login): process_limit (100) reached, client connections are being dropped</pre>
<p>Restarting dovecot got it back running properly again, and I got a flood of the mails that were pending on the server. This was exceptionally nasty, because mails stopped arriving silently.</p>
<p>So what was the problem? The clue is in these log messages, which occurred about a minute after the system&#8217;s boot (it&#8217;s a VPS virtual machine):</p>
<pre>Jul 13 11:21:46 dovecot: master: Error: service(anvil): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(log): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(ssl-params): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(log): child 1210 killed with signal 9</pre>
<p>These three services are helper processes for dovecot, as can be seen in the output of systemctl status:</p>
<pre>            ├─dovecot.service
             │ ├─11690 /usr/sbin/dovecot -F
             │ ├─11693 dovecot/anvil
             │ ├─11694 dovecot/log
             │ ├─26494 dovecot/config
             │ ├─26495 dovecot/auth
             │ └─26530 dovecot/auth -w</pre>
<p>What seems to have happened is that these processes failed to launch properly within the 30 second timeout limit, and were therefore killed by dovecot. And then attempts to make pop3 connections apparently got stuck, with the process forked for each connection remaining behind. Eventually, they reached the maximum of 100.</p>
<p>The reason this happened only now is probably that the hosting server had some technical failure and was brought down for maintenance. When it went up again, all VMs were booted at the same time, so they were all very slow in the beginning. Hence it took exceptionally long to kick off those helper processes, and the 30-second timeout kicked in.</p>
<p>The solution? Restart dovecot once every 24 hours with a plain cronjob. Ugly, but it works. In the worst case, mail will be delayed for 24 hours. This is a very rare event to begin with.</p>
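<p>For reference, the crontab entry for this is as plain as it gets. This is a sketch, assuming dovecot is managed by systemd under that unit name; it goes into root&#8217;s crontab:</p>
<pre># Restart dovecot every night at 04:17
17 4 * * * /bin/systemctl restart dovecot</pre>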
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2021/07/dovecot-fetchmail-pop3-stuck/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running Xilinx Impact on Linux Mint 19</title>
		<link>http://billauer.co.il/blog/2021/02/xilinx-impact-linux/</link>
		<comments>http://billauer.co.il/blog/2021/02/xilinx-impact-linux/#comments</comments>
		<pubDate>Sat, 13 Feb 2021 11:46:51 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[udev]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6239</guid>
		<description><![CDATA[Introduction This is my short war story as I made Xilinx&#8217; Impact, part of ISE 14.7, work on a Linux Mint 19 machine with a v4.15 Linux kernel. I should mention that I already use Vivado on the same machine, so the whole JTAG programming thing was already sorted out, including loading firmware into the [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>This is my short war story as I made Xilinx&#8217; Impact, part of ISE 14.7, work on a Linux Mint 19 machine with a v4.15 Linux kernel. I should mention that I already use Vivado on the same machine, so the whole JTAG programming thing was already sorted out, including loading firmware into the USB JTAG adapters, whether it&#8217;s a platform cable or an on-board interface. All that was already history. It was Impact that refused to play ball.</p>
<p>In short, what needed to be done:</p>
<ul>
<li>Make a symbolic link to activate libusb.</li>
<li>Make sure that the firmware files are installed, even if they&#8217;re not necessary to load the USB devices.</li>
<li>Make sure Vivado&#8217;s hardware manager isn&#8217;t running.</li>
<li>Don&#8217;t expect it to always work. It&#8217;s JTAG through USB, which is well-known for being cursed since ancient times. Sometimes &#8220;Initialize Chain&#8221; works right away, sometimes &#8220;Cable Auto Connect&#8221; is needed to warm it up, and sometimes just restart Impact, unplug and replug everything + recycle power on relevant card. Also apply spider leg powder as necessary with grounded elephant eyeball extract.</li>
</ul>
<p>And now in painstaking detail.</p>
<h3>The USB interface</h3>
<p>The initial attempt to talk with the USB JTAG interface failed with a lot of dialog boxes saying something about windrvr6 and this:</p>
<pre>PROGRESS_START - Starting Operation.
If you are using the Platform Cable USB, please refer to the USB Cable Installation Guide (UG344) to install the libusb package.
Connecting to cable (Usb Port - USB21).
Checking cable driver.
 Linux release = 4.15.0-20-generic.
WARNING:iMPACT -  Module windrvr6 is not loaded. Please reinstall the cable drivers. See Answer Record 22648.
Cable connection failed.</pre>
<p>This is horribly misleading. windrvr6 is a Jungo driver which isn&#8217;t supported on anything but ancient kernels. Also, the said Answer Record seems to have been deleted.</p>
<p>Luckily, there&#8217;s a libusb interface as well, but it needs to be enabled. More precisely, Impact needs to find a libusb.so file somewhere. Even more precisely, this is some strace output related to its attempts:</p>
<pre>openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE//lib/lin64/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE/lib/lin64/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/ISE/sysgen/lib/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/EDK/lib/lin64/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/xilinx/14.7/ISE_DS/common/lib/lin64/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
<em><span style="color: #888888;">[ ... ]</span></em>
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/tls/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libusb.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)</pre>
<p>It so happens that a libusb module is present among the files installed along with ISE (several times, actually), so it&#8217;s enough to just</p>
<pre>$ cd /opt/xilinx/14.7/ISE_DS/ISE/lib/lin64/
$ ln -s libusb-1.0.so.0 libusb.so</pre>
<p>or alternatively, a symlink to /usr/lib/x86_64-linux-gnu/libusb-1.0.so worked equivalently well on my system.</p>
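<p>If you&#8217;re worried about clobbering an existing file, a slightly defensive variant of the same thing (just a sketch, with the paths from this example):</p>
<pre>$ cd /opt/xilinx/14.7/ISE_DS/ISE/lib/lin64/
$ [ -e libusb.so ] || ln -s libusb-1.0.so.0 libusb.so</pre>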
<h3>But then</h3>
<p>Trying to initialize the chain I got:</p>
<pre>PROGRESS_START - Starting Operation.
Connecting to cable (Usb Port - USB21).
Checking cable driver.
File version of /opt/xilinx/14.7/ISE_DS/ISE/bin/lin64/xusbdfwu.hex = 1030.
 <strong>Using libusb</strong>.
<strong>Please run `source ./setup_pcusb` from the /opt/xilinx/14.7/ISE_DS/ISE//bin/lin64 directory with root privilege to update the firmware. Disconnect and then reconnect the cable from the USB port to complete the driver update.
</strong>Cable connection failed.</pre>
<p>So yay, it was now going for libusb. But then it refused to go on.</p>
<p>Frankly speaking, I&#8217;m not so much into running any script with root privileges, knowing it can mess up things with the working Vivado installation. On my system, there was actually no need, because I had already installed and then removed the cable drivers (as required by ISE).</p>
<p>What happened here was that Impact looked for firmware files somewhere in /etc/hotplug/usb/, assuming that if they didn&#8217;t exist, then the USB device must not be loaded with firmware. But it was loaded in my case. And yet, Impact refused on the grounds that the files couldn&#8217;t be found.</p>
<p>So I put those files back in place, and Impact was happy again. If you don&#8217;t have these files, an ISE Lab Tools installation should do the trick. Note that it also installs udev rules, which is what I wanted to avoid. And also that the installation will fail, because it includes compiling the Jungo driver against the kernel, and there&#8217;s some issue with that. But as far as I recall, the kernel thing is attempted last, so the firmware files will be in place. I think.</p>
<p>Or maybe installing them on behalf of Vivado is also fine? Not sure.</p>
<h3>Identify failed</h3>
<p>Attempting to Cable Auto Connect, I got Identify Failed and a whole range of weird errors. Since I ran Impact from a console, I got stuff like this on the terminal:</p>
<pre>ERROR set configuration. strerr=Device or resource busy.
ERROR claiming interface.
ERROR setting interface.
ERROR claiming interface in bulk transfer.
bulk tranfer failed, endpoint=02.
ERROR releasing interface in bulk transfer.
ERROR set configuration. strerr=<strong>Device or resource busy</strong>.
ERROR claiming interface.
ERROR setting interface.
control tranfer failed.
control tranfer failed.</pre>
<p>This time it was a stupid mistake: Vivado&#8217;s hardware manager ran at the same time, so the two competed. Device or resource busy or not?</p>
<p>So I just turned off Vivado. And voila. All ran just nicely.</p>
<h3>Bonus: Firmware loading confusion</h3>
<p>I mentioned that I already had the firmware loading properly set up. So it looked like this in the logs:</p>
<pre>Feb 13 11:58:18 kernel: usb 1-5.1.1: new high-speed USB device number 78 using xhci_hcd
Feb 13 11:58:18 kernel: usb 1-5.1.1: New USB device found, idVendor=03fd, idProduct=000d
Feb 13 11:58:18 kernel: usb 1-5.1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
<span style="color: #ff0000;"><strong>Feb 13 11:58:18 systemd-udevd[59619]: Process '/alt-root/sbin/fxload -t fx2 -I /alt-root/etc/hotplug/usb/xusbdfwu.fw/xusb_emb.hex -D ' failed with exit code 255.
</strong></span></pre>
<p>immediately followed by:</p>
<pre>Feb 13 11:58:25 kernel: usb 1-5.1.1: new high-speed USB device number 80 using xhci_hcd
Feb 13 11:58:25 kernel: usb 1-5.1.1: New USB device found, idVendor=03fd, idProduct=0008
Feb 13 11:58:25 kernel: usb 1-5.1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 13 11:58:25 kernel: usb 1-5.1.1: Product: XILINX
Feb 13 11:58:25 kernel: usb 1-5.1.1: Manufacturer: XILINX</pre>
<p>This log contains contradictory messages. On one hand, the device is clearly re-enumerated with a new product ID, indicating that the firmware load went fine. On the other hand, there&#8217;s an error message saying fxload failed.</p>
<p>I messed around quite a bit with udev because of this. The problem is that the argument to the -D flag should be the path to the USB device&#8217;s device file, and there&#8217;s nothing there. In the related udev rule, it says $devnode, which should substitute to exactly that. Why doesn&#8217;t it work?</p>
<p>The answer is that it actually does work. For some unclear reason, the relevant udev rule is called a second time, and on that second time $devnode is substituted with nothing. Which is harmless because it fails royally with no device file to poke. Except for that confusing error message.</p>
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2021/02/xilinx-impact-linux/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>systemd: Shut down computer at a certain uptime</title>
		<link>http://billauer.co.il/blog/2021/01/systemd-money-saver/</link>
		<comments>http://billauer.co.il/blog/2021/01/systemd-money-saver/#comments</comments>
		<pubDate>Sun, 31 Jan 2021 17:32:17 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6226</guid>
		<description><![CDATA[Motivation Paid-per-time cloud services. I don&#8217;t want to forget one of those running, just to get a fat bill at the end of the month. And if the intended use is short sessions anyhow, make sure that the machine shuts down by itself after a given amount of time. Just make sure that a shutdown [...]]]></description>
			<content:encoded><![CDATA[<h3>Motivation</h3>
<p>Paid-per-time cloud services. I don&#8217;t want to forget one of those running, just to get a fat bill at the end of the month. And if the intended use is short sessions anyhow, make sure that the machine shuts down by itself after a given amount of time. Just make sure that a shutdown initiated by the machine itself indeed cuts the costs. Any sane cloud provider does that, except for, possibly, the cost of storing the VM&#8217;s disk image.</p>
<p>So this is the cloud computing parallel to &#8220;did I lock the door?&#8221;.</p>
<p>The examples here are based upon systemd 241 on Debian GNU/Linux 10.</p>
<h3>The main service</h3>
<p>There is more than one way to do this. I went for two services: one that calls /sbin/shutdown with a five minute delay (so I get a chance to cancel it), and a second that is a timer for the uptime limit.</p>
<p>So the main service is this file as /etc/systemd/system/uptime-limiter.service:</p>
<pre>[Unit]
Description=Limit uptime service

[Service]
ExecStart=/sbin/shutdown -h +5 "System it taken down by uptime-limit.service"
Type=simple

[Install]
WantedBy=multi-user.target</pre>
<p>The naïve approach is to just enable the service and expect it to work. Well, it does work when started manually, but when this service starts as part of the system bringup, the shutdown request is registered but later ignored. Most likely because systemd somehow cancels pending shutdown requests when it reaches the ultimate target.</p>
<p>I should mention that adding After=multi-user.target in the unit file didn&#8217;t help. Maybe some other target. Don&#8217;t know.</p>
<h3>The timer service</h3>
<p>So the way to ensure that the shutdown command is respected is to trigger it off with a timer service.</p>
<p>The timer service as /etc/systemd/system/uptime-limiter.timer, in this case allows for 6 hours of uptime (plus the extra 5 minutes given by the main service):</p>
<pre>[Unit]
Description=Timer for Limit uptime service

[Timer]
OnBootSec=6h
AccuracySec=1s

[Install]
WantedBy=timers.target</pre>
<p>and enable it:</p>
<pre># <strong>systemctl enable uptime-limiter<span style="color: #ff0000;">.timer</span></strong>
Created symlink /etc/systemd/system/timers.target.wants/uptime-limiter.timer → /etc/systemd/system/uptime-limiter.timer.</pre>
<p>Note two things here: I enabled the timer, not the service itself, by adding the .timer suffix. And I didn&#8217;t start it. For that, there&#8217;s the --now flag.</p>
<p>So there are two steps: When the timer fires off, the call to /sbin/shutdown takes place, and that causes nagging wall messages to start once a minute, and eventually a shutdown. Mission complete.</p>
<h3>What timers are pending</h3>
<p>Ah, that&#8217;s surprisingly easy:</p>
<pre># <strong>systemctl list-timers</strong>
NEXT                         LEFT          LAST                         PASSED       UNIT                         ACTIVATES
Sun 2021-01-31 17:38:28 UTC  14min left    n/a                          n/a          systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Sun 2021-01-31 20:50:22 UTC  3h 26min left Sun 2021-01-31 12:36:41 UTC  4h 47min ago apt-daily.timer              apt-daily.service
<span style="color: #ff0000;"><strong>Sun 2021-01-31 23:23:28 UTC  5h 59min left n/a                          n/a          uptime-limiter.timer         uptime-limiter.service
</strong></span>Sun 2021-01-31 23:23:34 UTC  5h 59min left Sun 2021-01-31 17:23:34 UTC  44s ago      google-oslogin-cache.timer   google-oslogin-cache.service
Mon 2021-02-01 00:00:00 UTC  6h left       Sun 2021-01-31 12:36:41 UTC  4h 47min ago logrotate.timer              logrotate.service
Mon 2021-02-01 00:00:00 UTC  6h left       Sun 2021-01-31 12:36:41 UTC  4h 47min ago man-db.timer                 man-db.service
Mon 2021-02-01 06:49:19 UTC  13h left      Sun 2021-01-31 12:36:41 UTC  4h 47min ago apt-daily-upgrade.timer      apt-daily-upgrade.service</pre>
<p>Clean and simple. And this is probably why this method is better than a long delay on shutdown, which is less clear about what it&#8217;s about to do, as shown next.</p>
<p>Note that a timer service can be stopped, which is the parallel of canceling a shutdown. Restarting it to push the time limit further won&#8217;t work in this case, because the timer is relative to OnBootSec.</p>
<h3>Is there a shutdown pending?</h3>
<p>To check if a shutdown is about to happen:</p>
<pre>$ <strong>cat /run/systemd/shutdown/scheduled</strong>
USEC=<span style="color: #ff0000;"><strong>1612103418427661</strong></span>
WARN_WALL=1
MODE=poweroff
WALL_MESSAGE=System it taken down by uptime-limit.service</pre>
<p>There are different reports on what happens when the shutdown is canceled. On my system, the file was deleted in response to &#8220;shutdown -c&#8221;, but not when the shutdown was canceled because the system had just booted up. There&#8217;s <a href="https://unix.stackexchange.com/questions/229745/systemd-how-to-check-scheduled-time-of-a-delayed-shutdown" target="_blank">other suggested</a> ways too, but in the end, it appears like there&#8217;s no definite way to tell if a system has a shutdown scheduled or not. At least not as of systemd 241.</p>
<p>That USEC line is the epoch time for when shutdown will take place. A Perl guy like me goes</p>
<pre>$ perl -e 'print scalar gmtime(<span style="color: #ff0000;"><strong>1612103418427661</strong></span>/1e6)'</pre>
<p>but that&#8217;s me.</p>
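<p>For non-Perl people, GNU date with some shell arithmetic gets the same answer (assuming a bash-like shell for the arithmetic expansion):</p>
<pre>$ date -u -d @$((1612103418427661 / 1000000))</pre>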
<h3>What didn&#8217;t work</h3>
<p>So this shows what <strong>doesn&#8217;t</strong> work: Enable the main service (and start it right away with the --now flag):</p>
<pre># <strong>systemctl enable --now uptime-limiter</strong>
Created symlink /etc/systemd/system/multi-user.target.wants/uptime-limiter.service → /etc/systemd/system/uptime-limiter.service.

Broadcast message from root@instance-1 (Sun 2021-01-31 14:15:19 UTC):

System it taken down by uptime-limit.service
The system is going down for poweroff at Sun 2021-01-31 14:25:19 UTC!</pre>
<p>So the broadcast message is out there right away. But this is misleading: It won&#8217;t work at all when the service is started automatically during system boot.</p>
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2021/01/systemd-money-saver/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Systemd services as cronjobs: No process runs away</title>
		<link>http://billauer.co.il/blog/2021/01/systemd-cron-cgroups/</link>
		<comments>http://billauer.co.il/blog/2021/01/systemd-cron-cgroups/#comments</comments>
		<pubDate>Mon, 18 Jan 2021 06:10:06 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6214</guid>
		<description><![CDATA[But why? Cronjobs typically consist of a single utility which we&#8217;re pretty confident about. Even if it takes quite some time to complete (updatedb, for example), there&#8217;s always a simple story, a single task to complete with a known beginning and end. If the task involves a shell script that calls a few utilities, that [...]]]></description>
			<content:encoded><![CDATA[<h3>But why?</h3>
<p>Cronjobs typically consist of a single utility which we&#8217;re pretty confident about. Even if it takes quite some time to complete (updatedb, for example), there&#8217;s always a simple story, a single task to complete with a known beginning and end.</p>
<p>If the task involves a shell script that calls a few utilities, that feeling of control fades. It&#8217;s therefore reassuring to know that everything can be cleaned up neatly by simply stopping a service. Systemd is good at that, since all processes that are involved in the service are kept in a separate cgroup. So when the service is stopped, all processes that were possibly generated eventually get a SIGKILL, typically 90 seconds after the request to stop the service, unless they terminated voluntarily in response to the initial SIGTERM.</p>
<p>Advantage number two is that systemd offers a series of capabilities to limit what the cronjob is capable of doing, thanks to the cgroup arrangement. This doesn&#8217;t fall far short of the possibilities of container virtualization, with pretty simple assignments in the unit file. This includes making certain directories inaccessible or read-only, setting up temporary directories, disallowing external network connections, limiting the set of allowed syscalls, and of course limiting the amount of resources that are consumed by the service. They&#8217;re called Control Groups for a reason.</p>
<p>There&#8217;s also the RuntimeMaxSec parameter in the service unit file, which is the maximal wall clock time the service is allowed to run. The service is terminated and put in failure state if this time is exceeded. This is however supported only from systemd version 229 and later, so check with &#8220;systemctl --version&#8221;.</p>
<p>My original idea was to use systemd timers to kick off the job, and let RuntimeMaxSec make sure it would get cleaned up if it ran too long (i.e. got stuck somehow). But because the server in question ran a rather old version of systemd, I went for one cron entry for starting the service and another one for stopping it, with a certain time difference between them. In hindsight, cron turned out to be neater for kicking off the jobs, because I had multiple variants of them at different times. So a single file enclosed them all.</p>
<p>The main practical difference is that if a service reaches RuntimeMaxSec, it&#8217;s terminated with a failed status. The cron solution stops the service without this. I guess there&#8217;s a systemctl way to achieve the failed status, if that&#8217;s really important.</p>
<p>As a side note, I have a <a href="http://billauer.co.il/blog/2020/06/firejail-cgroups/" target="_blank">separate post</a> on Firejail, which is yet another possibility to use cgroups for controlling what processes do.</p>
<h3>Timer basics</h3>
<p>The idea is simple: A service can be started as a result of a timer event. That&#8217;s all that timer units do.</p>
<p>Timer units are configured like any systemd units (man systemd.unit) but have a .timer suffix and a dedicated [Timer] section. By convention, the timer unit named foo.timer activates the service foo.service, unless specified differently with the Unit= assignment (useful for generating confusion).</p>
<p>Units that are already running when the timer event occurs are not restarted, but are left to keep running. Exactly like systemctl start would do.</p>
<p>For a cronjob-style timer, use OnCalendar= to specify the times. See man systemd.time for the format. Note that AccuracySec= should be set too, to control how much systemd may shift the exact time of execution; otherwise systemd&#8217;s behavior might be confusing.</p>
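<p>For illustration only (foo.timer is a made-up name, activating foo.service by the convention above), a minimal timer unit firing daily at 03:00 with a one-minute accuracy window could look like this:</p>

```ini
[Unit]
Description=Daily cronjob-style timer

[Timer]
OnCalendar=*-*-* 03:00:00
AccuracySec=1min

[Install]
WantedBy=timers.target
```

<p>Enabling the timer unit (rather than the service) is what arms it.</p>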
<p>To see all active timers, go</p>
<pre>$ systemctl list-timers</pre>
<h3>The unit file</h3>
<p>As usual, the unit file (e.g. /etc/systemd/system/cronjob-test@.service) is short and concise:</p>
<pre>[Unit]
Description=Cronjob test service

[Service]
ExecStart=/home/eli/shellout/utils/shellout.pl "%I"
Type=simple
User=eli
WorkingDirectory=/home/eli/shellout/utils
KillMode=mixed
<span style="text-decoration: line-through;">NoNewPrivileges=true</span></pre>
<p>This is a simple service, meaning that systemd expects the process launched by ExecStart to run in the foreground.</p>
<p>Note however that the service unit&#8217;s file name has a &#8220;@&#8221; character and that %I is used to choose what to run, based upon the unescaped instance name (see man systemd.unit). This turns the unit file into a template, and allows choosing an arbitrary command (the shellout.pl script is explained below) with something like (really, this works)</p>
<pre># systemctl start cronjob-test@'echo "Hello, world"'</pre>
<p>This might seem dangerous, but recall that root privileges are required to start the service, and you get a plain-user process (possibly with no ability to escalate privileges) in return. Not the big jackpot.</p>
<p>For stopping the service, exactly the same service specifier string is required. But it&#8217;s also possible to stop all instances of a service with</p>
<pre># systemctl stop 'cronjob-test@*'</pre>
<p>How neat is that?</p>
<p>A few comments on this:</p>
<ul>
<li>The service should not be systemd-wise enabled (i.e. no &#8220;systemctl enable&#8221;) &#8212; that&#8217;s what you do to get it started on boot or following some kind of event. This is not the case, as the whole point is to start the service directly by a timer or crond.</li>
<li>Accordingly, the service unit file does <strong>not</strong> have an [Install] section.</li>
<li>A side effect of this is that the service may not appear in the list made by &#8220;systemctl&#8221; (without any arguments) unless it currently has processes running on its behalf (or possibly if it&#8217;s in the failed state). Simple logic: It&#8217;s not loaded unless it has a cgroup allocated, and the cgroup is removed along with the last process. But it may appear anyhow under some conditions.</li>
<li>ExecStart must have a full path (i.e. not relative) even if the WorkingDirectory is set. In particular, it can&#8217;t be ./something.</li>
<li>A &#8220;systemctl start&#8221; on a service that is marked as failed will be started anyhow (i.e. the fact that it&#8217;s marked failed doesn&#8217;t prevent that). Quite obvious, but I tested it to be sure.</li>
<li>Also, a &#8220;systemctl start&#8221; causes the execution of ExecStart if and only if there&#8217;s no cgroup for it, which is equivalent to not having a process running on its behalf.</li>
<li>KillMode is set to &#8220;mixed&#8221;, so that when the service is stopped, SIGTERM is sent only to the process that was launched directly. The SIGKILL 90 seconds later, if any, is sent to all processes, however. The default is to send all processes in the cgroup a SIGTERM when stopping.</li>
<li>NoNewPrivileges is a little paranoid thing: When no process has any reason to change its privileges or user IDs,  block this possibility. This mitigates damage, should the job be successfully attacked in some way. But I ended up not using it, as running sendmail fails (it has some setuid thing to allow access to the mail spooler).</li>
</ul>
<h3>Stopping</h3>
<p>There is no log entry for a service of simple type that terminates with a success status. Even though it&#8217;s stopped in the sense that it has no allocated cgroup and &#8220;systemctl start&#8221; behaves as if it was stopped, a successful termination is silent. Not sure if I like this, but that&#8217;s the way it is.</p>
<p>When the process doesn&#8217;t respond to SIGTERM:</p>
<pre>Jan 16 19:13:03 systemd[1]: <strong>Stopping</strong> Cronjob test service...
Jan 16 19:14:33 systemd[1]: cronjob-test.service stop-sigterm timed out. Killing.
Jan 16 19:14:33 systemd[1]: cronjob-test.service: main process exited, code=killed, status=9/KILL
Jan 16 19:14:33 systemd[1]: <strong>Stopped</strong> Cronjob test service.
Jan 16 19:14:33 systemd[1]: Unit cronjob-test.service entered failed state.</pre>
<p>So there&#8217;s always &#8220;Stopping&#8221; first and then &#8220;Stopped&#8221;. And if there are processes in the control group 90 seconds after &#8220;Stopping&#8221;, SIGKILL is sent, and the service gets a &#8220;failed&#8221; status. Not being able to quit properly is a failure.</p>
<p>A &#8220;systemctl stop&#8221; on a service that is already stopped is legit: The systemctl utility returns silently with a success status, and a &#8220;Stopped&#8221; message appears in the log without anything actually taking place. Neither does the service&#8217;s status change: if it was considered failed before, so it remains. And if the target to stop was a group of instances (e.g. systemctl stop &#8216;cronjob-test@*&#8217;) and there were no instances to stop, there isn&#8217;t even a log message about it.</p>
<p>Same logic with &#8220;Starting&#8221; and &#8220;Started&#8221;: A superfluous &#8220;systemctl start&#8221; does nothing except for a &#8220;Started&#8221; log message, and the utility is silent, returning success.</p>
<h3>Capturing the output</h3>
<p>By default, the output (stdout and stderr) of the processes is logged in the journal. This is usually pretty convenient; however, I wanted the good old cronjob behavior: an email is sent unless the job is completely silent and exits with a success status (actually, crond doesn&#8217;t care about the exit status, but I wanted this too).</p>
<p>This concept doesn&#8217;t fit systemd&#8217;s spirit: You don&#8217;t start sending mails each time a service has something to say. One could use OnFailure for activating another service that calls home when the service gets into a failure status (which includes a non-success termination of the main process), but that mail won&#8217;t tell me the output. To achieve this, I wrote a Perl script. So there&#8217;s one extra process, but who cares, systemd kills&#8217;em all in the end anyhow.</p>
<p>Here it comes (I called it shellout.pl):</p>
<pre>#!/usr/bin/perl

use strict;
use warnings;

# Parameters for sending mail to report errors
my $sender = 'eli';
my $recipient = 'eli';
my $sendmail = "/usr/sbin/sendmail -i -f$sender";

my $cmd = shift;
my $start = time();

my $output = '';

my $catcher = sub { finish("Received signal."); };

$SIG{HUP} = $catcher;
$SIG{TERM} = $catcher;
$SIG{INT} = $catcher;
$SIG{QUIT} = $catcher;

my $pid = open (my $fh, '-|');

finish("Failed to fork: $!")
  unless (defined $pid);

if (!$pid) { # Child process
  # Redirect stderr to stdout for child processes as well
  open (STDERR, "&gt;&amp;STDOUT");

  exec($cmd) or die("Failed to exec $cmd: $!\n");
}

# Parent
while (defined (my $l = &lt;$fh&gt;)) {
  $output .= $l;
}

close $fh
 or finish("Error: $! $?");

finish("Execution successful, but output was generated.")
 if (length $output);

exit 0; # Happy end
sub finish {
  my ($msg) = @_;

  my $elapsed = time() - $start;

  $msg .= "\n\nOutput generated:\n\n$output\n"
    if (length $output);

  open (my $fh, '|-', "$sendmail $recipient") or
    finish("Failed to run sendmail: $!");

  print $fh &lt;&lt;"END";
From: Shellout script &lt;$sender&gt;
Subject: systemd cron job issue
To: $recipient

The script with command \"$cmd\" ran $elapsed seconds.

$msg
END

  close $fh
    or die("Failed to send email: $! $?\n");

  $SIG{TERM} = sub { }; # Not sure this matters
  kill -15, $$; # Kill entire process group

  exit(1);
}</pre>
<p>First, let&#8217;s pay attention to</p>
<pre>open (STDERR, "&gt;&amp;STDOUT");</pre>
<p>which makes sure standard error is redirected to standard output. This is inherited by child processes, which is exactly the point.</p>
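<p>For illustration, the shell-level analog of this dup is &#8220;exec 2&gt;&amp;1&#8221;, and child processes inherit it the same way:</p>

```shell
# After "exec 2>&1", the shell's stderr is a duplicate of its stdout,
# and children inherit that: the error message from ls travels down
# the pipe to grep, which finds it.
sh -c 'exec 2>&1; ls /nonexistent/path' | grep -c nonexistent
```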
<p>The script catches the signals (SIGTERM in particular, which is systemd&#8217;s first hint that it&#8217;s time to pack and leave) and sends a SIGTERM to all other processes in turn. This is combined with KillMode being set to &#8220;mixed&#8221; in the service unit file, so that only shellout.pl gets the signal, and not the other processes.</p>
<p>The rationale is that if all processes get the signal at once, it may (theoretically?) turn out that the child process terminates before the script reacted to the signal it got itself, so it will fail to report that the reason for the termination was a signal, as opposed to the termination of the child. This could miss a situation where the child process got stuck and said nothing when being killed.</p>
<p>Note that the script kills all processes in the process group just before quitting due to a signal it got, or when the invoked process terminates and there was output. Before doing so, it sets the signal handler to a NOP, to avoid an endless loop, since the script&#8217;s process will get it as well (?). This NOP thing appears to be unnecessary, but better safe than sorry.</p>
<p>Also note that the while loop quits when there&#8217;s nothing more in &lt;$fh&gt;. This means that if the child process forks and then terminates, the while loop will continue: unless the forked process closed its output file handles, the write end of the pipe remains open, so the script never sees an EOF. The first child process will remain as a zombie until the forked process is done. Only then will it be reaped by virtue of the close $fh. This machinery is not intended for fork() sorcery.</p>
<p>I took a different approach in <a href="http://billauer.co.il/blog/2020/10/perl-fork-ipc-kill-children/" target="_blank">another post of mine</a>, where the idea was to fork explicitly and modify the child&#8217;s attributes. <a href="http://billauer.co.il/blog/2013/03/fork-wait-and-the-return-values-in-perl-in-different-scenarios/" target="_blank">Another post</a> discusses timing out a child process in general.</p>
<h3>Summary</h3>
<p>Yes, cronjobs are much simpler. But in the long run, it&#8217;s a good idea to acquire the ability to run cronjobs as services for the sake of keeping the system clean from runaway processes.</p>
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2021/01/systemd-cron-cgroups/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Decoding email&#8217;s quoted-printable with Perl</title>
		<link>http://billauer.co.il/blog/2020/12/extract-quoted-printable-html-perl/</link>
		<comments>http://billauer.co.il/blog/2020/12/extract-quoted-printable-html-perl/#comments</comments>
		<pubDate>Sun, 20 Dec 2020 12:26:56 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[email]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6199</guid>
		<description><![CDATA[To make it short, the command at shell prompt is $ perl -MMIME::QuotedPrint -e 'local $/; $x=&#60;&#62;; print decode_qp($x)' &#60; quoted.txt &#62; unquoted.html and I needed this to extract an HTML segment of an email.]]></description>
			<content:encoded><![CDATA[<p>To make it short, the command at shell prompt is</p>
<pre>$ perl -MMIME::QuotedPrint -e 'local $/; $x=&lt;&gt;; print decode_qp($x)' &lt; quoted.txt &gt; unquoted.html</pre>
<p>and I needed this to extract an HTML segment of an email.</p>
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2020/12/extract-quoted-printable-html-perl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating a tarball for distribution (without user/group information)</title>
		<link>http://billauer.co.il/blog/2020/11/tar-create-owner-group/</link>
		<comments>http://billauer.co.il/blog/2020/11/tar-create-owner-group/#comments</comments>
		<pubDate>Mon, 16 Nov 2020 10:09:01 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6176</guid>
		<description><![CDATA[A tarball is the common way to convey several files on UNIX systems. But because tar was originally intended for backup, it stores not only the permission information, but also the owner and group of each file. Try listing the content of a tarball with e.g. $ tar -tzvf thestuff.tar.gz Note the &#8220;v&#8221; flag that [...]]]></description>
			<content:encoded><![CDATA[<p>A tarball is the common way to convey several files on UNIX systems. But because tar was originally intended for backup, it stores not only the permission information, but also the owner and group of each file. Try listing the content of a tarball with e.g.</p>
<pre>$ tar -tzvf thestuff.tar.gz</pre>
<p>Note the &#8220;v&#8221; flag that goes along with the flag for listing, &#8220;t&#8221;: It causes tar to print out ownership and permission information.</p>
<p>This doesn&#8217;t matter much if the tarball is extracted as a non-root user on the other end, because tar doesn&#8217;t set the user and group ID in that case: The extracted files get the uid/gid of the process that extracted them.</p>
<p>However, if the user at the other end extracts the tarball as root, the original uid/gid is assigned, which may turn out confusing.</p>
<p>To avoid this, tell tar to assign user root to all files in the archive. This makes no difference if the archive is extracted by a non-root user, but sets the ownership to root if extracted by root. In fact, it sets the ownership to the extracting user in both cases, which is what one would expect.</p>
<p>So this is the command to use to create an old-school .tar.gz tarball:</p>
<pre>$ tar --owner=0 --group=0 -czf thestuff.tar.gz thestuff</pre>
<p>Note that you <strong>don&#8217;t</strong> have to be root to do this. You&#8217;re just creating a plain file with your own ownership. It&#8217;s extracting these files as root that requires root permissions (if so desired).</p>
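<p>A quick way to convince yourself (file names made up, GNU tar assumed):</p>

```shell
# Create a file owned by the current (non-root) user, pack it with
# --owner=0 --group=0, and list the archive with numeric IDs: the
# entries are recorded as uid/gid 0 regardless of who made them.
mkdir -p thestuff
echo hello > thestuff/readme.txt
tar --owner=0 --group=0 -czf thestuff.tar.gz thestuff
tar --numeric-owner -tzvf thestuff.tar.gz
# each line lists 0/0 as owner/group
```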
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2020/11/tar-create-owner-group/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linux kernel: Dumping a module&#8217;s content for regression check</title>
		<link>http://billauer.co.il/blog/2020/10/linux-kernel-module-before-after/</link>
		<comments>http://billauer.co.il/blog/2020/10/linux-kernel-module-before-after/#comments</comments>
		<pubDate>Thu, 29 Oct 2020 08:40:16 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Linux kernel]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6166</guid>
		<description><![CDATA[After making a lot of whitespace reorganization in a kernel module (indentation, line breaks, fixing things reported by sparse and checkpatch), I wanted to make sure I didn&#8217;t really change anything. All edits were of the type that the compiler should be indifferent about, but how can I be sure I didn&#8217;t change anything accidentally? [...]]]></description>
			<content:encoded><![CDATA[<p>After making a lot of whitespace reorganization in a kernel module (indentation, line breaks, fixing things reported by sparse and checkpatch), I wanted to make sure I didn&#8217;t really change anything. All edits were of the type that the compiler should be indifferent about, but how can I be sure I didn&#8217;t change anything accidentally?</p>
<p>It would have been nice if the compiler&#8217;s object files were identical before and after the changes, but that doesn&#8217;t happen. So instead, let&#8217;s hope it&#8217;s enough to verify that the executable assembly code didn&#8217;t change, and neither did the string literals.</p>
<p>The idea is to make a disassembly of the executable part and dump the part that contains the literal strings, and output everything into a single file. Do that before and after the changes (git helps here, of course), and run a plain diff on the couple of files.</p>
<p>Which boils down to this little script:</p>
<pre>#!/bin/bash

objdump -d "$1"
objdump -s -j .rodata -j .rodata.str1.1 "$1"</pre>
<p>and run it on the compiled module, e.g.</p>
<pre>$ ./regress.sh themodule.ko &gt; original.txt</pre>
<p>The script first makes the disassembly, and then makes a hex dump of two sections in the ELF file. Most interesting is the .rodata.str1.1 section, which contains the string literals. That&#8217;s the name of this section on a v5.7 kernel, anyhow.</p>
<p>Does it cover everything? Can I be sure that I did nothing wrong if the outputs before and after the changes are identical? I don&#8217;t really know. I know for sure that it detects the smallest change in the code, as well as a change in any error message string I had (and that&#8217;s where I made a lot of changes), but maybe there are some accidents that this check doesn&#8217;t cover.</p>
<p>As for how I found the names of the sections: Pretty much trying them all. The list of sections in the ELF file can be found with</p>
<pre>$ readelf -S themodule.ko</pre>
<p>However, only sections marked with the PROGBITS type can be dumped with objdump -s (or more precisely, will be found with the -j flag). I think. It&#8217;s not like I really understand what I&#8217;m doing here.</p>
<p>Bottom line: This check is definitely better than nothing.</p>
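<p>For what it&#8217;s worth, the same check can be rehearsed on a plain userspace object file (a made-up toy example, not a kernel module; cc and binutils assumed): two sources differing only in whitespace should produce identical dumps.</p>

```shell
# Two sources differing only in leading blank lines, compiled to the
# same object file name, so the header line objdump prints matches:
printf 'const char *msg = "hello there";\nint add(int a, int b) { return a + b; }\n' > one.c
{ printf '\n\n'; cat one.c; } > two.c

cc -c one.c -o mod.o
{ objdump -d mod.o; objdump -s -j .rodata -j .rodata.str1.1 mod.o 2>/dev/null || true; } > before.txt
cc -c two.c -o mod.o
{ objdump -d mod.o; objdump -s -j .rodata -j .rodata.str1.1 mod.o 2>/dev/null || true; } > after.txt

diff before.txt after.txt && echo "no regression"
```

<p>(At -O0 the string literal may land in .rodata rather than .rodata.str1.1, hence objdump&#8217;s missing-section warning is silenced; the section names depend on compiler and flags.)</p>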
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2020/10/linux-kernel-module-before-after/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perl + Linux: Properly cleaning up a forking script after it exits</title>
		<link>http://billauer.co.il/blog/2020/10/perl-fork-ipc-kill-children/</link>
		<comments>http://billauer.co.il/blog/2020/10/perl-fork-ipc-kill-children/#comments</comments>
		<pubDate>Sat, 10 Oct 2020 16:24:44 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://billauer.co.il/blog/?p=6155</guid>
		<description><![CDATA[Leave no leftover children One of the really tricky things about a Perl script that forks one way or another, is how to make sure that the children vanish after the parent has exited. This is an issue both if the children were created with a fork() call, or with a safe pipe, as with [...]]]></description>
			<content:encoded><![CDATA[<h3>Leave no leftover children</h3>
<p>One of the really tricky things about a Perl script that forks one way or another, is how to make sure that the children vanish after the parent has exited. This is an issue both if the children were created with a fork() call, or with a safe pipe, as with</p>
<pre>my $pid = open(my $fd, '-|');</pre>
<p>It may seem to work fine when the main script is terminated with a CTRL-C. The children will indeed vanish. But try killing the main script with a &#8220;kill&#8221; command, and the parent dies, but the children remain alive and kicking.</p>
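<p>The problem is easy to reproduce at the shell prompt (a sketch; sleep stands in for the forked child):</p>

```shell
# The inner shell forks a child and exits right away; the child is
# reparented (to init or a subreaper) and keeps running on its own:
pid=$(sh -c 'sleep 5 & echo $!')
kill -0 "$pid" && echo "child $pid survived its parent"
```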
<p>The Linux-only solution is</p>
<pre>use Linux::Prctl</pre>
<p>and then, in the part of the script that runs as a child, do</p>
<pre>Linux::Prctl::set_pdeathsig(9);</pre>
<p>immediately after the branch between parent and child. This tells Linux to send a SIGKILL to the process that made this call (i.e. the child) as soon as the parent exits. One might be more gentle with a SIGTERM (number 15). But the idea is the same. Parent is away, get the hammer.</p>
<p>To get the Perl module:</p>
<pre># apt install liblinux-prctl-perl</pre>
<p>And BTW, SIGPIPE doesn&#8217;t help here, even if there&#8217;s a pipe between the two processes: It&#8217;s delivered only when the child process attempts to write to a pipe that is closed on the other end. If it doesn&#8217;t, the broken pipe is never sensed. And if the child is on the reading side, there&#8217;s no SIGPIPE at all &#8212; the pipe just gives an EOF when the data is exhausted.</p>
<p>The pdeathsig can of course be used in non-Perl programs as well. This is the Perl example.</p>
<h3>Multiple safe pipes</h3>
<p>When a process generates multiple children, there&#8217;s a problem with the fact that the children inherit the already existing open file descriptors. For example, when the main script creates multiple children by virtue of safe pipes for read (calling open(my $fd, &#8216;-|&#8217;) repeatedly, so the children write and the parent reads): Looking at /proc/<em>PID</em>/fd of the children, it&#8217;s clear that they have a lot of pipes open that they have nothing to do with.</p>
<p>This prevents the main script (the parent), as well as some of the children, from terminating, even after either side calls exit() or die(). These processes don&#8217;t turn into zombies, but remain plain unterminated processes in the stopped state. At least so it turned out on my Perl v5.26.1 on an x86_64 Linux machine.</p>
<p>The problem in this case occurs when pipes have pending data as the main script attempts to terminate, for example by virtue of a print to STDOUT (which is redirected to the pipe going to the parent). This is problematic, because the child process will attempt to write the remaining data just before quitting (STDOUT is flushed). The process will block forever on this write() call. Since the child doesn&#8217;t terminate, the parent process blocks on wait(), and doesn&#8217;t terminate either. It&#8217;s a deadlock. Even if close() isn&#8217;t called explicitly in the main script, the automatic file descriptor close before termination behaves exactly the same: It waits for the child process.</p>
<p>What usually happens in this situation is that when the parent closes the file descriptor, it sends a SIGPIPE to the child. The blocking write() returns as a result with an EPIPE status (Broken pipe), and the child process terminates. This allows the parent&#8217;s wait() to reap the child, and the parent process can continue.</p>
<p>And here&#8217;s the twist: If the file descriptor belongs to several processes after forking, SIGPIPE is sent to the child only when the last copy of the descriptor is closed. As a result, when the parent process attempts to close one of its pipes, SIGPIPE isn&#8217;t sent if the children haven&#8217;t closed their copies of the same pipe file descriptor. The deadlock described above occurs.</p>
<p>This can be worked around by making sure to close the pipes so that the child processes are reaped in the reverse order of their creation. But it&#8217;s much simpler to just close the unnecessary file descriptors on the children&#8217;s side.</p>
<p>So the solution is to go</p>
<pre>foreach my $fd (@safe_pipe_fds) {
  close($fd)
   <strong>and</strong> print STDERR "What? Closing unnecessary file descriptor was successful!\n";
}</pre>
<p>on the child&#8217;s side, immediately after the call to set_pdeathsig(), as mentioned above.</p>
<p>All of these close() calls <strong>should fail</strong> with an ECHILD (No child processes) status: The close() call attempts to waitpid() for the main script&#8217;s children (closing a pipe waits for the process on the other side to terminate), which fails because only the true parent can do that. Regardless, the file descriptors are indeed closed, and each child process holds only the file descriptors it needs to. And most importantly, there&#8217;s no problem terminating.</p>
<p>So the error message is given when the close is successful. The &#8220;and&#8221; part isn&#8217;t a mistake.</p>
<p>It&#8217;s also worth mentioning, that exactly the same close() (with a failed wait() call) occurs anyhow when the child process terminates (I&#8217;ve checked it with strace). The code snippet above just makes it earlier, and solves the deadlock problem.</p>
<p>Either way, except for really simple one-on-one IPC between a script and itself, it&#8217;s probably wiser to use pipe() and fork() explicitly, so that all this file descriptor and child reaping is out on the table.</p>
<p>As for pipes to and from other executables with open(), that&#8217;s not a problem. I mean calls such as open(IN, &#8220;ps aux|&#8221;) etc. That&#8217;s because Perl automatically closes all file descriptors except STDIN, STDOUT and STDERR when calling execve(), which is the syscall for executing another program.</p>
<p>Or more precisely, it sets the FD_CLOEXEC flag for all files opened with a file number above $^F (a.k.a $SYSTEM_FD_MAX), which defaults to 2. So it&#8217;s actually Linux that automatically closes the files on a call to execve(). The possible problem mentioned above with SIGPIPE is hence solved this way. Note that this is something Perl does for us, so if you&#8217;re writing a program in C and plan to call execve() after a fork &#8212; by all means close all file descriptors that aren&#8217;t needed before doing that.</p>
]]></content:encoded>
			<wfw:commentRss>http://billauer.co.il/blog/2020/10/perl-fork-ipc-kill-children/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
