When dovecot silently stops to deliver mails

This post was written by eli on July 23, 2021
Posted Under: email,Linux,Server admin

After a few days being happy with not getting spam, I started to suspect that something is completely wrong with receiving mail. As I’m using fetchmail to get mail from my own server running dovecot v2.2.13, I’m used to getting notifications when fetchmail is unhappy. But there was no such.

Checking up the server’s logs, there were tons of these messages:

dovecot: master: Warning: service(pop3-login): process_limit (100) reached, client connections are being dropped

Restarting dovecot got it back running properly again, and I got a flood of the mails that were pending on the server. This was exceptionally nasty, because mails stopped arriving silently.

So what was the problem? The clue is in these log messages, which occurred about a minute after the system’s boot (it’s a VPS virtual machine):

Jul 13 11:21:46 dovecot: master: Error: service(anvil): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(log): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(ssl-params): Initial status notification not received in 30 seconds, killing the process
Jul 13 11:21:46 dovecot: master: Error: service(log): child 1210 killed with signal 9

These three services are helper processes for dovecot, as can be seen in the output of systemctl status:

            ├─dovecot.service
             │ ├─11690 /usr/sbin/dovecot -F
             │ ├─11693 dovecot/anvil
             │ ├─11694 dovecot/log
             │ ├─26494 dovecot/config
             │ ├─26495 dovecot/auth
             │ └─26530 dovecot/auth -w

What seems to have happened is that these processes failed to launch properly within the 30 second timeout limit, and were therefore killed by dovecot. And then attempts to make pop3 connections seem to have got stuck, with the forked processes that are made for each connection remaining. Eventually, they reached the maximum of 100.

The reason this happened only now is probably that the hosting server had some technical failure and was brought down for maintenance. When it went up again, all VMs were booted at the same time, so they were all very slow in the beginning. Hence it took exceptionally long to kick off those helper processes. The 30 seconds timeout kicked in.

The solution? Restart dovecot once in 24 hours with a plain cronjob. Ugly, but works. In the worst case, mail will be delayed for 24 hours. This is a very rare event to begin with.

Add a Comment

required, use real name
required, will not be published
optional, your blog address