As I implemented a MiG controller for the KC705’s on-board SODIMM, the controller failed to calibrate at first, even though I had copied the instantiation and port connections from the example design. As for the pin placement, this was taken care of by the core itself, by virtue of memctl/memctl/user_design/constraints/memctl.xdc within MiG’s dedicated directory (which I called memctl).
And yet init_calib_complete remained low, indicating calibration had failed.
Actually, I had followed Xilinx’ XTP196 slides, except that I didn’t make an example design — I had my own.
Having ruled out holding the MiG controller in reset or a faulty pinout, it turned out that a constraint needs to be added to the application XDC file, namely
set_property slave_banks {32 34} [get_iobanks 33]
What it says (see UG912) is that I/O banks 32 and 34 should calibrate their on-chip terminations (DCI, Digitally Controlled Impedance) based upon the reference resistors connected to the dedicated pins on bank 33. Without this constraint, on-chip termination on banks 32 and 34 doesn’t work, and the signal integrity on the relevant I/Os goes down the toilet. No wonder it didn’t calibrate.
The hint for this is on page 37 of XTP196, “Modifications to Example Design”, which tells us to overwrite the example design created by Vivado with a ZIP file Xilinx supplies. On the following page it lists the changes made, among others “Added DCI Cascade constraints to XDC”.
What’s this?
These are somewhat random jots I made while setting up an authoritative BIND server, so that a simple VPS machine can function standalone. Well, almost standalone, as it takes some help from a slave DNS to supply the second DNS entry. But even if that slave goes away suddenly, the show will go on. So practically speaking, it’s a one machine show.
Sources
Todo when replacing slave server (note to self)
Note that it’s all about changing IP addresses, as the slave server is referred to with my own “ns2” subdomain.
- Update the glue record for the domain on which the name server is a subdomain.
- Update the A record for the name server in the relevant bind zone files (and bump serial number, right…?).
- Update allow-transfer and possibly also-notify in named.conf.local for all zones.
- Update the DNS monitoring script.
General notes
Recursive, non-recursive and AXFR
The confusing part about a name server is that it conveys information in three different ways (that I can think of), for different purposes:
- Recursive queries: It’s the cut-the-bullsh*t request for an IP of a domain name, issued by e.g. a web browser. The server functions as the DNS of a (usually limited) net segment (defined by the allow-recursion option), and will ask around servers as necessary to reach the bottom line result of an IP address.
- Non-recursive queries: The server supplies only records that are written in its own zone files (and maybe also cached records? Not sure about that). This is the mode for an authoritative server, supplying the records for some specific domains it’s responsible for.
- AXFR: This is the “give me all you got” request from a slave of an authoritative server. This allows setting up the records on one machine, and have several other servers follow suit. Discussed below.
The DNS protocol allocates a bit on the query which tells if the request is recursive or not (“recursion desired”), and also a bit on the response, saying whether the server offers recursion (“recursion available”).
As “dig” makes recursive requests by default, authoritative servers (which are typically configured not to support recursive requests) will usually answer with a non-recursive response. Which will usually be exactly what we wanted in the first place (or why did we ask an authoritative server with “dig” in the first place?).
So for authoritative servers, the warning actually indicates proper behavior:
$ dig google.com @l.gtld-servers.net.
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> google.com @l.gtld-servers.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57926
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available
[ ... the name servers for google.com given here ... ]
Recursive queries were allowed to all by default in BIND until version 9.4.1, but later versions have it turned off by default, my own included. It makes sense, as most people who use BIND are likely to make a small authoritative server, and not a public DNS for everyone to enjoy.
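To make the two bits more tangible, here is a minimal sketch that decodes the 16-bit flags word of a DNS header, with bit positions as laid out in RFC 1035. The function names are made up for this illustration, and nothing here talks to a real server.

```python
# Bit masks for the 16-bit flags word in a DNS header (RFC 1035, section 4.1.1).
FLAG_BITS = {
    "qr": 0x8000,  # this message is a response
    "aa": 0x0400,  # authoritative answer
    "tc": 0x0200,  # truncated
    "rd": 0x0100,  # recursion desired (set by the client, echoed in the response)
    "ra": 0x0080,  # recursion available (set by the server)
}


def decode_flags(word):
    """Return the names of the flags set in a DNS header flags word."""
    return [name for name, mask in FLAG_BITS.items() if word & mask]


def recursion_refused(word):
    """True when recursion was requested but the server does not offer it."""
    return bool(word & FLAG_BITS["rd"]) and not (word & FLAG_BITS["ra"])
```

A flags word of 0x8100 decodes to "qr" and "rd", which is the "flags: qr rd" line in the dig output above. With "rd" set and "ra" clear, recursion_refused() returns True, which is the situation dig warns about.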
Glue records or not
It’s very sleek to name your name servers with the domain they cover. It’s like ns1.example.com answering for example.com. As written in any guide to setting up a DNS, this makes a cyclic dependency: In order to find ns1.example.com, you first need to ask the delegated DNS, ns1.example.com, what IP it has. This is solved with glue records (the IPs are given explicitly in the “Additional Section” on the NS query).
In my specific case, I went for the cyclic dependency option for both name servers, mainly because the domain name of the chosen slave server didn’t resolve consistently to a single IP address. So it seemed safer to give the IP address myself with my own kind-of-bogus nameserver domain plus glue records which I set up on my registrar’s web interface. So it’s not just sleek, but it’s a good way to keep things stable. No reason to be afraid of this — it’s actually better.
As most people use their domains on hosted services, the name servers are given by the service provider, and hence their domain names have nothing to do with the hosted domain. Typically, the domain’s registrar offers a web interface for setting up the name servers, but only by their domain names.
Any serious registrar allows setting up glue records for cyclic dependencies explicitly. As a matter of fact, it won’t let you set a name server pointing at the same domain without any glue record given first. It can however be a bit confusing on how to do it on the web interface. For example, it might be under “Advanced Features” in the web management tool, called “Add Host Names”. It can also be called “personal name servers”.
Aside from Top Level Domain servers, glue records are rarely necessary, and are given in the “Additional” section as a neat shortcut. Actually, are they really “glue” when not absolutely necessary? Either way, as shortcuts, it seems like there are no rules for when they are present and when they aren’t. It’s like every server has its own rules.
Some name servers obtain glue addresses for other name servers by issuing lookups (or relying on their cache), and then present them in the “Additional” section. This is considered bad practice. A recent bind 9 won’t do this as an authoritative server, and the “fetch-glue yes/no” option is not available anymore.
This mess includes Top Level Domain servers as well, even though one could expect that they wouldn’t issue glue records unless they’re necessary. This is not to be confused with the responses to the exact same queries from common DNSes, which are usually more generous. Again, no fixed rules for this. For example,
$ dig NS walla.com @e.gtld-servers.net.
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS walla.com @e.gtld-servers.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30176
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;walla.com. IN NS
;; AUTHORITY SECTION:
walla.com. 172800 IN NS ns-550.awsdns-04.net.
walla.com. 172800 IN NS ns-277.awsdns-34.com.
walla.com. 172800 IN NS ns-1226.awsdns-25.org.
walla.com. 172800 IN NS ns-1976.awsdns-55.co.uk.
;; ADDITIONAL SECTION:
ns-277.awsdns-34.com. 172800 IN A 205.251.193.21
So why is there a glue record for ns-277.awsdns-34.com? It stands by itself, and has its own glue records. And why not the others? Because they’re not .com? Go figure.
At some early point I was concerned about whether the glue record really relies on the IP address given explicitly, and not on a DNS query made by either the registrar’s web app or the Top Level Domain name servers displaying a cached result. If it were the latter, the IP address could be lost for some reason and the cyclic dependency would be impossible to resolve. After some playing around, I got pretty much convinced that no one is anywhere near letting that happen.
Zone file notes
- Always bump the serial number when making changes or the slave won’t catch them. The number must increase.
- Any domain not ending with a ‘.’ will be considered relative to the origin domain
- ‘@’ means the origin domain. It’s like a no-string for a domain.
- If no name is given at the beginning of a line, the previous line’s name is used (“repeat name”).
- TXT records are limited to 2 kB, but the strings given in the zone file (within quotation marks) are limited to 255 bytes each. To overcome this, it’s allowed to divide the text into pieces, each with its own quotation marks, and a space between them. The actual text is the concatenation of the strings in quotation marks, after removing these quotation marks. Think of it as a multi-line string in C language.
- The last number in the SOA record, often called TTL, is the negative caching TTL: How long a server is allowed to cache a “no record exists” answer.
- For each SPF record (of type TXT), add an identical record with type SPF. Or bind9 whines with
zone example.com/IN: 'example.com' found SPF/TXT record but no SPF/SPF record found, add matching type SPF record
even though SPF records died in 2014. But it won’t hurt (unless the slave doesn’t like it…?)
- Unfortunately, there is no built-in variable / macro expansion in bind. Several strings, and in particular the IP address are repeated over and over again.
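The splitting of long TXT data into several quoted strings can be sketched as follows. This is only an illustration: the function name is made up, the input is assumed to be plain ASCII without quotation marks or backslashes (no escaping is attempted), and the default limit of 255 is the most that a single length-prefixed DNS string can hold.

```python
def txt_rdata(text, limit=255):
    """Split text into quoted zone-file strings of at most `limit` bytes each.

    Assumes plain ASCII input without '"' or '\\' characters, so no
    escaping is needed.
    """
    data = text.encode("ascii")
    # Cut the data into consecutive chunks; an empty text still yields "".
    chunks = [data[i:i + limit] for i in range(0, len(data), limit)] or [b""]
    return " ".join('"%s"' % chunk.decode("ascii") for chunk in chunks)
```

A short value such as an SPF string comes out as a single quoted string, while a long DKIM key ends up as several strings separated by spaces, ready to paste after "IN TXT" in the zone file.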
To check a zone file:
$ named-checkzone example.com db.example.com
To generate a canonical file (see if it was understood correctly):
$ named-compilezone -o out.txt example.com db.example.com
Propagating the domain records to slaves
Explained in this chapter of DNS and BIND. The short story is that some other server (the “slave”) issues AXFR queries to the master regarding a domain, and in response it gets all the records for that domain. The slave then responds to DNS queries for that domain based upon the information obtained. This takes place when the master DNS sends a NOTIFY message to the slave, and/or at refresh intervals. And there’s a thing with the serial number, which must be higher on the master than on the slave, or the slave considers its local data up to date. Actually, some slaves will issue an AXFR regardless.
These NOTIFY requests are there to tell the slaves an update is required: Assuming that the “notify” setting of the master DNS is “yes”, when its records are updated, it sends a NOTIFY message to all authoritative servers by default (plus those explicitly given with “also-notify”). Those which are defined as slaves of the notifying server check the serial number in the SOA record. If it is different from what they have, they issue a transfer request to the master (with an AXFR command, or the incremental variant, IXFR), and update their data from the info that arrives.
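Since everything hinges on the serial number going up, here is a small sketch of the two related chores: picking the next serial in the date-based YYYYMMDDnn style (as seen in the log further down), and checking whether one serial counts as newer than another. The comparison follows the wraparound arithmetic of RFC 1982. The function names and the choice to start each day at revision 01 are assumptions made for this example.

```python
from datetime import date


def serial_newer(master, slave):
    """RFC 1982 style comparison of 32-bit serials: is master ahead of slave?"""
    diff = (master - slave) % 2**32
    return 0 < diff < 2**31


def next_serial(current, today=None):
    """Next YYYYMMDDnn serial that is guaranteed to be above `current`.

    Uses today's date with revision 01 (an assumed convention), or
    current + 1 if that wouldn't be an increase.
    """
    today = today or date.today()
    base = int(today.strftime("%Y%m%d")) * 100 + 1
    return max(base, current + 1)
```

For example, bumping 2019032101 again on the same day gives 2019032102, and two days later it gives 2019032301. Either way, serial_newer() confirms that the new value is ahead of the old one.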
The “Refresh” entry of the SOA record relates to the slave’s periodical polling of the master. This time period is important only with old versions of BIND (before BIND 8), as they didn’t support the NOTIFY command. With the newer versions, it sets the periodic polling, which should have no significance except for a little load on both sides.
The notification messages and their responses are logged, and should be verified when changes are made.
The slaves may, and probably will, send NOTIFY messages to the authoritative DNSes when they’re done updating, but odds are that these will be ignored, as the common setting is that all slaves take info from a single master.
A NOTAUTH response means that the request was sent to a non-authoritative server (so it doesn’t have the info). But it can also be an excuse for refusing a transfer (the correct answer is REFUSED for that case).
Typical session when restarting bind after making changes (and changing the serial number!):
Mar 23 17:02:54 named[11923]: zone billauer.co.il/IN: sending notifies (serial 2019032101)
Mar 23 17:02:54 named[11923]: zone example.com/IN: sending notifies (serial 2019032101)
Mar 23 17:02:54 named[11923]: zone example2.org/IN: sending notifies (serial 2019032101)
Mar 23 17:02:57 named[11923]: client 116.203.6.3#24089 (billauer.co.il): transfer of 'billauer.co.il/IN': AXFR started
Mar 23 17:02:57 named[11923]: client 116.203.6.3#24089 (billauer.co.il): transfer of 'billauer.co.il/IN': AXFR ended
Mar 23 17:02:58 named[11923]: client 116.203.6.3#24092 (example2.org): transfer of 'example2.org/IN': AXFR started
Mar 23 17:02:58 named[11923]: client 116.203.6.3#24092 (example2.org): transfer of 'example2.org/IN': AXFR ended
Mar 23 17:02:58 named[11923]: client 116.203.6.3#24094 (example.com): transfer of 'example.com/IN': AXFR started
Mar 23 17:02:58 named[11923]: client 116.203.6.3#24094 (example.com): transfer of 'example.com/IN': AXFR ended
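Checking the log by eye gets tedious with many zones, so this is a rough sketch that lists zones for which notifies were sent but no completed AXFR followed. The regular expressions are written against the log lines shown above and nothing else, so other BIND versions or logging setups may need adjustments. IXFR transfers are not handled, and a slave that is already up to date won’t transfer at all, so a zone showing up in the result is a hint to look closer, not proof of a problem.

```python
import re

# Patterns matching the named log lines as they appear in the session above.
NOTIFY_RE = re.compile(r"zone (\S+)/IN: sending notifies \(serial (\d+)\)")
AXFR_END_RE = re.compile(r"transfer of '(\S+)/IN': AXFR ended")


def untransferred_zones(log_lines):
    """Return zones that sent notifies but had no AXFR completed afterwards."""
    pending = {}
    for line in log_lines:
        match = NOTIFY_RE.search(line)
        if match:
            # Remember the zone and the serial it announced.
            pending[match.group(1)] = int(match.group(2))
            continue
        match = AXFR_END_RE.search(line)
        if match:
            # The zone was transferred, so it's no longer pending.
            pending.pop(match.group(1), None)
    return sorted(pending)
```

Feeding it the lines of the session above should give an empty list, since all three zones were transferred.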
Finding a slave server
Having this all-in-one server, there’s only one thing it can’t do by itself: Being a backup server. So you need to look for one. As DNS is a rather low-bandwidth service, the hosting price should be low to zero. I looked for one that provided this for free, mainly to save the hassle of annual payments. I’ve got enough of those.
It’s a backup after all, so if someone pulls the plug from the backup server all of a sudden, things will probably go on as usual for a while. So it’s in principle enough to trust the service provider that it won’t hijack your domain or something. And try picking one that will last, just to save the bother of setting the slave DNS up again.
As of March 2019, I found two alternatives for free slave services. There are probably many more.
BuddyNS is a company that was founded for supplying DNS services. It has lots of servers and a neat web interface. I got second thoughts when I realized that it’s a startup which hasn’t lifted off so well (again, March 2019), which makes it a potentially volatile choice.
So I went for Afraid FreeDNS (despite its not-so-encouraging name). They have quite a few options, but the free plan allows for a slave DNS mirror, and is called the “backup DNS” service. The web interface is simple, not so impressive, but very functional and to the point: A single page (login required to actually see something there), with a simple dashboard page saying what domains are being served, when they were last updated and when the last attempt took place. Plus a long log of events, including AXFRs that were either successful or failed, and if the latter why. And also AXFR requests that arrived from other servers to the slave and were rejected.
For any domain that needs slave coverage, the domain and its master DNS are fed into the web interface (a small “Add” link). The master DNS must allow AXFRs from one IP address, 69.65.50.192. It takes a few minutes for the slave DNSes to update.
It’s also possible to allow AXFRs from other slaves, by setting up Slave AXFR-ALLOW ACL records. By default, AXFRs are rejected (as they should).
There’s only one DNS server for this slave service, ns2.afraid.org, with IP address 69.65.50.223 according to its authoritative server. It may sometimes resolve as 69.65.50.192 (note that this is the server that makes the AXFR requests). This double IP is harmless and by design, according to the DNS admin, who responded to my question on this matter.
I haven’t figured out the point of this as of yet, but the reason seems to be that the TLD server for .org gives the 69.65.50.192 address as a glue record (which isn’t really necessary) when asked about any .org domain that has ns2.afraid.org as a name server. So the DNS server asking about this caches this answer, and propagates it further. So the ISP-level DNS may sometimes answer 69.65.50.223 and sometimes 69.65.50.192, depending on its mood and cache.
The solution is simple, and it’s actually what I would suggest anyhow: Make it look like your own. Use a domain name of your own, with cyclic dependency, glue records and all that, and refer to the secondary name server by IP (using this domain, of course). This also makes it much easier to move to another server if necessary.
But why?
Fact number one: Running your own mail server is the most likely cause for messing up, and that can mean an intrusion to the server or just turning it into a public toilet for spam.
Nevertheless, if mail delivery is important to you, there’s probably no way around it. And I’m not talking about the ability to mass-mail. Even having plain, manually written, messages delivered to that semi-security-paranoid company, even if it has a ZIP attachment, can be a challenge. And no matter what ISP you have or other paid-for mail relay, there will always be someone else pushing junk through the same channel, making the mail relay’s reputation questionable.
And I’m also under the impression that paid-for mail relays won’t send you a bounce message if the destination server refuses to talk with them, or if they dropped the message for the sake of their own reputation. Once I got my own server running, I suddenly got a few of these. I now realize how some emails I sent in the past just vanished.
Not to mention that the emails reach their destination much faster with a private workhorse.
The key issue is to take control of your reputation. As simple as that. Use all possible means (detailed below) to assure the recipient that it was you who sent it, and let the lack of blacklisting do the rest.
But, ehm, after all this preaching, the real reason I set up my own mail server was that I had no choice: My web host, which also took care of outgoing mail from my website, drove me crazy with upgrades out of nowhere. So I went for VPS hosting, and that requires your own mailing server. For better and worse. Or use the services of some mail forwarding service, which effectively means that all my mails look like spam.
Port 25 might be blocked by ISP
This isn’t directly related, but important enough: My ISP, Netvision, blocks connection to port 25 from my computer, probably to avoid blacklisting of their IP addresses due to spamming from them.
This means that testing port 25 from my local computer is worthless and misleading. I suppose other ISPs do the same.
Use external tools for testing port 25.
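If a shell on some machine outside the ISP’s network is available, a check like this can stand in for such a tool. It’s a minimal sketch using only Python’s standard socket module: it connects, reads the first line the server sends, and checks for the 220 reply code that an SMTP server opens with. The function names are made up for this example, and no SMTP commands are sent.

```python
import socket


def smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            line = stream.readline()
    return line.decode("ascii", "replace").rstrip("\r\n")


def is_smtp_greeting(line):
    """An SMTP server that is ready to talk opens with reply code 220."""
    return line.startswith("220")
```

Calling smtp_banner("mx.example.com") from a host that isn’t filtered should return the server’s greeting line. A timeout or a refused connection raises an exception (OSError or a subclass), which is an answer as well.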
Selecting server software
Debian 8 arrived with Postfix by default. Exim is popular. I’m used to qmail and sendmail. Difficult to choose. Security is important. If the server gets compromised, my domain turns into a spamhouse at best. I also need some advanced features (DKIM in particular).
I went for sendmail 8.14.4. It has a bad word of mouth, but its security advisory record over the last ten years is better than Postfix and surely better than Exim. That’s a surprise, but you can’t argue with facts.
And I could go for qmail there, but it seems like it needs patching to support DKIM, and then who knows if I haven’t just made a hole.
Goals
The server should
- Open ports 25 and 587 for anyone to connect.
- Relay any email received on ports 25 and 587 from localhost only, without authentication
- Accept emails to local recipients on ports 25 and 587 when connecting from a foreign host. Port 25 is essential for inbound mail, but is sometimes blocked by firewalls, so open the other port as well.
- Add a DKIM signature to emails going to foreign hosts only
- Refuse to VRFY and EXPN
- Accept all emails (from external mailers as well) even if they don’t have rDNS entries etc (let the spam filter handle them)
Checklist
Note that a lot of these items are detailed further down this post. I also suggest taking a look on my post on SPF, DKIM and DMARC and possibly this, this and this as well.
- Be sure that your IP address is sustainable. The reputation which your mail server builds over the years is related to the IP address more than anything else. This also implies that the hosting service provider is stable, because if they go down, moving the server is maybe easy, but you lose the IP address.
- Verify that neither your IP nor your domain name have a bad reputation with a blacklist check.
- Make an rDNS record for the mail server’s IP. It had better begin with a “mail”, “smtp” or “mx” subdomain. Make sure with a reverse DNS lookup (e.g. dig -x) that there’s only one rDNS record. Sounds silly, but happened to me.
- Set up the mail server properly and safely. Run a security check.
- Turn off DSN, so that the server won’t send return receipts. Discussed in detail in my other post, but the TLDR is to add noreceipts to confPRIVACY_FLAGS in /etc/mail/sendmail.mc.
- Set up the firewall to kill any IPv6 traffic (in particular reject, not drop, outgoing packets)
- Create SPF, DKIM and DMARC DNS records for the server. For SPF, with and without the “mx” subdomain.
- If the server has another name internally (other than the mx subdomain), make sure it has an A DNS record as well as an SPF one.
- Verify that the outgoing mail goes out right with this DKIM validator, which allows sending mail to it, and then see exactly how it arrived + results on the validation. Invaluable.
- Run a verbose manual mail submission and verify everything makes sense. In particular, make sure the HELO/EHLO domain matches the rDNS. However don’t expect the EHLO on the internal submission (from the program we’re running to the local server) to be the externally known one.
- Validate the DMARC DNS record for your domain by sending a test email to autoreply@dmarctest.org (or any other one listed here). My anecdotal experience is that Gmail refused to accept mail (as in “Service unavailable” SMTP rejection) from a domain until I added a DMARC record.
- Check any programs (web applications in particular) that send email, and verify that the envelope sender (MAIL FROM) makes sense (preferably the same as the From header). Best to send mail to some Gmail account, and see what it found the smtp.mailfrom to be. If it’s not a legal domain there, Gmail refuses to accept the mail.
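The rDNS items in the list above (a single reverse name, and a HELO/EHLO name that matches it) can be checked with a sketch along these lines. It relies on socket.gethostbyaddr() from Python’s standard library, which goes through the system resolver and may not report every PTR record, so dig -x remains the more trustworthy tool for spotting duplicates. The function names and the injectable lookup parameter are choices made for this example only.

```python
import socket


def reverse_names(ip):
    """Reverse lookup: the primary name plus any aliases reported for the IP."""
    name, aliases, _addresses = socket.gethostbyaddr(ip)
    return [name] + aliases


def check_rdns(ip, helo_name, reverse=reverse_names):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    names = [name.rstrip(".").lower() for name in reverse(ip)]
    if len(names) != 1:
        problems.append("expected exactly one rDNS name, got %r" % names)
    if helo_name.rstrip(".").lower() not in names:
        problems.append("HELO name %s is not among the rDNS names" % helo_name)
    return problems
```

The reverse parameter is there so that the logic can be tried out with a canned answer instead of a real lookup, e.g. reverse=lambda ip: ["mx.example.com"].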
And then make friends with those who have a say on spam detection:
- Query your IP’s status at SPFBL and possibly delist it from the blacklist. It requires a working MTA on the server with postmaster being a user on the domain. Spamassassin relies on this service.
- Register the domain at Gmail’s Postmaster Tools to solve delivery problems to Gmail if such occur. I also have a feeling that this might reduce Gmail’s spam rating of the domain (it’s like someone takes responsibility for it).
- Set up a Microsoft account and join SDNS and JMRP. I’ve written a separate post on this, because somehow everything related to Microsoft requires extra effort.
Setting up sendmail
Important general note: Sendmail is made to work sensibly out of the box. It’s clever enough to relay any mail received from localhost to external servers, and not to do that with mails from external connections. Unless you explicitly tell it to become a spam relay. The default configuration files installed with apt are fine and probably secure.
Sendmail’s internals, on the other hand, with all the macros and stuff, are completely horrible.
So the trick is to make minimal changes. For a fairly regular mail configuration, there is very little to do (on Debian 8, that is).
So first, install it:
# apt install sendmail
Not just sendmail-bin. It won’t work. Don’t install rmail — it’s for UUCP. Which is ancient and disabled anyhow.
Now changes in the configuration file. By default on Debian 8, sendmail listens to ports 25 and 587 at IPv4’s localhost only, and relays mails to external servers as necessary. In order to open ports 25 and 587 to any host, for incoming mail to local addresses only, change the lines in /etc/mail/sendmail.mc saying
DAEMON_OPTIONS(`Family=inet, Name=MTA-v4, Port=smtp, Addr=127.0.0.1')dnl
DAEMON_OPTIONS(`Family=inet, Name=MSP-v4, Port=submission, M=Ea, Addr=127.0.0.1')dnl
to
DAEMON_OPTIONS(`Family=inet, Name=IPv4-port-25, Port=smtp, M=E')dnl
DAEMON_OPTIONS(`Family=inet, Name=IPv4-port-587, Port=submission, M=E')dnl
Let’s explain the changes:
This is a good time to mention that in sendmailish, it’s as if there were two separate MTA daemons, one for each port. This is the terminology used in the log.
Quite at the top of the file, after the DOMAIN() assignment, I added a
define(`confDOMAIN_NAME', `mx.example.com')dnl
This sets sendmail’s host name, as presented while talking to clients, in particular on HELO/EHLO (there is no need to set the confHELO_NAME / HeloName option). Even if it happens to give the correct name without it, I would set it like this. It’s crucial that it identifies itself with the name it’s expected to give, or SPF checks can fail.
And of course, set it to the rDNS of your IP address, not mx.example.com.
Setting up “virtual users”
Having email addresses that don’t match any actual user names on the machine requires defining “virtual users”. But first, it’s essential to tell sendmail to accept emails to other domains than its own. To do this, add one line for each domain. If there are subdomains, add one line for each as well (by default, sendmail wants this explicitly). So I added the following line to /etc/mail/local-host-names:
billauer.co.il
This makes sendmail consider these domains local. An important side effect of this is that now root@billauer.co.il is a legal alias for the local root account. This is an often guessed address by spammers. Handled below.
Then enable virtual users. I put this after the other FEATURE statements in /etc/mail/sendmail.mc:
FEATURE(`virtusertable')dnl
And then run “make” under /etc/mail to update sendmail.cf. And restart sendmail.
Finally, prepare a file with a list of mail addresses, and to which real user each should be routed. The first column is the mail address, the second is the target. For simplicity, keep the second column with real local users, but it’s also possible to use other first-column entries as the target. But why mess with it.
This goes to the file named /etc/mail/virtusertable. This is what it could look like:
someone@billauer.co.il root
not-me@billauer.co.il root
And then call “make” under /etc/mail, which updates /etc/mail/virtusertable.db. There is no need to restart sendmail to make the changes in virtusertable.db take effect.
Mail addresses as well as domains are case-insensitive, of course. But there are no shortcuts with subdomains: Everything after the “@” must match.
Now preventing spammers from sending mails to root@billauer.co.il. Just add this line to /etc/mail/virtusertable:
root@billauer.co.il error:nouser User unknown
This causes sendmail to reject the mail address flat at connection:
Apr 23 08:29:05 sm-mta[12752]: x3N8T4Sn012752: <root@billauer.co.il>... User unknown
But what happens if an internal mail to root is sent, from some cron job, for example? Is it rejected as well? That wasn’t the purpose. Well, on my machine this isn’t a problem, because these mails are sent to root@theserver.billauer.co.il (as defined in /etc/hosts?), so they’re not caught by the virtual user rule above. I don’t know what the result would be without this subdomain thing.
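To illustrate the matching rules described above (case-insensitive, full match of everything after the “@”, and the error: target), here is a toy model of the lookup. It is by no means how sendmail implements it: the real virtusertable supports more than this (catch-all entries for a whole domain, for example), and all names in the code are made up for the example.

```python
def load_virtusertable(text):
    """Parse virtusertable-style text into {lowercased address: target}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # First column is the address, the rest of the line is the target.
        address, target = line.split(None, 1)
        table[address.lower()] = target
    return table


def route(table, recipient):
    """Return ('deliver', user), ('reject', message) or ('no-match', None)."""
    target = table.get(recipient.lower())
    if target is None:
        return ("no-match", None)
    if target.startswith("error:"):
        # "error:nouser User unknown" -> code "nouser", message "User unknown"
        _code, _sep, message = target[len("error:"):].partition(" ")
        return ("reject", message)
    return ("deliver", target)
```

With the three entries given above, Someone@Billauer.co.il goes to root, root@billauer.co.il is rejected with “User unknown”, and root@theserver.billauer.co.il doesn’t match anything, which is why the cron mails mentioned above pass through.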
Rejecting IPv6
Why? Because IPv6 is where everything gets messy. Sendmail is already configured not to listen to IPv6, but then, when it’s about to relay to another server, things get ugly. In particular with Gmail, which supplies an IPv6 AAAA DNS entry for its MX servers.
The problem is that sendmail first attempts IPv6, no matter what (see Nov 30 2018 remark after some discussion on this page). It seems to be a Microsoft-style attempt to push IPv6 by forcing everyone to use it. I would have compiled sendmail myself to get rid of this “feature”, but there’s an easier way. My own attempt to add a
CLIENT_OPTIONS(`Family=inet')dnl
in the sendmail.mc file, turning into
O ClientPortOptions=Family=inet
in sendmail.cf, didn’t make any difference. It should have turned IPv6 off, but didn’t: Sendmail tries IPv6 first, fails, among others because my firewall kills all incoming IPv6 packets, and after a minute goes for IPv4. So why wait?
My solution was to set the firewall to reject the outgoing IPv6 packet, so any TCP connection gets an immediate RST. This doesn’t prevent sendmail from trying IPv6, but makes it clear it’s a no-go. So it doesn’t waste time on it.
These are my firewall rules for that. It’s the OUTPUT rules that I added specially for sendmail:
# ip6tables -A INPUT -i lo -j ACCEPT
# ip6tables -A INPUT -j DROP
# ip6tables -A OUTPUT -o lo -j ACCEPT
# ip6tables -A OUTPUT -j REJECT --reject-with icmp6-addr-unreachable
Add trusted users
If you have CGI scripts on the web server using the -f flag to set the sender, a message of this sort is likely to appear (with sendmail 8.14, at least), unless www-data (or whatever user the web server runs with) is considered trusted:
X-Authentication-Warning: www-data set sender to mailer@mydomain.com using -f
That seems to provoke spam filters. To stop this, add the following line to /etc/mail/submit.mc
FEATURE(use_ct_file)dnl
just before the line saying “FEATURE(`msp', ” etc.
Then add a list of users that are allowed to use the -f flag without the warning to /etc/mail/trusted-users. It’s just one user per line. So it can be like this:
www-data
eli
and then run “make” to produce submit.cf, and restart sendmail.
Note that this procedure depends strongly on the sendmail version, and this worked on sendmail 8.14.
Reviewing sendmail’s setup
For the real masochists out there, open /etc/mail/sendmail.cf.
- Lines starting with # are comments, of course.
- Lines starting with “O” are options.
- Searching the file for “=/” reveals all file-related settings (because it’s an assignment followed by the beginning of an absolute path)
Setting up DKIM
I have a separate post on DKIM and friends. Better take a look if this is Chinese to you.
opendkim is made to work sensibly. It is inserted as a mail filter (“Milter”) for sendmail, making it sign outbound messages, and check inbound messages. As with sendmail, there are a few things to set up, and it’s good to go.
Following this guide (more or less). And man opendkim.conf, which is good. First, install:
# apt install opendkim opendkim-tools
Then create the keys:
# mkdir -p /etc/opendkim/keys/billauer.co.il
# opendkim-genkey -D /etc/opendkim/keys/billauer.co.il/ -d billauer.co.il -s dkim2019
# chown -R opendkim:opendkim /etc/opendkim/keys/
Now Configuration. The only changes I needed to make from the default files were: Edit /etc/default/opendkim, adding the following line at the end, so a TCP port is opened:
SOCKET="inet:8891@localhost" # listen on localhost port 8891
and since I need to sign multiple domains, added these two lines to /etc/opendkim.conf
KeyTable refile:/etc/opendkim/KeyTable
SigningTable refile:/etc/opendkim/SigningTable
and added the two following files. /etc/opendkim/KeyTable reading
dkim2019._domainkey.billauer.co.il billauer.co.il:dkim2019:/etc/opendkim/keys/billauer.co.il/dkim2019.private
dkim2019._domainkey.example.com example.com:dkim2019:/etc/opendkim/keys/example.com/dkim2019.private
and /etc/opendkim/SigningTable:
*@billauer.co.il dkim2019._domainkey.billauer.co.il
*@example.com dkim2019._domainkey.example.com
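As the two files follow a fixed pattern, they can be generated for a list of domains instead of being typed by hand. This is a small sketch that produces lines in the exact format shown above. The function name and the default key directory are assumptions that match this particular setup, not anything that opendkim dictates.

```python
def opendkim_tables(domains, selector, key_dir="/etc/opendkim/keys"):
    """Return (KeyTable text, SigningTable text) for the given domains."""
    key_lines = []
    signing_lines = []
    for domain in domains:
        # The key's name is used as the link between the two tables.
        key_name = "%s._domainkey.%s" % (selector, domain)
        key_path = "%s/%s/%s.private" % (key_dir, domain, selector)
        key_lines.append("%s %s:%s:%s" % (key_name, domain, selector, key_path))
        signing_lines.append("*@%s %s" % (domain, key_name))
    return "\n".join(key_lines) + "\n", "\n".join(signing_lines) + "\n"
```

Calling opendkim_tables(["billauer.co.il", "example.com"], "dkim2019") gives the content of the two files as listed above. Writing them to /etc/opendkim/KeyTable and /etc/opendkim/SigningTable is left out on purpose.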
For a server whose outbound messages come only from localhost, there’s no need to set either InternalHosts or ExternalIgnoreList, as this is the default. These appear in a lot of tutorials.
Finally, make the DKIM a mail filter (“Milter”) on sendmail by adding this line at the end of sendmail.mc (and run “make” + restart sendmail):
INPUT_MAIL_FILTER(`opendkim', `S=inet:8891@127.0.0.1, F=T')
Note the “F=T” part. It will make sendmail refuse to accept mails if the DKIM server isn’t responding properly with a
451 4.3.2 Please try again later
The default is to pass the mail through without the milter if it doesn’t work, which would mean sending unsigned mails without noticing. The downside of this is that no mail will arrive either if this happens, but as the rejection is temporary, the delivery won’t fail completely (assuming the issue is resolved within a day or so).
Don’t forget to set up the TXT DNS records with the *.txt files generated with opendkim-genkey. These files are written in zone format for the bind daemon. The actual text is the concatenation of the two strings in quotation marks, after removing these quotation marks. Think of it as a multi-line string in C language.
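The concatenation can be done by hand, but it’s easy to drop a character on the way. A small Python sketch of the same operation follows. The sample record is made up (and the key is cut short), written in the general shape of a zone-format TXT entry; escaped quotation marks inside the strings are not handled.

```python
import re

def txt_value(zone_text):
    """Join all quoted strings of a zone-format TXT record into one value."""
    return "".join(re.findall(r'"([^"]*)"', zone_text))

# Made-up sample in zone format, with a shortened (hence invalid) key
sample = '''dkim2019._domainkey IN TXT ( "v=DKIM1; h=sha256; k=rsa; "
      "p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A" )  ; ----- DKIM key for example.com'''

print(txt_value(sample))
# v=DKIM1; h=sha256; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
```

The resulting string is what goes into the TXT record when the DNS is managed through a web interface rather than a zone file.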
All done? Use this DKIM validator to see exactly how well it went.
Remove outbound messages from the mailing queue
# mailq
MSP Queue status...
/var/spool/mqueue-client is empty
Total requests: 0
MTA Queue status...
/var/spool/mqueue (2 requests)
-----Q-ID----- --Size-- -----Q-Time----- ------------Sender/Recipient-----------
x22FdWV1010569 1864 Sat Mar 2 10:39 MAILER-DAEMON
(Deferred: Connection timed out with server.com.)
<ze@server.com>
x22FMGAi009668 17 Sat Mar 2 10:23 <this@there.com>
(Deferred: Connection timed out with example.com.)
<eli@example.com>
Total requests: 2
# cd /var/spool/mqueue
# rm *x22FdWV1010569
# rm *x22FMGAi009668
# systemctl restart sendmail
Gmail won’t talk with anyone
Gmail’s server doesn’t even respond to a SYN at port 587 or 25 unless your IP address has an rDNS record. It simply won’t talk to you. Only after having the rDNS set on the server:
# nc gmail-smtp-in.l.google.com. 25
220 mx.google.com ESMTP y6si2100605wmi.83 - gsmtp
And that’s just the beginning. Without having DMARC set up, it wouldn’t relay my mails. It also seems like a very good idea to set the DMARC policy to reject emails that don’t meet the criteria, as this appears to give significantly better deliverability with Gmail. Just a hunch, but based upon experience.
More on DMARC here.
Sources of information
Introduction
This is an explicit walkthrough on how a domain name is resolved. Doing the recursion manually, that is.
And then some remarks on the mess with DNS glue records.
Getting the root servers
$ dig NS .
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS .
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59540
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 14
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;. IN NS
;; ANSWER SECTION:
. 49053 IN NS b.root-servers.net.
. 49053 IN NS f.root-servers.net.
. 49053 IN NS a.root-servers.net.
. 49053 IN NS h.root-servers.net.
. 49053 IN NS k.root-servers.net.
. 49053 IN NS l.root-servers.net.
. 49053 IN NS e.root-servers.net.
. 49053 IN NS g.root-servers.net.
. 49053 IN NS j.root-servers.net.
. 49053 IN NS d.root-servers.net.
. 49053 IN NS c.root-servers.net.
. 49053 IN NS i.root-servers.net.
. 49053 IN NS m.root-servers.net.
;; ADDITIONAL SECTION:
a.root-servers.net. 567453 IN A 198.41.0.4
b.root-servers.net. 547997 IN A 199.9.14.201
c.root-servers.net. 314914 IN A 192.33.4.12
d.root-servers.net. 478361 IN A 199.7.91.13
e.root-servers.net. 326962 IN A 192.203.230.10
f.root-servers.net. 514616 IN A 192.5.5.241
g.root-servers.net. 575480 IN A 192.112.36.4
h.root-servers.net. 592754 IN A 198.97.190.53
i.root-servers.net. 596171 IN A 192.36.148.17
j.root-servers.net. 591102 IN A 192.58.128.30
k.root-servers.net. 580970 IN A 193.0.14.129
l.root-servers.net. 523957 IN A 199.7.83.42
m.root-servers.net. 603222 IN A 202.12.27.33
;; Query time: 19 msec
This was a very fast query, because any DNS server has this info on file (the root hints). This is the piece of info it must know to begin with.
Getting the name servers for .com
So, who are the top level domain servers? I’ll ask the authoritative server directly (this is unnecessary if you just want the answer, so “dig NS com” would have been enough):
$ dig NS com @e.root-servers.net.
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS com @e.root-servers.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11329
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;com. IN NS
;; AUTHORITY SECTION:
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
;; ADDITIONAL SECTION:
l.gtld-servers.net. 172800 IN A 192.41.162.30
l.gtld-servers.net. 172800 IN AAAA 2001:500:d937::30
b.gtld-servers.net. 172800 IN A 192.33.14.30
b.gtld-servers.net. 172800 IN AAAA 2001:503:231d::2:30
c.gtld-servers.net. 172800 IN A 192.26.92.30
c.gtld-servers.net. 172800 IN AAAA 2001:503:83eb::30
d.gtld-servers.net. 172800 IN A 192.31.80.30
d.gtld-servers.net. 172800 IN AAAA 2001:500:856e::30
e.gtld-servers.net. 172800 IN A 192.12.94.30
e.gtld-servers.net. 172800 IN AAAA 2001:502:1ca1::30
f.gtld-servers.net. 172800 IN A 192.35.51.30
f.gtld-servers.net. 172800 IN AAAA 2001:503:d414::30
g.gtld-servers.net. 172800 IN A 192.42.93.30
g.gtld-servers.net. 172800 IN AAAA 2001:503:eea3::30
a.gtld-servers.net. 172800 IN A 192.5.6.30
a.gtld-servers.net. 172800 IN AAAA 2001:503:a83e::2:30
h.gtld-servers.net. 172800 IN A 192.54.112.30
h.gtld-servers.net. 172800 IN AAAA 2001:502:8cc::30
i.gtld-servers.net. 172800 IN A 192.43.172.30
i.gtld-servers.net. 172800 IN AAAA 2001:503:39c1::30
j.gtld-servers.net. 172800 IN A 192.48.79.30
j.gtld-servers.net. 172800 IN AAAA 2001:502:7094::30
k.gtld-servers.net. 172800 IN A 192.52.178.30
k.gtld-servers.net. 172800 IN AAAA 2001:503:d2d::30
m.gtld-servers.net. 172800 IN A 192.55.83.30
m.gtld-servers.net. 172800 IN AAAA 2001:501:b1f9::30
;; Query time: 73 msec
;; SERVER: 192.203.230.10#53(192.203.230.10)
The next step: Get the domain’s name server
I just picked one of the name servers from the queries above. Once again, “dig NS google.com” will most likely give the same result.
$ dig NS google.com @j.gtld-servers.net.
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS google.com @j.gtld-servers.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37174
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN NS
;; AUTHORITY SECTION:
google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns3.google.com.
google.com. 172800 IN NS ns4.google.com.
;; ADDITIONAL SECTION:
ns2.google.com. 172800 IN AAAA 2001:4860:4802:34::a
ns2.google.com. 172800 IN A 216.239.34.10
ns1.google.com. 172800 IN AAAA 2001:4860:4802:32::a
ns1.google.com. 172800 IN A 216.239.32.10
ns3.google.com. 172800 IN AAAA 2001:4860:4802:36::a
ns3.google.com. 172800 IN A 216.239.36.10
ns4.google.com. 172800 IN AAAA 2001:4860:4802:38::a
ns4.google.com. 172800 IN A 216.239.38.10
;; Query time: 74 msec
;; SERVER: 192.48.79.30#53(192.48.79.30)
So what’s the point in asking .com’s servers directly? For one, if I just changed the servers for my domain, and I want to see that change take effect immediately. Besides, I want the advertised TTL values and not those my ISP’s DNS happens to count down:
$ dig NS google.com
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15069
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 9
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN NS
;; ANSWER SECTION:
google.com. 62895 IN NS ns2.google.com.
google.com. 62895 IN NS ns1.google.com.
google.com. 62895 IN NS ns4.google.com.
google.com. 62895 IN NS ns3.google.com.
;; ADDITIONAL SECTION:
ns1.google.com. 170367 IN A 216.239.32.10
ns2.google.com. 240223 IN A 216.239.34.10
ns3.google.com. 238882 IN A 216.239.36.10
ns4.google.com. 248264 IN A 216.239.38.10
ns1.google.com. 170367 IN AAAA 2001:4860:4802:32::a
ns2.google.com. 167252 IN AAAA 2001:4860:4802:34::a
ns3.google.com. 167252 IN AAAA 2001:4860:4802:36::a
ns4.google.com. 159090 IN AAAA 2001:4860:4802:38::a
;; Query time: 21 msec
;; SERVER: 10.2.0.1#53(10.2.0.1)
Final step: Get the address (or something)
This is a bit stupid, but let’s finish up:
$ dig A google.com @ns3.google.com.
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> A google.com @ns3.google.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48652
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 300 IN A 216.58.206.14
;; Query time: 92 msec
;; SERVER: 216.239.36.10#53(216.239.36.10)
The local DNS had another answer in its cache. That’s OK. It also bombarded me with some other records, something an authoritative server is much less keen to do on an A query.
$ dig A google.com
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> A google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35893
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 235 IN A 172.217.23.174
;; AUTHORITY SECTION:
google.com. 152204 IN NS ns3.google.com.
google.com. 152204 IN NS ns4.google.com.
google.com. 152204 IN NS ns2.google.com.
google.com. 152204 IN NS ns1.google.com.
;; ADDITIONAL SECTION:
ns1.google.com. 344065 IN A 216.239.32.10
ns2.google.com. 310954 IN A 216.239.34.10
ns3.google.com. 324838 IN A 216.239.36.10
ns4.google.com. 247342 IN A 216.239.38.10
ns1.google.com. 254137 IN AAAA 2001:4860:4802:32::a
ns2.google.com. 345035 IN AAAA 2001:4860:4802:34::a
ns3.google.com. 345503 IN AAAA 2001:4860:4802:36::a
ns4.google.com. 172241 IN AAAA 2001:4860:4802:38::a
;; Query time: 19 msec
;; SERVER: 10.2.0.1#53(10.2.0.1)
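To summarize the walk from the root servers down to the address, here is a toy model of it in Python. It makes no network access at all: The referral and answer tables are copied from the dig outputs above, and the function and table names are made up for this illustration. A real resolver does a lot more (caching, retries, choosing among several servers).

```python
# Toy model of the manual recursion above. No network access: the
# delegation data is copied from the dig outputs in this post.

REFERRALS = {
    # server asked -> (zone it delegates, name server picked for next step)
    "e.root-servers.net.": ("com.", "j.gtld-servers.net."),
    "j.gtld-servers.net.": ("google.com.", "ns3.google.com."),
}
ANSWERS = {
    # (authoritative server, name) -> A record it returned
    ("ns3.google.com.", "google.com."): "216.58.206.14",
}

def resolve(name, server="e.root-servers.net."):
    """Follow referrals from a root server down to an authoritative answer."""
    path = [server]
    while (server, name) not in ANSWERS:
        if server not in REFERRALS:
            raise LookupError(f"no referral known from {server}")
        zone, next_server = REFERRALS[server]
        # A referral is only useful if the name lies within the delegated zone
        if name != zone and not name.endswith("." + zone):
            raise LookupError(f"{server} delegates {zone}, which doesn't cover {name}")
        server = next_server
        path.append(server)
    return ANSWERS[(server, name)], path

address, path = resolve("google.com.")
print(" -> ".join(path), "=>", address)
# e.root-servers.net. -> j.gtld-servers.net. -> ns3.google.com. => 216.58.206.14
```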
Glue records: Place for improvisations
Note that the name servers for google.com are subdomains of google.com. This is fine, because there are glue records in the “Additional Section” that give the IP addresses explicitly. Without these, it would have been impossible to resolve any of google.com’s addresses (it would have got stuck on obtaining the address of e.g. ns1.google.com).
That isn’t so trivial. For example, let’s look at netvision.net.il’s name server record, as reported by the authoritative server for net.il:
$ dig NS netvision.net.il @ns2.ns.il.
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS netvision.net.il @ns2.ns.il.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43809
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 3
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;netvision.net.il. IN NS
;; AUTHORITY SECTION:
netvision.net.il. 86400 IN NS dns.netvision.net.il.
netvision.net.il. 86400 IN NS eupop.netvision.net.il.
netvision.net.il. 86400 IN NS nypop.elron.net.
;; ADDITIONAL SECTION:
dns.netvision.net.il. 86400 IN A 194.90.1.5
eupop.netvision.net.il. 86400 IN A 212.143.194.5
;; Query time: 73 msec
;; SERVER: 162.88.57.1#53(162.88.57.1)
Note that there are glue records only for the NS records that belong to netvision.net.il. The nypop.elron.net. server (extra backup?) doesn’t have a glue record. It could have, as a DNS is allowed to answer for another domain in the special case of a glue record (see RFC 1033).
OK, so how do you resolve nypop.elron.net? You ask the nameserver for elron.net, of course! Let’s ask the authoritative server for .net:
$ dig NS @a.gtld-servers.net. elron.net.
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS @a.gtld-servers.net. elron.net.
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27469
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;elron.net. IN NS
;; AUTHORITY SECTION:
elron.net. 172800 IN NS dns.netvision.net.il.
elron.net. 172800 IN NS nypop.netvision.net.il.
;; Query time: 82 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
Oops. That went back to netvision.net.il. So it doesn’t get us out of the loop. But here comes the funny part: Ask Netvision’s own DNS the same question:
$ dig NS elron.net. @dns.netvision.net.il
; <<>> DiG 9.11.3-1ubuntu1.2-Ubuntu <<>> NS elron.net. @dns.netvision.net.il
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10023
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 3
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: c38902309e53e8ef454a61b25c98a0863c28b2c747ce090a (good)
;; QUESTION SECTION:
;elron.net. IN NS
;; ANSWER SECTION:
elron.net. 600 IN NS nypop.elron.net.
elron.net. 600 IN NS dns.netvision.net.il.
;; ADDITIONAL SECTION:
dns.netvision.net.il. 86400 IN A 194.90.1.5
nypop.elron.net. 600 IN A 199.203.1.20
;; Query time: 23 msec
;; SERVER: 194.90.1.5#53(194.90.1.5)
Cute, isn’t it? Not only is it a different answer, but the glue records are there. So if you make these queries from within Netvision’s infrastructure, you’re blind to the lack of glue records.
And these are the records of one of Israel’s largest ISPs.
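The comparison made by eye above is simple enough to put in a few lines of Python. This sketch just lists the name servers of a response that come without an address in the additional section. The data is copied from the two dig outputs for elron.net above, and the function name is made up for this illustration.

```python
def missing_glue(ns_names, additional):
    """Return the name servers that have no address in the additional section."""
    return [ns for ns in ns_names if ns not in additional]

# NS elron.net, as answered by a.gtld-servers.net. (no addresses given)
gtld_view = missing_glue(
    ["dns.netvision.net.il.", "nypop.netvision.net.il."],
    {},
)

# The same question, as answered by dns.netvision.net.il
netvision_view = missing_glue(
    ["nypop.elron.net.", "dns.netvision.net.il."],
    {"dns.netvision.net.il.": "194.90.1.5", "nypop.elron.net.": "199.203.1.20"},
)

print(gtld_view)       # ['dns.netvision.net.il.', 'nypop.netvision.net.il.']
print(netvision_view)  # []
```

An empty list means that the response is self-contained, which is exactly what one sees from within Netvision’s infrastructure, and only there.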
nslookup
OK, this is soooo unrelated, and still, I thought I should mention nslookup as an alternative for dig. The former creates output that is more human readable, the latter more like zone file records. Pick your poison.
$ nslookup -type=NS google.com
Server: 10.2.0.1
Address: 10.2.0.1#53
Non-authoritative answer:
google.com nameserver = ns1.google.com.
google.com nameserver = ns3.google.com.
google.com nameserver = ns4.google.com.
google.com nameserver = ns2.google.com.
Authoritative answers can be found from:
ns1.google.com internet address = 216.239.32.10
ns2.google.com internet address = 216.239.34.10
ns3.google.com internet address = 216.239.36.10
ns4.google.com internet address = 216.239.38.10
ns1.google.com has AAAA address 2001:4860:4802:32::a
ns2.google.com has AAAA address 2001:4860:4802:34::a
ns3.google.com has AAAA address 2001:4860:4802:36::a
ns4.google.com has AAAA address 2001:4860:4802:38::a
And then, let’s ask the authoritative server the same question:
$ nslookup -type=NS google.com. j.gtld-servers.net.
Server: j.gtld-servers.net.
Address: 192.48.79.30#53
Non-authoritative answer:
*** Can't find google.com.: No answer
Authoritative answers can be found from:
google.com nameserver = ns2.google.com.
google.com nameserver = ns1.google.com.
google.com nameserver = ns3.google.com.
google.com nameserver = ns4.google.com.
ns2.google.com has AAAA address 2001:4860:4802:34::a
ns2.google.com internet address = 216.239.34.10
ns1.google.com has AAAA address 2001:4860:4802:32::a
ns1.google.com internet address = 216.239.32.10
ns3.google.com has AAAA address 2001:4860:4802:36::a
ns3.google.com internet address = 216.239.36.10
ns4.google.com has AAAA address 2001:4860:4802:38::a
ns4.google.com internet address = 216.239.38.10
So it’s more readable, but I miss those TTL values.
Gmail is definitely the leader in the field of email services, and their spam filter is actually very good. From my own experience with setting up a mail server, I can tell that it’s not all that easy to make Gmail’s incoming mail servers even talk with you. So the larger part of spammers don’t even get the chance to suggest their piece of spam to Gmail.
The problem is that every now and then (quite rare, but still), an important email is classified as spam. Since I’m fetching the emails with fetchmail, and apply my own spam filter, I prefer getting them all. Messages that are classified as spam will not be available in the POP3 session that fetchmail makes.
How to do it: Simply add a mail filter. Click the upper-right gear, select “Settings” and choose the tab for filters. Then create a new mail filter with a condition that all emails meet (I picked smaller than 100 MB) and check “Never send it to Spam”. That’s it.
Intro
Whether you just want your non-Gmail personal email to get through, or you have a website that produces transactional emails (those sent by your site or web app), there’s a long fight with spam filters ahead.
The war against unsolicited emails will probably go on as long as email is used, and it’s an ongoing battle where one leak is sealed and another is found. Those responsible for mail delivery constantly tweak their spam detectors’ parameters to minimize complaints. There are no general rules for what is detected as spam and what isn’t. What passes Gmail’s tests may very well fall on Security Industries’ mail server and vice versa. Each has its own experience.
But you want your message to reach them all. Always.
This is a guide to the main ideas and concepts behind the trio of mechanisms mentioned in the title. The purpose is to focus on the delicate and sometimes crucial details that are often missed in howtos everywhere. And also try to understand the rationale behind each mechanism, even though it might not be relevant when spam detector X is tuned to achieve the best results, given a current flow of spam mails with a certain pattern.
Howtos usually tell you to employ a DKIM signing software on the mail server, and make SPF and DKIM DNS records for “your domain”. Which one? Not necessarily trivial, as discussed below. And then possibly add a DMARC record as well. Will it really help? Also discussed.
Here’s the thing: Employing these elements will most likely do something good, even if you get it wrong. Setting up things without understanding what you’re doing can solve an immediate problem. This post focuses on understanding the machinery, so the best possible setting can be achieved.
Get your tie knot right.
Rationale: Building a reputation for the domain
There are different ideas behind each of the trio’s mechanisms, but there’s one solid idea behind them all: The reputation of a domain name.
If you’re a spammer, you can’t send thousands of emails that are linked to a domain name without wrecking its reputation rather quickly. So let’s make sure each domain name’s owner stands behind the mails sent on its behalf, and maintains its reputation. This requires a way to tell whether this owner really sent each mail, and not just a spammer abusing it. SPF and DKIM supply these mechanisms.
As domain names are rather cheap today, spammers may very well buy a new domain name for each shower of emails. But you can’t do that over time. This is why some spam filters don’t give any points to passing domain-related tests, even if they check them. However, large email services (Gmail in particular) do seem to collect statistics on domains, and treat new emails according to existing reputation.
DMARC takes this one step further, and allows a domain name owner to prevent the delivery of emails that weren’t sent on its behalf. It also puts the focus on the sender given in the “From:” header, instead of other domains, which SPF and DKIM might relate to. This seems to be important to prevent spammers from using your domain and wrecking its reputation. More about that below.
Despite all said above, I still get spam messages (of the random recipient type) with this trio perfectly set up. But they’re relatively rare.
The trio in short
These three techniques are fundamentally different in what they do. In brief for now, in more detail further below:
- SPF: Defines the set of server IP addresses that are authorized to use a domain name to identify itself (HELO/EHLO) and/or the mail’s sender (MAIL FROM) in the SMTP exchange. Note that this doesn’t directly relate to the “From:” mail header, even though it does in many practical cases.
- DKIM: A method to publish a public key in a DNS record for the digital signature of some parts of an email message, so this signature can be verified by any recipient. The domain name of this DNS record, which is given explicitly in the signature, doesn’t need to have any relation to the mail’s author, sender or any relaying server involved (even though it usually has). It’s just a placeholder for the accumulating reputation of mails that are signed with it.
- DMARC: A mechanism to prevent the domain name from being abused by spammers. It basically tells the recipient that an email with a certain Author Domain (as it appears in “From:”) should pass an SPF and/or DKIM test, and what to do if not.
In essence, SPF authenticates the use of some mail relay servers, DKIM authenticates the message carrying its signature, and DMARC says what to do if the authentication(s) fail.
The DNS records
All three techniques rely on a DNS lookup for a TXT entry, which has the domain name included (let’s say we have example.com.):
- SPF records are found as a TXT record for the domain itself (that is, example.com.).
- DKIM records are the TXT records for the “selector._domainkey” subdomain, where “selector” is given in the mail message’s DKIM header. So it’s like default._domainkey.example.com (for selector=default).
- DMARC records are the TXT entries for the “_dmarc” subdomain (i.e. _dmarc.example.com).
So it’s crucial which domain it is that the spam filter software considers to be “the domain”. Spoiler: DKIM and DMARC have this sorted out nicely. It’s SPF that is tricky.
Note that given an email message, the recipient can easily check whether it has SPF and DMARC records, but (without DMARC) it can’t know if there’s a relevant DKIM record available, because of the selector part. Consequently, adding a DKIM record and signing only part of the emails won’t backfire on those that aren’t signed.
Which domain is “the domain”?
Quite often, guides in these topics just say “the domain”, making it sound as if there’s only one domain involved. In fact, there are several to be aware of.
Let’s say that myself@example.com sends a mail by connecting to its ISP’s mail server mx.isp.com, which in turn relays it to the destination mail server. We then have four different domains involved.
- The domain of the author, appearing in the From header, shown to the human recipient as the sender. example.com in this case.
- The “envelope sender”, appearing in the MAIL FROM part of the SMTP conversation of the relay transmission. This could be example.com (the simple approach), but also something like bounce.isp.com. This is because the envelope sender is the bounce address, and some mail relays make up some kind of bogus bounce address so they can track the bouncing mails.
- The domain used in the HELO/EHLO part of the SMTP conversation of the relay transmission. Probably something like mail23.isp.com, as the ISP has many servers for relaying out.
- The rDNS domain entry of the IP address of the sender on the relay transmission. If this entry doesn’t exist, or isn’t exactly the same as the HELO/EHLO domain, hang the postmaster. Some mail servers won’t even talk with you unless they match.
I use the term “relay transmission” for the connection between two mail servers: Going from the server that accepted your message for transmission when you pressed “Send” to the server that holds the mail account of the mail’s recipient (i.e. destination of the MX record of the recipient’s full domain).
But oops. Mails are often relayed more than once before reaching their final station. Except for the first item in the list above, the domains are different on each such transmission. Which one counts? When does it count?
Luckily, this dilemma is pretty much limited to SPF. And with DMARC, it’s nonexistent.
SPF
At times, people just add an SPF record for their mail address’ domain with their relay servers’ IP range, and think they’ve covered themselves SPF-wise. Sometimes they did, and sometimes they didn’t. No escape from the gory details.
If you’re not familiar with the HELO/EHLO and MAIL FROM: SMTP tokens, I warmly suggest taking a quick look at another post of mine. It’s nearly impossible to understand SPF without it.
The SPF mechanism is quite simple: The server that receives the email looks up the TXT DNS record(s) for the domain name given in the envelope sender, that is in “MAIL FROM:”. If an SPF record exists, it checks if the IP address of the sender is in the allowed set, and if so, the SPF test is passed.
The domain name that is checked is the “domain portion” of the “MAIL FROM” identity (see RFC7208 section 4.1), or in other words, everything after the “@” character of the MAIL FROM. Or so it’s commonly understood: The RFC doesn’t define this term.
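To make the test itself concrete, here is a heavily simplified Python sketch of it. It only understands plain ip4: and ip6: mechanisms. Everything else that a real SPF record may contain (include, a, mx, redirect, qualifiers like “-” or “~”, and so on) is silently ignored, so this is an illustration and by no means an RFC7208 implementation. The function names, the sample record and the addresses are made up.

```python
import ipaddress

def mail_from_domain(mail_from):
    """The "domain portion": everything after the last '@' of MAIL FROM."""
    return mail_from.rsplit("@", 1)[1].lower()

def spf_ip_allowed(spf_record, sender_ip):
    """True if sender_ip matches a plain ip4:/ip6: mechanism of the record.

    Simplification: all other mechanisms and all qualifiers are ignored.
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split()[1:]:        # skip the leading "v=spf1"
        if term.startswith(("ip4:", "ip6:")):
            net = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == net.version and ip in net:
                return True
    return False

# Made-up record, as it could appear in the TXT entry of bounce.isp.com
record = "v=spf1 ip4:192.0.2.0/24 -all"

print(mail_from_domain("bounce-3242535@bounce.isp.com"))  # bounce.isp.com
print(spf_ip_allowed(record, "192.0.2.17"))                # True
print(spf_ip_allowed(record, "198.51.100.1"))              # False
```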
The receiver is likely to perform the same check on the HELO/EHLO identification of the sender. In fact, RFC7208 section 2.3 recommends performing it even before the MAIL FROM check. The SPF test will pass if either of the HELO/EHLO or the MAIL FROM check passes (the RFC doesn’t say this explicitly, but it’s clear from the argument for beginning with the more definite HELO/EHLO check).
This is important: Any mail server can ensure all mails that go through it pass the (non-DMARC) SPF test, just by having an SPF DNS record on its full HELO/EHLO domain name. It’s silly not to have one. So if you’re setting up a mail server called mx.theisp.com, be sure to add SPF records for mx.theisp.com, allowing the IP of that server. This SPF test won’t count for DMARC purposes, but the “Received-SPF: pass” line among the mail headers surely doesn’t hurt.
Except for when DMARC is applied in one of its enforcing modes, there is no clear rule on what to do if this test fails or passes with one of the SMTP tokens or both. This is raw material for the spam detection software.
It’s however important to note that it’s perfectly normal that the envelope address is made up completely by the mail relay, because it functions as a bounce address. So an email sent from myself@example.com may have the same envelope address, but it’s also perfectly normal that the MAIL FROM: would be bounce-3242535@bounce.isp.com. This allows the ISP to detect massive bouncing of emails, and possibly do something about it. In this case, the relaying server’s domain can be used to pass the (non-DMARC) SPF test instead.
Well, with the reputation per domain rationale, it actually does make sense. But with DMARC, this won’t cut it. The SPF record must belong to the “From:” sender. See below.
Now, the formal rules are nice, but if you just wrote a spam filter, would you check for the SPF record of the “From:” sender’s domain, even though it’s not really relevant according to the RFC? Of course you would. If the domain owner of the Author Address has given permission for a server to relay emails on its behalf, it’s a much stronger indication. So it’s probably a good idea to make such a record, even if it makes no sense directly. And it makes you better prepared for DMARC.
As a matter of fact, it’s recommended to add SPF records for any domain and subdomain that may somehow appear in the mail, to the extent possible, of course. A DNS record is cheap, and you never know if a spam detector expects it to be there, whether it should or not.
Bottom line: We don’t really know how many points spam filter X gives an SPF record of this type or another. It depends on the history of previous spam. So try to cover all options, even those that aren’t required per RFC.
Information on setting up an SPF record is all around the web. I suggest starting with Wikipedia’s great entry and, if you want to be accurate about it, RFC7208.
DKIM
This is most easily explained through a real signature, taken from the header of a real mail message:
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com;
s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to;
bh=vx0LAOXz3Mr8zS/Jy2ayKOep6NlflK3t+BpJyi78v9A=;
b= [ ... ]
To verify this signature, a lookup for the DNS TXT entry for 20161025._domainkey.gmail.com is made. Note that except for the _domainkey part, the domain consists of the s= (selector) and the d= (domain) assignments in the signature. The answer should contain an RSA public key for verifying that the hash of some selected headers (selected by h=) is indeed signed by the blob in the b= assignment. That’s it. If the signature is OK, the DKIM test is passed.
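For illustration, this is how the name for the DNS lookup can be derived from the signature header. It’s a minimal Python sketch that only splits the header into its tag=value assignments; the cryptographic verification itself is left out. The function names are made up, and the b= blob is left empty, as in the quote above.

```python
def dkim_tags(header_value):
    """Split a DKIM-Signature value into a dict of tag -> value."""
    tags = {}
    for part in header_value.split(";"):
        if "=" in part:
            tag, value = part.split("=", 1)
            # Whitespace from header folding is dropped from the value
            tags[tag.strip()] = "".join(value.split())
    return tags

def dkim_query_name(header_value):
    """Name of the TXT record holding the public key: <s>._domainkey.<d>"""
    t = dkim_tags(header_value)
    return f"{t['s']}._domainkey.{t['d']}"

# The signature quoted above, without its b= blob
sig = """v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com;
 s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to;
 bh=vx0LAOXz3Mr8zS/Jy2ayKOep6NlflK3t+BpJyi78v9A=;
 b="""

print(dkim_query_name(sig))            # 20161025._domainkey.gmail.com
print(dkim_tags(sig)["h"].split(":"))  # the list of signed headers
```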
Note that the message body is covered as well: The bh= assignment is a hash of the body, and this hash is part of what is signed. So signing a message with DKIM means standing behind the selected headers and the body as they were when the mail was signed.
Also note that no other domains, that are related to the email, make any difference for passing the DKIM test itself. Not the sender’s, not the mail relays’, nothing. Passing the DKIM test just means that the signing domain (gmail.com) has signed this message (actually, some of its headers and its body hash) and therefore puts its own reputation on it. It doesn’t say anything about who sent the message.
The common practice is however that the signing domain is the From: header domain. Probably because DMARC can’t be applied otherwise, maybe also because the goal is to impress spam filters. Passing the DKIM test is nice formally, but if the spam filter thinks it’s fishy, it can backfire.
Another reason, from RFC4871, section 6.3: “If the message is signed on behalf of any address other than that in the From: header field, the mail system SHOULD take pains to ensure that the actual signing identity is clear to the reader.” Yeah right. I’ve seen Gmail verifying a DKIM signature of a domain which had nothing to do with anything in that message, surely not the sender. It just went “dkim=pass”.
As for Spamassassin, it doesn’t care much about DKIM so far. Probably for good reasons, as I get a lot of spam messages with the DKIM signature perfectly done. So as of now, merely passing the DKIM test doesn’t change the score. More precisely, the existence of a DKIM signature increases the spam score (more spammy) by 0.1, but if the signature is correct, the score is reduced by 0.1. So we’re back to zero. If the signature belongs to the author (matching From: domain), the score is reduced (i.e. towards non-spam) by another 0.1. All in all, a DKIM signature wins a score of 0.1 on Spamassassin. May not seem to be worth the effort, but Spamassassin is not the only filter in the world. And it may change over time.
Finally a question: The MUA (e.g. Thunderbird) is allowed to put a DKIM signature, which would actually make sense: It allows a human end user to sign the emails directly, with no need for anything special on the relaying infrastructure. And there’s no problem with multiple RSA key pairs for multiple users of the domain, since the “s=” selector allows a virtually unlimited number of DKIM DNS records. Why there isn’t a plugin for at least Thunderbird is unclear to me. Maybe the answer lies in Spamassassin’s indifferent response to it.
DMARC
Suppose that you own company Example Ltd. with domain example.com, and you’ve decided that all mails from (as in header From:) that domain will be DKIM signed. Now some spam mail arrives from someone else, without a DKIM signature and fails the SPF test. But the recipient has no way to tell that it should pass such tests.
DMARC is the mechanism that tells the recipient what to expect, and what to do if the expectation isn’t met. This allows the owner of the domain to ensure only mails arriving from its own machines are accepted. Spam pretending to come from its domain is dropped.
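This is the mechanism by which a mail provider like Gmail can force emails from all its users (i.e. having a gmail.com address) to be relayed through its servers only. The TXT for _dmarc.gmail.com goes:
"v=DMARC1; p=none; sp=quarantine; rua=mailto:mailauth-reports@google.com"
In other words, for subdomains of gmail.com (the sp= part): If it isn’t proven to come from Gmail’s server, hold the message. Most servers just junk it. For gmail.com addresses themselves, p=none requests no action on failing mails, only reports to the rua= address.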
And now to how the test is done. Spoiler II: DMARC isn’t interested in a domain test if it isn’t tightly linked with the “From:” header’s domain. Or as they call it: Aligned with the RFC5322.From field. This is a huge difference.
Let’s take it directly from RFC7489, section 4.2:
A message satisfies the DMARC checks if at least one of the supported authentication mechanisms:
- produces a “pass” result, and
- produces that result based on an identifier that is in alignment, as defined in Section 3.
The “supported authentication mechanisms” for DMARC version 1 are SPF and DKIM, as listed in section 4.1 of the same RFC.
The first thing we learn is that it’s enough to pass one of SPF or DKIM. No need to have both for passing DMARC.
Second, the term “is in alignment” above. It’s defined in the RFC itself, and essentially means that the domain for which the SPF or DKIM passed is the same as the one in the From: header, possibly give or take subdomains. The only reason they didn’t just say that the domains must be equal is because of the possibility of “relaxed mode”, allowing an email from myself@mysubdomain.example.com to be approved by passing tests with the example.com domain. This is what “being in alignment” means in relaxed mode. In “strict mode” alignment occurs only when they’re perfectly equal.
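To make the two rules above a bit more concrete, here is a minimal Python sketch of the pass test. All names are mine, not the RFC’s. Two loud simplifications: the organizational domain is guessed by keeping the last two labels (a real implementation looks it up in the Public Suffix List, so this is wrong for domains like example.co.uk), and a single strict/relaxed switch is used for both SPF and DKIM, even though the mode can be set separately for each.

```python
from dataclasses import dataclass


def organizational_domain(domain: str) -> str:
    """Naive stand-in: keep the last two labels of the domain.

    ASSUMPTION: real implementations consult the Public Suffix List.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_aligned(from_domain: str, auth_domain: str, strict: bool = False) -> bool:
    """Is the authenticated domain in alignment with the From: domain?"""
    a = from_domain.lower().rstrip(".")
    b = auth_domain.lower().rstrip(".")
    if strict:
        # Strict mode: the domains must be perfectly equal
        return a == b
    # Relaxed mode: give or take subdomains
    return organizational_domain(a) == organizational_domain(b)


@dataclass
class AuthResult:
    mechanism: str  # "spf" or "dkim"
    result: str     # "pass", "fail", "none", ...
    domain: str     # the domain that this mechanism authenticated


def dmarc_passes(from_domain: str, results: list, strict: bool = False) -> bool:
    """One passing mechanism with an aligned domain is enough."""
    return any(
        r.result == "pass" and is_aligned(from_domain, r.domain, strict)
        for r in results
    )
```

For example, a message with From: myself@mysubdomain.example.com and a passing DKIM signature of example.com is OK in relaxed mode, but not in strict mode. A passing SPF test on some unrelated relay’s domain doesn’t help in either.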
If the email passes the DMARC test, there isn’t much to fuss about. If it fails, the decision what to do depends on the policy, as given in the relevant domain. Which, according to RFC7489 section 3 is: “Author Domain: The domain name of the apparent author, as extracted from the RFC5322.From field”. And then in section 4.3, item 7: “The DMARC module attempts to retrieve a policy from the DNS for that domain” (referring to the Author Domain).
So it’s a DNS query for the TXT record of the From: domain, with the “_dmarc” subdomain prepended. As in the example above for gmail.com.
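As a rough illustration of this lookup, the following Python sketch builds the name to query and picks the requested policy out of the TXT record’s text. The DNS query itself is left out (fetch the TXT record with whatever resolver is at hand). The function names and the example record are made up for this sketch, and the fallback to “none” when there’s no “p” tag is my own choice.

```python
def dmarc_query_name(author_domain: str) -> str:
    """The DNS name whose TXT record holds the DMARC policy."""
    return "_dmarc." + author_domain.lower().rstrip(".")


def parse_dmarc_record(txt: str) -> dict:
    """Split 'v=DMARC1; p=none; ...' into a tag -> value dictionary."""
    tags = {}
    for part in txt.split(";"):
        if "=" not in part:
            continue  # Skip empty leftovers, e.g. after a trailing ';'
        key, value = part.split("=", 1)
        tags[key.strip().lower()] = value.strip()
    return tags


def requested_policy(tags: dict, author_domain: str, record_domain: str) -> str:
    """Pick 'sp' for subdomains of the record's domain, otherwise 'p'.

    ASSUMPTION: record_domain is the domain at which the record was found.
    """
    is_subdomain = author_domain.lower() != record_domain.lower()
    if is_subdomain and "sp" in tags:
        return tags["sp"]
    return tags.get("p", "none")


# A hypothetical record, for illustration only
record = parse_dmarc_record(
    "v=DMARC1; p=reject; sp=quarantine; rua=mailto:reports@example.com"
)
print(dmarc_query_name("example.com"))
print(requested_policy(record, "example.com", "example.com"))
print(requested_policy(record, "sub.example.com", "example.com"))
```

The “p” tag applies to the domain itself and “sp” to its subdomains, so the last two lines print “reject” and “quarantine” respectively.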
Finally, a tricky point. If a mail server, for which the SPF test is made, didn’t use the Author Domain in its MAIL FROM nor in the HELO/EHLO, the SPF test is worthless for DMARC purposes. It’s however quite tempting to check the Author Domain for its SPF record nevertheless. I mean, if the Author Domain allows the IP address of the mail relay server, isn’t it good enough to pass a DMARC test? Doing this goes against the SPF’s RFC, and isn’t mentioned in any way in DMARC’s RFC. But it makes a lot of sense. I won’t be surprised if it’s common practice already.
Will DMARC make my email delivery better?
TL;DR: Surprisingly enough, yes.
The irony about DMARC is that it bites on the spam messages, and does very little on the legit ones. After all, if an email passed both the SPF and DKIM tests on the Author Domain, what is there left to say?
And if the same email passed only one of the tests, why would a DMARC record add reassurance?
Of course, if you want to stop fake mails pretending to be from you, definitely apply DMARC.
But once again, no one knows how spam filter X behaves. Maybe someone found out that domains with a DMARC record carry less spam, and tuned the filter in favor of them. And maybe the rejection of spam mails thanks to the DMARC record helped with the domain’s spam statistics. Even though I would expect any machine that maintains statistics to count the emails that pass SPF / DKIM tests separately.
And here comes the big surprise. Gmail refused to accept messages from my server until I added a DMARC record. Once I did it, I was all welcome. It makes no sense, but somehow, Google seems to like the very existence of a DMARC record. Maybe a coincidence, most likely not. So do yourself a favor, at the very least add a TXT record to _dmarc.yourdomain.com:
v=DMARC1; p=none; sp=none; ruf=mailto:mailreports@yourdomain.com
This record tells the recipient to do nothing with a mail message that fails the DMARC test, so it’s harmless. But it asks the recipient to send a report about the failure to the email address given. Which can be useful in itself.
Actually, make that “p=reject” and “sp=reject” as soon as you’re sure it won’t bite back on your own mails. Once again, it seems to improve deliverability with Gmail, and possibly also the overall statistics: Even though spam emails don’t meet the DMARC criterion, if they’re not discarded, they might circulate and wreck the domain’s reputation. So kill them while they’re small.
Conclusion
There might be official rules for entering a club, but at the end of the day, you can’t know what the doorkeeper looks at. So try to get everything as tidy as possible, and hope you won’t be mistaken for the bad guys.
And don’t wait for the first time you won’t be let in. It might be too late to fix it then.
This is a quick overview of the parts of an SMTP session that are relevant to SPF and mail server setup.
Just a sample SMTP session
For starters, this is what an ESMTP session between two mail servers talking on port 25 can look like (shamelessly copied from this post, which also shows how I obtained it).
"eli@picky.server.com" <eli@picky.server.com>... Connecting to [127.0.0.1] via relay...
220 theserver.org ESMTP Sendmail 8.14.4/8.14.4; Sat, 18 Jun 2016 11:05:26 +0300
>>> EHLO theserver.org
250-theserver.org Hello localhost.localdomain [127.0.0.1], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
250-DSN
250-ETRN
250-DELIVERBY
250 HELP
>>> MAIL From:<eli@theserver.org> SIZE=864
250 2.1.0 <eli@theserver.org>... Sender ok
>>> RCPT To:<eli@picky.server.com>
>>> DATA
250 2.1.5 <eli@picky.server.com>... Recipient ok
354 Enter mail, end with "." on a line by itself
>>> .
250 2.0.0 u5I85QQq030607 Message accepted for delivery
"eli@picky.server.com" <eli@picky.server.com>... Sent (u5I85QQq030607 Message accepted for delivery)
Closing connection to [127.0.0.1]
>>> QUIT
221 2.0.0 theserver.org closing connection
HELO / EHLO
This is the first thing the client says after the server’s greeting. More precisely, it says something like
HELO mail.example.com
This self-introduction is important: The server knows your IP, and probably makes a quick rDNS check on it, to see if you’re making this domain up. So the domain given in HELO must be the same as in the rDNS record. Exactly.
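Just to illustrate what such a check may look like on the server’s side, here is a small Python sketch. It’s not how any particular mail server does it: the function name is made up, and the lookup function is passed as a parameter so that the logic can be tried out without a network.

```python
import socket


def helo_matches_rdns(client_ip, helo_name, reverse_lookup=socket.gethostbyaddr):
    """Return True if the HELO name equals a name in the IP's rDNS record."""
    try:
        # gethostbyaddr() returns (hostname, alias list, address list)
        hostname, aliases, _addresses = reverse_lookup(client_ip)
    except OSError:
        # No rDNS record at all (socket.herror is a subclass of OSError)
        return False
    wanted = helo_name.lower().rstrip(".")
    return any(name.lower().rstrip(".") == wanted for name in [hostname, *aliases])


def fake_lookup(ip):
    """Pretend rDNS answer for trying this out offline (documentation IP)."""
    return ("mail.example.com", [], [ip])


print(helo_matches_rdns("192.0.2.1", "mail.example.com", fake_lookup))  # True
print(helo_matches_rdns("192.0.2.1", "example.com", fake_lookup))       # False
```

Note that the comparison is on the full name: example.com is not a match for mail.example.com.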
It doesn’t matter if this domain has nothing to do with the domain of the actual From-sender. Or any other domain, for that matter. Relaying emails is normal. Not having the rDNS set up properly shouldn’t be.
Rumor has it that most mail servers will accept the message even if there’s no match, or even if there’s no rDNS record at all. And I’ve seen plenty of these myself. I’ve also had my server rejected because of this. It’s losing points on being lazy.
EHLO is like HELO, but indicates the start of an ESMTP session. For the purpose of the domain, it’s the same thing.
MAIL FROM:
After the HELO introduction (and possibly some other stuff), the client goes something like:
MAIL FROM:<myself@example.com>
The email address given is often referred to as the envelope sender, envelope-from or smtp.mailfrom.
In its simplest form (and as originally intended), this is the sender of the mail, copied from the “From:” header, as presented to the end user. But even more important, this is the address for bouncing the mail if it’s undeliverable. So one common trick, mostly used by mass relays, is to assign a long and tangled MAIL FROM: bounce address, from which the relaying server can identify the message better.
The envelope sender appears as the “Return-Path:” header in mail messages as they reach mailboxes. Along the Received list in the mail headers, “envelope-from” tags often appear, indicating the envelope sender of the relevant leg.
This way or another, if you’re into SPF, then the SPF record must match the envelope sender, and not necessarily the From: sender. Even though it’s a good idea to cover both. Mail relays are a bit messy on what they check.
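To see the difference between the two senders on a real message, a few lines of Python with the standard email package will do. This is only a sketch: the sample message and all addresses in it are invented, and it assumes that the receiving server has added a “Return-Path:” header.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(header_value):
    """Extract the domain part of the address in a header value."""
    _name, addr = parseaddr(header_value or "")
    return addr.rpartition("@")[2].lower()


def sender_domains(raw_message):
    """Return the envelope sender's domain and the From: header's domain."""
    msg = message_from_string(raw_message)
    return {
        "envelope": domain_of(msg["Return-Path"]),  # What SPF relates to
        "from": domain_of(msg["From"]),             # What the end user sees
    }


# An invented message, relayed by a mass relay with its own bounce address
SAMPLE = (
    "Return-Path: <bounce-1234@relay.example.net>\n"
    "From: Myself <myself@example.com>\n"
    "Subject: test\n"
    "\n"
    "Hello\n"
)

print(sender_domains(SAMPLE))
```

With this sample, the envelope domain is relay.example.net and the From: domain is example.com, so the SPF record that matters is the one of relay.example.net.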
VRFY and EXPN
VRFY allows the client to check whether an email address is valid or not on the server. If it is valid, the server responds with a full address of the user.
This allows the client to scan through a range of addresses, and find one that is a valid recipient. Excellent for spammers, which is why this function is commonly unavailable today. For example:
VRFY eli@billauer.co.il
252 Administrative prohibition
on another machine:
VRFY eli@billauer.co.il
252 2.5.2 Cannot VRFY user; try RCPT to attempt delivery (or try finger)
EXPN is more or less the same, just with mailing lists: The client gives the name of the list, and gets the list of users. The common practice is not to allow this command, not even on servers that allow VRFY despite its issues with spam.
If you’re setting up a mail server, disable this. It’s often enabled by default.
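For checking what your own server answers, Python’s smtplib has a verify() method that sends VRFY (and an expn() method for EXPN). The sketch below is mine: the wording of the categories is just my reading of the reply codes, and probe_vrfy() needs a reachable server, so the host name is whatever you pass to it.

```python
import smtplib


def classify_vrfy_reply(code):
    """Translate a VRFY reply code into what it reveals to the client."""
    if code in (250, 251):
        return "address confirmed"          # The server gave the address away
    if code == 252:
        return "server declines to verify"  # As in the two examples above
    if 500 <= code < 600:
        return "rejected, unknown or not supported"
    return "other"


def probe_vrfy(host, address, port=25, timeout=10):
    """Send VRFY for one address to a server and classify the reply."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        code, _message = smtp.verify(address)
    return classify_vrfy_reply(code)


print(classify_vrfy_reply(252))
```

A server that is set up against address harvesting should end up in the “declines” or “rejected” category for any address, valid or not.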
TL;DR: SELECT queries in Perl for numerical columns suddenly turned to zeros after a software upgrade.
This is a really peculiar problem I had after my web hosting provider upgraded some database related software on the server: Numbers that were read with SELECT queries from the database were suddenly all zeros.
Spoiler: It’s about running Perl in Taint Mode.
The setting was DBD::mysql version 4.050, DBI version 1.642, Perl v5.10.1, and MySQL Community Server version 5.7.25 on a Linux machine.
For example, the following script is supposed to write the number of lines in the “session” table:
#!/usr/bin/perl -T -w
use warnings;
use strict;
require DBI;
my $dbh = DBI->connect( "DBI:mysql:mydb:localhost", "mydb", "password",
                        { RaiseError => 1, AutoCommit => 1, PrintError => 0,
                          Taint => 1});
my $sth = $dbh->prepare("SELECT COUNT(*) FROM session");
$sth->execute();
my @l = $sth->fetchrow_array;
my $s = $l[0];
print "$s\n";
$sth->finish();
$dbh->disconnect;
But instead, it prints zero, even though there are rows in the said table. Turning off taint mode by removing the “-T” flag in the shebang line gives the correct output. Needless to say, accessing the database with the “mysql” command-line utility client gave the correct output as well.
This is true for any numeric readout from this MySQL wrapper. This is particularly problematic when an integer is used as the user ID on a web site, and fetched with
my $sth = db::prepare_cached("SELECT id FROM users WHERE username=? AND passwd=?");
$sth->execute($name, $password);
my ($uid) = $sth->fetchrow_array;
$sth->finish();
If the credentials are wrong, $uid will be undef, as usual. But if any valid user gives correct credentials, $uid becomes 0, no matter which user it is. Which I was cautious enough not to allocate as the site’s supervisor, but that’s actually a common choice (what’s the UID of root on a Linux system?).
A softer workaround, instead of dropping the “-T” flag, is to set the TaintIn flag in the DBI->connect() call instead of Taint (i.e. “TaintIn => 1” in place of “Taint => 1”). Taint stands for both TaintIn and TaintOut, so this fix effectively disables TaintOut, hence tainting of data from the database is disabled. And in this case, disabling tainting of this data also skips the zero-value bug. This leaves all other tainting checks in place, in particular that of data supplied from the network. So not enforcing sanitizing data from the database is a small sacrifice (in particular if the script already has mileage running with the enforcement on).
And in the end I wonder if I’m the only one who uses Perl’s tainting mechanism. I mean, if there are still (2019) advisories on SQL injections (mostly PHP scripts), maybe people just don’t care much about things of this sort.
I suddenly got the following line in public_html/error_log:
[06-Feb-2019 17:51:53] PHP Deprecated: Automatically populating $HTTP_RAW_POST_DATA is deprecated and will be removed in a future version. To avoid this warning set 'always_populate_raw_post_data' to '-1' in php.ini and use the php://input stream instead. in Unknown on line 0
So I took a closer look at the logs:
58.221.58.19 - - [06/Feb/2019:17:51:50 -0500] "POST /%25%7b(%23dm%3d%40ognl.OgnlContext%40DEFAULT_MEMBER_ACCESS).(%23_memberAccess%3f(%23_memberAccess%3d%23dm)%3a((%23container%3d%23context%5b%27com.opensymphony.xwork2.ActionContext.container%27%5d).(%23ognlUtil%3d%23container.getInstance(%40com.opensymphony.xwork2.ognl.OgnlUtil%40class)).(%23ognlUtil.getExcludedPackageNames().clear()).(%23ognlUtil.getExcludedClasses().clear()).(%23context.setMemberAccess(%23dm)))).(%23res%3d%40org.apache.struts2.ServletActionContext%40getResponse()).(%23res.addHeader(%27eresult%27%2c%27struts2_security_check%27))%7d/ HTTP/1.1" 500 2432 "-" "Auto Spider 1.0"
58.221.58.19 - - [06/Feb/2019:17:51:51 -0500] "POST / HTTP/1.1" 200 4127 "-" "Auto Spider 1.0"
58.221.58.19 - - [06/Feb/2019:17:51:52 -0500] "POST / HTTP/1.1" 200 4127 "-" "Auto Spider 1.0"
58.221.58.19 - - [06/Feb/2019:17:51:53 -0500] "POST / HTTP/1.1" 200 4131 "-" "Auto Spider 1.0"
58.221.58.19 - - [06/Feb/2019:17:52:14 -0500] "POST / HTTP/1.1" 200 4129 "-" "Auto Spider 1.0"
58.221.58.19 - - [06/Feb/2019:17:52:15 -0500] "POST / HTTP/1.1" 200 4130 "-" "Auto Spider 1.0"
58.221.58.19 - - [06/Feb/2019:17:52:18 -0500] "POST / HTTP/1.1" 200 4130 "-" "Auto Spider 1.0"
Googling around for the first entry, which is obviously some kind of attack (partly because it’s a POST coming from nowhere), it looks like an attempt to exploit the Struts Remote Code Execution Vulnerability based upon this proof of concept for CVE-2017-9791.
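The request is much easier to read after undoing the URL encoding. A minimal Python sketch for that, applied to the beginning and the end of the path in the first log entry (the middle part is left out here just to keep it short):

```python
from urllib.parse import unquote

# Two excerpts of the request path, copied from the first log entry
HEAD = "%25%7b(%23dm%3d%40ognl.OgnlContext%40DEFAULT_MEMBER_ACCESS)"
TAIL = "(%23res.addHeader(%27eresult%27%2c%27struts2_security_check%27))%7d"


def decode_request_path(path):
    """Undo one level of percent-encoding in a logged request path."""
    return unquote(path)


print(decode_request_path(HEAD))
print(decode_request_path(TAIL))
```

This prints

%{(#dm=@ognl.OgnlContext@DEFAULT_MEMBER_ACCESS)
(#res.addHeader('eresult','struts2_security_check'))}

which is an OGNL expression wrapped in %{...}. Its last part merely attempts to add a header named “eresult” to the response, so it looks like a probe for the vulnerability rather than an attempt to do actual damage.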
The unpleasant thing to note is that the error message doesn’t relate to the first POST request, but to a later one. So maybe this attack went somewhere? Anyhow, it’s not my server, so I can’t do much about Apache’s configuration. Besides, other information I have seems to indicate that the attack didn’t manage to do anything.
Guess it’s just one of a gazillion attacks that go unnoticed, just this one created a line in my error log.
Introduction
It all began when I noticed that my media center Linux machine (Linux Mint 18.1, Serena) finished a TV recording a bit earlier than expected. Logging in and typing “date” I was quite surprised to find out that the time was off by half a minute.
The first question that comes to mind is why the time synchronization didn’t work. The second is, if it didn’t work, how come I hadn’t noticed this issue earlier? The computer has been in use as a media center for a little less than two years.
What happened
It turns out (and it wasn’t easy to tell) that the relevant daemon was ntpd.
So what’s up, ntp?
$ systemctl status ntp
● ntp.service - LSB: Start NTP daemon
Loaded: loaded (/etc/init.d/ntp; enabled; vendor preset: enabled)
Active: active (exited) since Wed 2018-12-19 12:38:06 IST; 1 months 7 days ag
Docs: man:systemd-sysv-generator(8)
Process: 1257 ExecStop=/etc/init.d/ntp stop (code=exited, status=0/SUCCESS)
Process: 1385 ExecStart=/etc/init.d/ntp start (code=exited, status=0/SUCCESS)
Dec 19 12:38:06 tv systemd[1]: Starting LSB: Start NTP daemon...
Dec 19 12:38:06 tv ntp[1385]: * Starting NTP server ntpd
Dec 19 12:38:06 tv ntp[1385]: ...done.
Dec 19 12:38:06 tv systemd[1]: Started LSB: Start NTP daemon.
Dec 19 12:38:06 tv ntpd[1398]: proto: precision = 0.187 usec (-22)
Dec 19 12:38:08 tv systemd[1]: Started LSB: Start NTP daemon.
Looks fairly OK. Maybe the logs can tell something?
$ journalctl -u ntp
Dec 19 12:38:02 tv systemd[1]: Stopped LSB: Start NTP daemon.
Dec 19 12:38:02 tv systemd[1]: Starting LSB: Start NTP daemon...
Dec 19 12:38:02 tv ntp[1055]: * Starting NTP server ntpd
Dec 19 12:38:02 tv ntpd[1074]: ntpd 4.2.8p4@1.3265-o Wed Oct 5 12:34:45 UTC 2016 (1): Starting
Dec 19 12:38:02 tv ntpd[1076]: proto: precision = 0.175 usec (-22)
Dec 19 12:38:02 tv ntp[1055]: ...done.
Dec 19 12:38:02 tv systemd[1]: Started LSB: Start NTP daemon.
Dec 19 12:38:02 tv ntpd[1076]: Listen and drop on 0 v6wildcard [::]:123
Dec 19 12:38:02 tv ntpd[1076]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Dec 19 12:38:02 tv ntpd[1076]: Listen normally on 2 lo 127.0.0.1:123
Dec 19 12:38:02 tv ntpd[1076]: Listen normally on 3 lo [::1]:123
Dec 19 12:38:02 tv ntpd[1076]: Listening on routing socket on fd #20 for interface updates
Dec 19 12:38:03 tv ntpd[1076]: error resolving pool 0.ubuntu.pool.ntp.org: Temporary failure in name resolution (-3)
Dec 19 12:38:04 tv ntpd[1076]: error resolving pool 1.ubuntu.pool.ntp.org: Temporary failure in name resolution (-3)
Dec 19 12:38:05 tv ntpd[1076]: error resolving pool 2.ubuntu.pool.ntp.org: Temporary failure in name resolution (-3)
Dec 19 12:38:06 tv systemd[1]: Stopping LSB: Start NTP daemon...
Dec 19 12:38:06 tv ntp[1257]: * Stopping NTP server ntpd
Dec 19 12:38:06 tv ntp[1257]: ...done.
Dec 19 12:38:06 tv systemd[1]: Stopped LSB: Start NTP daemon.
Dec 19 12:38:06 tv systemd[1]: Stopped LSB: Start NTP daemon.
Dec 19 12:38:06 tv systemd[1]: Starting LSB: Start NTP daemon...
Dec 19 12:38:06 tv ntp[1385]: * Starting NTP server ntpd
Dec 19 12:38:06 tv ntp[1385]: ...done.
Dec 19 12:38:06 tv systemd[1]: Started LSB: Start NTP daemon.
Dec 19 12:38:06 tv ntpd[1398]: proto: precision = 0.187 usec (-22)
Dec 19 12:38:08 tv systemd[1]: Started LSB: Start NTP daemon.
Hmmm… There is some kind of trouble there, but it was surely resolved. Or was it? In fact, there was no ntpd process running, so maybe it just died?
Let’s try to restart the daemon, and see what happens. As root,
# systemctl restart ntp
after which the log went
Jan 26 20:36:46 tv systemd[1]: Stopping LSB: Start NTP daemon...
Jan 26 20:36:46 tv ntp[32297]: * Stopping NTP server ntpd
Jan 26 20:36:46 tv ntp[32297]: start-stop-daemon: warning: failed to kill 1398: No such process
Jan 26 20:36:46 tv ntp[32297]: ...done.
Jan 26 20:36:46 tv systemd[1]: Stopped LSB: Start NTP daemon.
Jan 26 20:36:46 tv systemd[1]: Starting LSB: Start NTP daemon...
Jan 26 20:36:46 tv ntp[32309]: * Starting NTP server ntpd
Jan 26 20:36:46 tv ntp[32309]: ...done.
Jan 26 20:36:46 tv systemd[1]: Started LSB: Start NTP daemon.
Jan 26 20:36:46 tv ntpd[32324]: proto: precision = 0.187 usec (-22)
Jan 26 20:36:46 tv ntpd[32324]: Listen and drop on 0 v6wildcard [::]:123
Jan 26 20:36:46 tv ntpd[32324]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Jan 26 20:36:46 tv ntpd[32324]: Listen normally on 2 lo 127.0.0.1:123
Jan 26 20:36:46 tv ntpd[32324]: Listen normally on 3 enp3s0 10.1.1.22:123
Jan 26 20:36:46 tv ntpd[32324]: Listen normally on 4 lo [::1]:123
Jan 26 20:36:46 tv ntpd[32324]: Listen normally on 5 enp3s0 [fe80::f757:9ceb:2243:3e16%2]:123
Jan 26 20:36:46 tv ntpd[32324]: Listening on routing socket on fd #22 for interface updates
Jan 26 20:36:47 tv ntpd[32324]: Soliciting pool server 118.67.200.10
Jan 26 20:36:48 tv ntpd[32324]: Soliciting pool server 210.23.25.77
Jan 26 20:36:49 tv ntpd[32324]: Soliciting pool server 211.233.40.78
Jan 26 20:36:50 tv ntpd[32324]: Soliciting pool server 43.245.49.242
Jan 26 20:36:30 tv ntpd[32324]: Soliciting pool server 45.76.187.173
Jan 26 20:36:30 tv ntpd[32324]: Soliciting pool server 46.19.96.19
Jan 26 20:36:31 tv ntpd[32324]: Soliciting pool server 210.173.160.87
Jan 26 20:36:31 tv ntpd[32324]: Soliciting pool server 119.28.206.193
Jan 26 20:36:49 tv ntpd[32324]: Soliciting pool server 133.243.238.244
Jan 26 20:36:49 tv ntpd[32324]: Soliciting pool server 91.189.89.199
Aha! So this is what a kickoff of ntpd should look like! Clearly ntpd didn’t recover all that well from the lack of internet connection (I suppose) during the media center’s bootup. Maybe it died, and was never restarted. The irony is that systemd has a wonderful mechanism for restarting failing daemons, but ntpd is still under the backward-compatible LSB interface. So the system silently remained with no time synchronization.
Go the systemd way
systemd supplies its own lightweight time synchronization mechanism, systemd-timesyncd. It makes much more sense, as it doesn’t open NTP ports as a server (like ntpd does, one may wonder what for), but just synchronizes the computer it runs on to the remote NTP server. And judging from my previous experience with systemd, in the event of multiple solutions, go for the one systemd offers. In fact, it’s sort-of enabled by default:
$ systemctl status systemd-timesyncd
● systemd-timesyncd.service - Network Time Synchronization
Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/systemd-timesyncd.service.d
└─disable-with-time-daemon.conf
Active: inactive (dead)
Condition: start condition failed at Wed 2018-12-19 12:38:01 IST; 1 months 7 days ago
ConditionFileIsExecutable=!/usr/sbin/VBoxService was not met
Docs: man:systemd-timesyncd.service(8)
Start condition failed? What’s this? Let’s look at the drop-in file:
$ cat /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
[Unit]
# don't run timesyncd if we have another NTP daemon installed
ConditionFileIsExecutable=!/usr/sbin/ntpd
ConditionFileIsExecutable=!/usr/sbin/openntpd
ConditionFileIsExecutable=!/usr/sbin/chronyd
ConditionFileIsExecutable=!/usr/sbin/VBoxService
Oh please, you can’t be serious. Disabling the execution because of the existence of a file? If another NTP daemon is installed, does it mean it’s enabled? In particular, if VBoxService is installed, does it mean we’re running as guests on a virtual machine? Like, seriously, someone might just install the VirtualBox client tools for no reason at all, and poof, there goes the time synchronization without any warning (note that this wasn’t the problem I had).
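Since each of these conditions silently prevents the service from starting, it can be worth checking which of the listed files are actually present on a given machine. Below is a quick Python sketch for that. It’s my own helper, not a systemd tool, and it only understands the negated “ConditionFileIsExecutable=!” lines, which is all that this drop-in file contains.

```python
import os

DROP_IN = """\
[Unit]
# don't run timesyncd if we have another NTP daemon installed
ConditionFileIsExecutable=!/usr/sbin/ntpd
ConditionFileIsExecutable=!/usr/sbin/openntpd
ConditionFileIsExecutable=!/usr/sbin/chronyd
ConditionFileIsExecutable=!/usr/sbin/VBoxService
"""


def is_executable_file(path):
    """True if the path is an existing regular file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def blocking_files(unit_text, is_executable=is_executable_file):
    """List the files whose mere existence keeps the unit from starting."""
    prefix = "ConditionFileIsExecutable=!"
    blockers = []
    for line in unit_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            path = line[len(prefix):]
            # The "!" negates the test: an executable file fails the condition
            if is_executable(path):
                blockers.append(path)
    return blockers


print(blocking_files(DROP_IN))
```

On a machine where the list comes out empty, this drop-in file doesn’t stand in the way of systemd-timesyncd.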
Moving to systemd-timesyncd
As mentioned earlier, systemd-timesyncd is enabled by default, but one may insist:
# systemctl enable systemd-timesyncd.service
(No response, because it’s enabled anyhow)
However in order to make it work, remove the condition that prevents it from running:
# rm /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
and then disable and stop ntpd:
# systemctl disable ntp
# systemctl stop ntp
On my computer, the other two time synchronizing tools (openntpd and chrony) aren’t installed, so there’s no need to worry about them.
And then we have timedatectl
Not directly related, but still worth mentioning:
$ timedatectl
Local time: Sat 2019-01-26 21:22:57 IST
Universal time: Sat 2019-01-26 19:22:57 UTC
RTC time: Sat 2019-01-26 19:22:57
Time zone: Asia/Jerusalem (IST, +0200)
Network time on: yes
NTP synchronized: yes
RTC in local TZ: no
Systemd is here to take control of everything, obviously.
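Given that the whole story began with the synchronization failing silently, it may be a good idea to check this output automatically every now and then. Here is a small Python sketch that parses the output above. It’s tied to the field names as they appear on my machine (“NTP synchronized” in particular), and other systemd versions may label them differently.

```python
def parse_timedatectl(output):
    """Turn the 'Key: value' lines of timedatectl into a dictionary."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, as the time values contain colons too
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def ntp_synchronized(output):
    """True only if the output explicitly says 'NTP synchronized: yes'."""
    return parse_timedatectl(output).get("NTP synchronized") == "yes"


SAMPLE = """\
      Local time: Sat 2019-01-26 21:22:57 IST
  Universal time: Sat 2019-01-26 19:22:57 UTC
        RTC time: Sat 2019-01-26 19:22:57
       Time zone: Asia/Jerusalem (IST, +0200)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no
"""

print(ntp_synchronized(SAMPLE))
```

For real use, the sample text would be replaced with the output of the timedatectl command itself, for example as obtained with subprocess.run(["timedatectl"], capture_output=True, text=True).stdout.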