How we monitor delivery

We’ve built some instrumentation to monitor how well we’re getting messages to our users’ inboxes. Our applications tag each outgoing message with a unique header containing a hashed value, which the application records before the message is sent.

To gather delivery information, we run a script that tails the Postfix logs and extracts the delivery time and status for each piece of mail, including any error message received from the receiving mail server, and links it back to the hash the application stored. We store this information for 30 days.
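
To make the log-parsing step concrete, here is a minimal Python sketch of what extracting delivery time and status from a Postfix log line could look like. It is an illustration only, not the script described above: it assumes the standard Postfix delivery line format (`to=<...>, relay=..., delay=..., status=... (...)`), and the function name `parse_delivery_line`, the returned field names, and the sample line are all hypothetical. Linking the Postfix queue ID back to the hash the application stored is not shown.

```python
import re

# Matches a standard Postfix delivery log line (e.g. from postfix/smtp).
# Assumption: the default Postfix log format; customized logging may differ.
DELIVERY_RE = re.compile(
    r"postfix/\w+\[\d+\]: (?P<queue_id>[0-9A-Za-z]+): "
    r"to=<(?P<recipient>[^>]*)>, .*?"
    r"delay=(?P<delay>[\d.]+), .*?"
    r"status=(?P<status>\w+) \((?P<response>.*)\)\s*$"
)


def parse_delivery_line(line):
    """Return a dict describing one delivery attempt, or None if the
    line is not a delivery line."""
    match = DELIVERY_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["delay"] = float(record["delay"])  # total seconds spent in Postfix
    return record


# Hypothetical sample line, using documentation-only addresses.
SAMPLE = (
    "Jan 10 12:00:01 mail postfix/smtp[1234]: 3F2A1B04C2: "
    "to=<sue@example.com>, relay=mx.example.com[192.0.2.10]:25, "
    "delay=1.2, delays=0.1/0/0.5/0.6, dsn=2.0.0, "
    "status=sent (250 2.0.0 OK)"
)

if __name__ == "__main__":
    print(parse_delivery_line(SAMPLE))
```

The `response` field carries whatever the receiving mail server said, which is where error messages for bounced or deferred mail would show up.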

We also send these statistics to our stats server so they can be reported through our metrics dashboard. This “live” and historical information can then be used by our operations team to check how we’re doing on aggregate mail delivery for each application.

Why run your own mail servers?

Over the last few years, at least a dozen services that specialize in sending email have popped up, ranging from the bare-bones to the full-service. Despite all these “email as a service” startups we’ve kept our mail delivery in-house, for a couple of reasons:

  • We don’t know anyone who could do it better. With a 99.3% delivery rate, we haven’t found a third-party provider that actually does better in a way they’re willing to guarantee.
  • Setup hassle. Most of the third-party services require that you verify each address that sends email by clicking a link that gets sent to that address. We send email from thousands and thousands of email addresses for our products, and the hassle of registering and confirming them is significant. Even if we automated the process, it would still introduce unnecessary delivery delays.

Given all this, why should we pay someone tens of thousands of dollars to do it? We shouldn’t, and we don’t.

How we keep our mail delivery rates up

Let’s be honest from the get-go. Mail delivery is more of an art than a science. We’ve found that even when you “play by the rules”, there are still times when a major provider will reject all your mail without notice. Usually it takes a couple of emails to the provider’s abuse address, and things get resolved. In spite of these “out of our control” issues, we’ve found a few things help us keep delivery rates up:

  1. Constantly monitor spam blacklists: We have a set of Nagios alerts that regularly check if we’re listed on any delivery blacklists, and whenever they go off we take whatever corrective action we need to get back off the blacklist.
  2. Have valid SPF records. Don’t impersonate your users. When running a web app like Basecamp, which sends emails that are generated by another user, it can be tempting to send the email from that user (e.g., so that a comment I wrote on Basecamp would appear to come from noah at 37signals dot com), which might make people feel more comfortable. Unfortunately, this is a surefire way to end up on spam lists, since you’ll likely be sending from an IP address that the user’s domain’s SPF record doesn’t authorize. Chances are, if the user’s domain does have an SPF record, it doesn’t include your application’s IP.
  3. Sign the mail! Use DKIM and DomainKeys. Yahoo and Gmail both score signed email higher.
  4. Dedicated and conditioned email sending IPs.
  5. Configure reverse DNS entries. Most of the “big boys” won’t accept mail from your servers if your reverse DNS entries don’t match your forward DNS. You might need your IP provider to help with setting up these records.
  6. Enroll in feedback loops. We haven’t automated our parsing of feedback, but a daily / weekly review of feedback loop emails helps us know when there’s an unhappy user, or other problem. Too many complaints and you’ve got trouble.
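
As an illustration of the blacklist monitoring in item 1, here is a minimal Python sketch of the DNS lookup that a blacklist check typically boils down to: reverse the octets of the sending IP, append the blacklist’s zone, and see whether the name resolves. This is not the Nagios check described above. The function names are hypothetical, `dnsbl.example.org` is a placeholder zone rather than a real blacklist, and the sketch treats any successful lookup as “listed” without interpreting the return code or handling timeouts.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: the IPv4 octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone answers for this IP.

    `resolve` is injectable so the logic can be exercised without
    network access; by default it performs a real DNS lookup.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # The name does not resolve, so the IP is not listed in this zone.
        return False


if __name__ == "__main__":
    # Placeholder zone: substitute the zones of the blacklists you monitor.
    print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

A monitoring check could loop over each sending IP and each blacklist zone and alert whenever `is_listed` returns true.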

A problem we haven’t solved

By far the biggest cause of failed email delivery we see is bad email addresses that were entered into the system: problems like ‘joe@gmal.com’ or ‘sue@yahooo.com’. By and large, these pass a regular expression check for email addresses, but aren’t actually valid addresses. There’s no perfect solution here, but we’ve been experimenting with checking for valid DNS records or actually attempting to connect to the mail server as part of the validation of an email address, and with notifying people within the application when we aren’t able to deliver mail to them.
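
The DNS-based check could be sketched roughly as follows in Python. This is a simplified illustration, not the validation we’re experimenting with: it only asks whether the domain part of the address resolves at all, using the standard library, whereas a fuller check would look up MX records (which needs a DNS library) or try connecting to the mail server. The function name `domain_resolves` is hypothetical. Note that a mistyped domain may itself be registered and resolve, so a check like this can only catch some bad addresses.

```python
import socket


def domain_resolves(address, resolve=socket.getaddrinfo):
    """Return True if the domain part of an email address resolves in DNS.

    `resolve` is injectable so the logic can be exercised without
    network access; by default it performs a real lookup.
    """
    _local, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return False  # no "@" or nothing after it
    try:
        resolve(domain, 25)  # 25 is the SMTP port
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: the name does not resolve.
        # UnicodeError: the domain is not a valid hostname.
        return False
```

Because the lookup takes time and can fail for transient reasons, a result of `False` is probably better treated as a prompt to ask the user to double-check the address than as a hard rejection.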

A few tools

  • MX Toolbox is a great site for doing a quick check on your mail servers and your customers’ mail servers.
  • Sender Score is really a marketing tool for Return Path, but it can be used to get insight about how some of the “big boys” are scoring your sending IPs.
  • Postmark offers a web tool and API to get the SpamAssassin score for a message, which can be helpful for identifying things you can improve to boost delivery rates.

Have questions about email delivery? Ask in the comments, and we’ll try our best to answer.