Double-Pass Spam Filtering with Gmail

From DreamHost
WARNING

This no longer works due to a system change as of April, 2008.
New and existing email addresses can no longer be tied to a shell user unless they already are, though email can still be forwarded to a shell account.
See Shell-linked E-mail for details.

The instructions provided in this article or section are considered advanced.

You are expected to be knowledgeable in the UNIX shell.
Support for these instructions is not available from DreamHost tech support.
Server changes may cause this to break. Be prepared to troubleshoot this yourself if this happens.
We seriously aren't kidding about this.

To use the procmail-based instructions on new or migrated accounts, you need to set up forwarding to a shell account, as discussed at Shell-linked E-mail.

The non-procmail instructions do work without using a shell.

Description

If SpamAssassin is letting too much spam through, you may want to consider using Gmail as an additional layer of defense. You could just configure DreamHost to forward all email to a Gmail account, but that could sacrifice some flexibility. These instructions will allow you to continue using DreamHost's email servers with no changes to your email client settings, but pass all incoming email through Gmail's spam filters before it gets to you. The idea and most of the implementation came from this blog entry.

NOTE: You cannot do this with a mailbox user (those m###### accounts); you have to do it with a real user (with shell access).

WARNING: Gmail refuses some message types, including those with executable attachments such as .BAT and .EXE files, even inside .ZIP archives.

ANOTHER WARNING: Gmail will almost certainly produce some false positives, i.e. some email that is not spam will be filtered out. Depending on your volume of spam, you may or may not want to check the Spam folder at Gmail.

Procedure

The first step is to set up a Gmail account that won't be used for any other purpose—all email to this address will be forwarded to your DreamHost account (but see below for a way around this). Go to the Settings area in Gmail, and in the "Forwarding and POP/IMAP" tab, enter your DreamHost email address. You can also choose to save all incoming mail on the Gmail server, if you like—it's always good to have a backup, in case (for example) DreamHost's servers are ever down or inaccessible at a critical moment.

Next you'll need to enable procmail on your DreamHost account. Note that you'll need shell access for this to work, and that if you've set up any keyword filters in the DreamHost control panel, this will cause them to stop working. You can either use the shell command line to make the changes, or you can use the webftp GUI.

Note that procmail can be very picky about permissions. In particular, don't leave any procmail files or directories set to be group or world writable or procmail won't work properly.

To enable procmail, create a file named .forward.postfix in your home directory (note the dot at the beginning of the filename) containing just the following line (including the quotation marks):

"|/usr/bin/procmail -t"

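If you would rather script this step than create the file by hand, the following is a minimal Python sketch, assuming a POSIX system with Python 3 available. The function names and the 0600 mode are illustrative choices, not part of the original instructions, and the demo writes into a temporary directory rather than your real home directory.

```python
import os
import stat
import tempfile
from pathlib import Path

# The exact line from the article, quotation marks included.
FORWARD_LINE = '"|/usr/bin/procmail -t"\n'


def enable_procmail(home: Path) -> Path:
    """Create .forward.postfix under `home` with owner-only permissions."""
    forward_file = home / ".forward.postfix"
    forward_file.write_text(FORWARD_LINE)
    # Assumed safe mode: readable and writable by the owner only.
    os.chmod(forward_file, 0o600)
    return forward_file


def is_group_or_world_writable(path: Path) -> bool:
    """Return True if the path has the group or world write bit set."""
    mode = path.stat().st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    # Demo against a throwaway directory; pass Path.home() to do it for real.
    with tempfile.TemporaryDirectory() as tmp:
        created = enable_procmail(Path(tmp))
        print(created.name, "group/world writable:",
              is_group_or_world_writable(created))
```

The same check can be pointed at .procmailrc and the .procmail directory once they exist, since procmail is picky about all of them.
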
Finally, create your .procmailrc file in your home directory (again, note the leading dot) to tell procmail what to do with incoming mail. Here's all you need:

DEFAULT=$HOME/Maildir/
MAILDIR=$HOME/Maildir
PMDIR=$HOME/.procmail
LOGFILE=$PMDIR/log
SHELL=/bin/sh

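# forward to gmail account for spam filtering - cf http://mboffin.com/post.aspx?id=1636
# Messages without Gmail's X-Forwarded-For header have not been filtered yet.
:0
* !X-Forwarded-For: user@gmail.com user@domain.com
{
    # Strip the Delivered-To header (this helps avoid mail loops)
    :0fw
    | formail -IDelivered-To

    # Forward the message to Gmail for spam filtering
    :0
    ! user@gmail.com
}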

Be sure to replace all three of those email addresses with your own domain (DreamHost) address and Gmail address. The rule checks for the presence of the X-Forwarded-For header. If it is found, the message has already been through Gmail and procmail lets it through. If not, procmail first pipes the message through a program called formail to strip out the Delivered-To header (this helps avoid mail loops), then forwards the message to your Gmail account.
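To make the decision logic concrete, here is a rough Python model of what the rule does. It is only an illustration under simplifying assumptions: the real work is done by procmail and formail, the procmail condition is a regular expression rather than the plain substring test used here, and the `route` function and its return values are hypothetical names.

```python
from email import message_from_string
from email.message import Message
from typing import Tuple

GMAIL = "user@gmail.com"      # placeholder address, as in the recipe
DOMAIN = "user@domain.com"    # placeholder address, as in the recipe
MARKER = f"{GMAIL} {DOMAIN}"  # header value the recipe looks for


def route(raw: str) -> Tuple[str, Message]:
    """Return ("deliver" | "forward", message) for a raw RFC 822 message."""
    msg = message_from_string(raw)
    forwarded = msg.get_all("X-Forwarded-For", [])
    if any(MARKER in value for value in forwarded):
        # Already passed through Gmail: deliver locally, untouched.
        return "deliver", msg
    # Not yet filtered: drop Delivered-To, then hand the message to Gmail.
    del msg["Delivered-To"]
    return "forward", msg


if __name__ == "__main__":
    fresh = "Delivered-To: user@domain.com\nSubject: hi\n\nbody\n"
    action, _ = route(fresh)
    print(action)
```

A message coming back from Gmail carries the marker header, so it takes the "deliver" branch and is not sent around again.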

Now try sending yourself a test email. When it arrives, check the full headers or the raw source (how you do this depends on what email client you're using). You should see the following in there somewhere:

X-Forwarded-For: user@gmail.com user@domain.com

If that's there, the email was passed through Gmail, and Gmail decided your test message wasn't spam. If you're also using DreamHost's built-in SpamAssassin support, you'll also see something similar to this (the numbers may be different, depending on your individual settings):

X-Spam-Status: No, hits=0.0 tagged_above=-999.0 required=3.0 tests=
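If you save the raw source of the test message, a short script can look for both headers. This is a sketch using Python's standard email parser; the function name and the keys of the returned dictionary are made up for this example.

```python
from email import message_from_string


def summarize_filtering(raw: str) -> dict:
    """Report which spam-filtering headers appear in a raw message."""
    msg = message_from_string(raw)
    spam_status = msg.get("X-Spam-Status")
    return {
        # Present only if the message came back through Gmail's forwarding.
        "via_gmail": msg.get("X-Forwarded-For") is not None,
        # Present only if SpamAssassin processed the message.
        "spamassassin_ran": spam_status is not None,
        "spamassassin_says_spam": bool(spam_status)
        and spam_status.strip().lower().startswith("yes"),
    }


if __name__ == "__main__":
    sample = (
        "X-Forwarded-For: user@gmail.com user@domain.com\n"
        "X-Spam-Status: No, hits=0.0 tagged_above=-999.0 required=3.0 tests=\n"
        "Subject: test\n"
        "\n"
        "Hello\n"
    )
    print(summarize_filtering(sample))
```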

Troubleshooting Notes

  • Email identified as spam by Gmail will not be forwarded; it'll be sent directly to Gmail's Spam folder. For the first couple of weeks or so, it's a good idea to log into Gmail and check that folder regularly for false positives. After a while Gmail will learn what sort of email you normally receive, and will get better and better at identifying spam.
  • If Gmail ever experiences a problem forwarding your email back to the DreamHost mail servers (for example, if DreamHost rejects the forward because of a potential infinite loop), Gmail will stop forwarding mail, but will still receive and hold mail for you. You can re-enable forwarding by going back to the Gmail Settings page, "Forwarding and POP/IMAP" tab, and clicking "Try again."
  • If Gmail identifies a message as spam or can't forward a message on the first try for any reason, it won't automatically try again. For example, if you find a false positive in the spam folder and mark it "not spam," the message is moved to the inbox but not forwarded back to your DreamHost account. You'll need to manually forward it if you want to see it in your regular inbox, outside of Gmail.
  • As noted above, Gmail always rejects messages with executable attachments.

Usage Tips

  • Instead of forwarding all email received at your Gmail address back to your DreamHost address, you can use Gmail filters to selectively forward mail. This makes it possible to use one Gmail address to filter spam for more than one DreamHost address, for example, or to filter using a Gmail address that you also use for other purposes. (Example follows)
  • One simple way to set up filters is to use the "+" feature of email addresses. You can use "unlimited" email addresses that have a + and some extra text in the address. For example, if your email is "user@testdomain.com", you will also receive all email sent to "user+dhfilter@testdomain.com". To use this feature for spam filtering, change the procmail sample above to send mail to "user+dhfilter@gmail.com". Then add a Gmail filter that automatically forwards all mail sent to "user+dhfilter@gmail.com" back to your regular email address. This is useful because you cannot set up Gmail filters based on mail headers (at least I have not been able to make this work). This method will allow you to continue to use your Gmail account as you normally would.
  • You may want to disable the quarantine in your SpamAssassin settings to avoid having to check two places (DreamHost's quarantine and Gmail's Spam folder) for false positives. You can then create a filter in Gmail to send mail with ** DHSPAM ** in the subject line to the Spam folder. Note, however, that messages from email addresses and domains that you've blacklisted will still be sent to DreamHost's quarantine, regardless of any other settings.
  • After a bit of training, you might decide that Gmail does a good enough job on its own, without SpamAssassin's help. Just disable the Junk Mail filter in the DreamHost control panel and Gmail will continue to do its thing.
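The "+" addressing described in the tips above is simple string manipulation. The sketch below shows how a tagged address is built and how the base address is recovered; the helper names are hypothetical and the validation is deliberately minimal.

```python
def plus_address(address: str, tag: str) -> str:
    """Insert +tag before the @, e.g. user@gmail.com -> user+tag@gmail.com."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}+{tag}@{domain}"


def base_address(address: str) -> str:
    """Strip any +tag from the local part of an address."""
    local, _, domain = address.rpartition("@")
    return f"{local.split('+', 1)[0]}@{domain}"


if __name__ == "__main__":
    tagged = plus_address("user@gmail.com", "dhfilter")
    print(tagged)
    print(base_address(tagged))
```

The tagged address is what would go after the `!` in the procmail recipe, and the Gmail filter matches on that same tagged address in the "To" field.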


procmailrc example

/usr/share/doc/spamc/procmailrc.example

# SpamAssassin sample procmailrc
# ==============================

# The following line is only used if you use a system-wide /etc/procmailrc.
# See procmailrc(5) for infos on what it exactly does, the short version:
#  * It ensures that the correct user is passed to spamd if spamc is used
#  * The folders the mail is filed to later on is owned by the user, not
#    root.
DROPPRIVS=yes

# Pipe the mail through spamassassin (replace 'spamassassin' with 'spamc'
# if you use the spamc/spamd combination)
#
# The condition line ensures that only messages smaller than 250 kB
# (250 * 1024 = 256000 bytes) are processed by SpamAssassin. Most spam
# isn't bigger than a few k and working with big messages can bring
# SpamAssassin to its knees.
#
# The lock file ensures that only 1 spamassassin invocation happens
# at 1 time, to keep the load down.
#
:0fw: spamassassin.lock
* < 256000
| spamassassin

# Mails with a score of 15 or higher are almost certainly spam (with 0.05%
# false positives according to rules/STATISTICS.txt). Let's put them in a
# different mbox. (This one is optional.)
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
almost-certainly-spam

# All mail tagged as spam (eg. with a score higher than the set threshold)
# is moved to "probably-spam".
:0:
* ^X-Spam-Status: Yes
probably-spam

# Work around procmail bug: any output on stderr will cause the "F" in "From"
# to be dropped.  This will re-add it.
# NOTE: This is probably NOT needed in recent versions of procmail
:0
* ^^rom[ ]
{
  LOG="*** Dropped F off From_ header! Fixing up. "

  :0 fhw
  | sed -e '1s/^/F/'
}
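The two filing recipes in this example can be read as a small decision function. The Python sketch below mirrors them for illustration only; `choose_folder` is a hypothetical name, "inbox" stands for procmail's default delivery, and the size limit and lock file are not modeled.

```python
import re
from email import message_from_string


def choose_folder(raw: str) -> str:
    """Pick a folder the way the sample recipes would, in recipe order."""
    msg = message_from_string(raw)
    level = msg.get("X-Spam-Level", "")
    status = msg.get("X-Spam-Status", "")
    # First recipe: fifteen or more stars in X-Spam-Level.
    if re.match(r"\*{15}", level):
        return "almost-certainly-spam"
    # Second recipe: SpamAssassin tagged the message as spam.
    if status.startswith("Yes"):
        return "probably-spam"
    # No recipe matched: falls through to normal delivery.
    return "inbox"


if __name__ == "__main__":
    tagged = "X-Spam-Level: ******\nX-Spam-Status: Yes, score=6.2\n\nbody\n"
    print(choose_folder(tagged))
```

Order matters here as it does in the procmailrc: a high-scoring message also has "X-Spam-Status: Yes", so the stricter test has to come first.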

non-procmail example

This is a way to get around the fact that you can't forward a catch-all address to a non-DreamHost email address. It is a simple way to use Gmail filtering for people who don't want to deal with procmail.

Create a mailbox, for example mymail@yourdomain.com.

Create an email forwarder that points to Gmail, e.g. filterit@yourdomain.com -> youraccount@gmail.com.

Then forward either your catch-all or any email addresses you use to filterit@yourdomain.com.

In Gmail, set up forwarding to send all mail to mymail@yourdomain.com, which you then collect, minus the spam.
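The resulting chain of forwards can be traced with a few lines of Python, which also shows why the final mailbox must not itself forward back into the chain. This is only a toy model: the table is a hand-written stand-in for the DreamHost and Gmail settings, and "catch-all@yourdomain.com" is a hypothetical label for the catch-all.

```python
# Hand-written stand-in for the forwarding settings described above.
FORWARDS = {
    "catch-all@yourdomain.com": "filterit@yourdomain.com",
    "filterit@yourdomain.com": "youraccount@gmail.com",
    "youraccount@gmail.com": "mymail@yourdomain.com",
}


def trace(start: str, forwards: dict = FORWARDS) -> list:
    """Follow forwards from `start` until a real mailbox is reached."""
    path = [start]
    while path[-1] in forwards:
        next_hop = forwards[path[-1]]
        if next_hop in path:
            raise ValueError("mail loop: " + " -> ".join(path + [next_hop]))
        path.append(next_hop)
    return path


if __name__ == "__main__":
    print(" -> ".join(trace("catch-all@yourdomain.com")))
```
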

This technique has been hugely successful for me: Gmail was filtering over 1000 spam messages per day, with perhaps 10 a week getting through. I am happy to report that Gmail seems to have modified its spam filter, and it is now down to 1 or maybe 2 a week out of almost 10,000 spam messages and maybe 250 real emails.