SpamAssassin

From DreamHost
WARNING

This no longer works due to a system change as of April, 2008.
New and existing email addresses may no longer be tied to a shell user if they are not already, though email may be forwarded to a shell account.
See Shell-linked E-mail for details.

See also Procmail + SpamAssassin.

The instructions provided in this article or section are considered advanced.

You are expected to be knowledgeable in the UNIX shell.
Support for these instructions is not available from DreamHost tech support.
Server changes may cause this to break. Be prepared to troubleshoot this yourself if this happens.
We seriously aren't kidding about this.

Introduction

SpamAssassin is one of the best spam filtering tools available, and it is free under the Apache license.

There are three ways to take advantage of SpamAssassin within DreamHost's environment (listed in order from easiest to most advanced):

  1. Use DreamHost's Junk Mail system
    • Easy setup via the DreamHost panel
    • Maintained by DreamHost
    • Spam can be reviewed only via webmail, or in an IMAP client with the IMAP filtering option
    • Bayes rules (but no trainable Bayes db)
    • Delivery delays have been noted
  2. Run DreamHost's version of SpamAssassin on your assigned host
    • High level of control over spam scoring and routing
    • Straightforward setup - some knowledge of shell commands is necessary
    • DreamHost may (or may not) update the version of SpamAssassin on your server
    • You can lose mail if you configure this wrong
  3. Install a newer or customized version of SpamAssassin on your assigned host
    • Even higher level of control
    • Run the latest version of SpamAssassin
    • Setup is straightforward, but more involved
    • DreamHost won't update it; if your version has a security issue, you must patch or update it yourself
    • You can lose mail if you configure this wrong


The version installed on DreamHost's mail servers usually lags behind the current release (3.1.7 as of October 25, 2006, versus DreamHost's 3.0.3 or so). You therefore have two options: use the version provided by DreamHost, or install the latest version into your account.

NOTE: Installing a custom version is recommended only on a Private Server with Courier enabled, so that you can check the filtered mail on the command line. Forwarding mail back for IMAP checking is not supported on either shared hosting or PS.

Some portions of the following instructions may be prepared on a local PC and transferred to the server with SFTP, but much must be done in the Shell, using Shell Commands with SSH.

DreamHost's Junk Mail system

Basic spam filtering can be set up via the control panel. Instructions can be found here: Junk_Mail#Enabling_Junk_Filter

Using SpamAssassin

By default, all your mail is delivered by a program called Postfix. To use SpamAssassin, you must first tell Postfix to send mail to procmail (a mail processing program), and then configure procmail to use SpamAssassin. Sounds confusing? It's not.

~/.forward.postfix

Create a file named .forward.postfix in your home directory (/home/username/) which contains the following line (The quotes are important!):

"|/usr/bin/procmail -t"
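
If you prefer to create the file from the shell, the following is a minimal sketch. The helper name write_forward is hypothetical, and the demo writes into a scratch directory so that it can be tried without touching live mail delivery.

```shell
#!/bin/sh
# Hypothetical helper: write the procmail forward line into DIR/.forward.postfix.
write_forward() {
    dir=$1
    # Single quotes keep the double quotes as literal characters in the file.
    printf '%s\n' '"|/usr/bin/procmail -t"' > "$dir/.forward.postfix"
}

# Demo against a scratch directory instead of the real home directory.
demo_dir=$(mktemp -d)
write_forward "$demo_dir"
cat "$demo_dir/.forward.postfix"
```

To apply it for real, you would call write_forward "$HOME" instead.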

~/.procmailrc

Create a file named .procmailrc in your home directory and type the following in it:

# PROCMAIL ENVIRONMENT
# Directory for storing procmail files
PMDIR=$HOME/.procmail

# Procmail log file
LOGFILE=$PMDIR/log

# Shell to use for recipes
SHELL=/bin/sh

# MAIL PROCESSING RECIPES
# Pipe any messages under 512 K through spamassassin for scoring
:0fw:spamassassin-lock
* < 524288
| spamassassin -P

Note: The option -P is not needed on SpamAssassin version 3 or greater. It is needed for version 2. Some of the shell systems are running SpamAssassin 3.0.3, while others are not (see Determine Your SpamAssassin Version).

Automatic Sorting

If you want Procmail to automatically put messages that SpamAssassin identifies as spam in a specific folder (e.g. a folder named "Spam"), add the following lines to the end of your .procmailrc file:

# Dump spam messages in the spam folder
:0
* ^X-Spam-Status: Yes
$HOME/Maildir/.Spam/

This identifies the mail header X-Spam-Status that SpamAssassin adds to spam messages, and then causes that mail to be placed in the .Spam/ folder. Examining some mail I receive through a university account, I also find that the X-PMX-Spam header can hold useful information:

# Dump more spam messages in the spam folder
:0
* ^X-PMX-Spam: Gauge=XXXXXX
$HOME/Maildir/.Spam/

To test that your rules are working, try setting a test subject, then sending yourself mail with that subject from another account. It should be delivered to your Spam folder:

# Filtering test
:0
* ^Subject: SPAMTEST
$HOME/Maildir/.Spam/
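
Before sending test mail, a recipe's condition can be sanity-checked against a saved message. The sketch below only approximates procmail: it assumes that a condition is applied to the header block and matched case-insensitively as an extended regular expression, and it uses grep to mimic that. The helper name header_matches and the sample message are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the header block of FILE matches REGEX.
header_matches() {
    regex=$1
    file=$2
    # sed prints up to the first empty line (the end of the headers);
    # grep -i mimics procmail's case-insensitive matching.
    sed '/^$/q' "$file" | grep -Eiq -- "$regex"
}

# Made-up sample message for the demo.
msg=$(mktemp)
cat > "$msg" <<'EOF'
From: someone@example.com
Subject: SPAMTEST
X-Spam-Status: Yes, score=12.3

X-Spam-Status: No (this line is body text, not a header)
EOF

for cond in '^X-Spam-Status: Yes' '^Subject: SPAMTEST' '^X-Spam-Status: No'; do
    if header_matches "$cond" "$msg"; then
        echo "match:    $cond"
    else
        echo "no match: $cond"
    fi
done
```

The third condition is there to show that text in the message body is not considered, which is why a spammer cannot trigger or defeat these recipes by pasting header-like lines into the body.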

Shortcuts

You may find it tedious to update the Spam folder location in the above recipes. To avoid this, add

SPAMDIR=$HOME/Maildir/.Spam/

to the top half of the file, then filter your mail to $SPAMDIR:

:0
* ^X-Spam-Status: Yes
$SPAMDIR

Using these recipes, I don't have to use rewrite_header in my user_prefs file, or use any filters in my mail client.

~/.spamassassin/user_prefs

SpamAssassin's default settings are pretty good, but there are a few changes that you can make which will increase your spam capture rate considerably. Global settings are stored in /etc/spamassassin/local.cf. As of this writing on my server, local.cf is basically empty (comments only). Your personal preferences are stored in ~/.spamassassin/user_prefs. If you have not had any mail delivered, you may have to create this folder and file.
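
If the folder and file do not exist yet, they can be created by hand. A minimal sketch follows; the helper name ensure_user_prefs is hypothetical, and the demo runs against a scratch directory rather than your real home directory.

```shell
#!/bin/sh
# Hypothetical helper: create BASE/.spamassassin/user_prefs if it is missing.
ensure_user_prefs() {
    base=$1
    mkdir -p "$base/.spamassassin"
    # touch creates an empty file, or leaves an existing file's contents alone.
    touch "$base/.spamassassin/user_prefs"
}

# Demo against a scratch directory.
demo_home=$(mktemp -d)
ensure_user_prefs "$demo_home"
ls -a "$demo_home/.spamassassin"
```

Calling ensure_user_prefs "$HOME" would do the same for your own account. Running it twice is harmless, because neither mkdir -p nor touch overwrites existing content.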

System documentation on SpamAssassin will help with editing the rules. This documentation is available with the commands:

$ man spamassassin
$ man Mail::SpamAssassin::Conf

trusted_networks

By default, SpamAssassin is a bit too trusting and will lower the score of e-mails that hit the "ALL_TRUSTED" test; as a result, more spam gets through. You can use the trusted_networks configuration setting to tell SpamAssassin which networks are to be trusted. In a terminal, type

$ dig spf.dreamhosters.com txt

to get the latest list of trusted DreamHost mail servers, then add the following lines (or similar) to your user_prefs file:

trusted_networks  66.33.192.0/19 66.201.54.64/26 205.196.208.0/20
trusted_networks  64.111.96.0/19 208.97.128.0/18 208.113.128.0/19

You can also generate the line directly from the dig output, for example:

$ dig spf.dreamhosters.com txt|perl -lnwe 'push @m,(/ip4:(\S+)/g);END{print "trusted_networks @m"}'
trusted_networks 66.33.192.0/19 66.201.54.64/26 205.196.208.0/20 64.111.96.0/19 208.97.128.0/18 208.113.128.0/17 67.205.0.0/19
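
If Perl is not to hand, roughly the same extraction can be done with grep and sed. This is a sketch under assumptions: the helper name spf_to_trusted_networks is hypothetical, the sample record is assembled from addresses quoted in this article (its TTL and layout are made up), and grep -o is assumed to be available, as it is in GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: read dig output on stdin, print one trusted_networks line.
spf_to_trusted_networks() {
    # Pull out every ip4: entry, strip the prefix, and join with spaces.
    nets=$(grep -oE 'ip4:[0-9./]+' | sed 's/^ip4://' | tr '\n' ' ')
    # ${nets% } drops the trailing space left by tr.
    printf 'trusted_networks %s\n' "${nets% }"
}

# Made-up sample TXT record using addresses quoted in this article.
sample='spf.dreamhosters.com. 14400 IN TXT "v=spf1 ip4:66.33.192.0/19 ip4:66.201.54.64/26 ip4:205.196.208.0/20 ~all"'
printf '%s\n' "$sample" | spf_to_trusted_networks
```

Against the live record you would pipe dig spf.dreamhosters.com txt into the function. Review the printed line before pasting it into user_prefs, since an empty or unexpected DNS answer would produce a trusted_networks line with no addresses.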

rewrite_header

If you use filters in your mail client (e.g. Thunderbird or Evolution) to sort spam, it may be helpful to prepend "[SPAM]" to the subject of spam messages. Add the following line (or similar) to your user_prefs file:

rewrite_header subject [SPAM]

... or use any other text you'd like.

Bayes Training

Bayesian filtering allows SpamAssassin to learn how to recognize spam messages. To do so, it has to be 'trained' with some messages that are known to be spam and ham (not spam).

If you want to use the Bayes tests:

  1. you must be running SpamAssassin 3.0 or newer (See Determine Your SpamAssassin Version).
  2. you can only train using full SHELL USER accounts (the m######## accounts exist only on DreamHost's mail server, which you do not have access to from the shell)
  3. you must first train the database with at least 200 spam email messages AND 200 ham email messages (non-spam messages).
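
Before training, it can help to check whether your folders hold enough messages to meet the 200-message minimum. The sketch below counts the files in a Maildir folder's cur directory; the helper names count_messages and enough_to_train are hypothetical, and the demo uses a scratch Maildir with three empty files standing in for messages.

```shell
#!/bin/sh
# Hypothetical helper: count the message files in one cur directory.
count_messages() {
    # tr strips the padding that some wc implementations add.
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: report whether a folder holds the 200 messages needed.
enough_to_train() {
    n=$(count_messages "$1")
    if [ "$n" -ge 200 ]; then
        echo "$1: $n messages, enough"
    else
        echo "$1: $n messages, need $((200 - n)) more"
    fi
}

# Demo on a scratch Maildir with 3 fake spam messages.
demo=$(mktemp -d)
mkdir -p "$demo/.Spam/cur"
for i in 1 2 3; do : > "$demo/.Spam/cur/msg$i"; done
enough_to_train "$demo/.Spam/cur"
```

On a real account you would run enough_to_train ~/Maildir/.Spam/cur and the same for each ham folder.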

Basic Training

Sort some existing e-mail into spam and non-spam folders. Let's assume you have a sub-folder of your INBOX called "Spam", another subfolder called "Ham", and some other folders (with no spam in them). Find out where the utility sa-learn is. If you are using DreamHost's version of SpamAssassin, this will probably be something like /usr/bin/sa-learn. If you installed your own version, it may be elsewhere. In a terminal, type:

$ which sa-learn

Train using a folder full of spam. In a terminal, type (replace /usr/bin/sa-learn with the path you determined above. Capitals are important!):

$ /usr/bin/sa-learn --no-sync --spam ~/Maildir/.Spam/cur

Train using a single ham folder:

$ /usr/bin/sa-learn --no-sync --ham ~/Maildir/.Ham/cur

Train using many ham folders. Typically, all your folders, with the exception of .Spam, are containers for non-spam email. To train using these as ham:

$ /usr/bin/sa-learn --no-sync --ham `find ~/Maildir -name cur|grep -v '\.Spam'`

Synchronize (save) the learned rules:

$ /usr/bin/sa-learn --sync

You can also view summary statistics for the Bayes database (such as how many spam and ham messages it has learned) by typing

$ /usr/bin/sa-learn --dump magic

Automated Training Using Cron

Create a file (I use ~/.spamassassin/learn-spam.sh) using some of the above commands:

#!/bin/bash
# Automated Bayesian Training

# Train ham (ignore outgoing mail & deleted mail that may be spam or ham)
# Do each directory in turn so that procwatch doesn't kill sa-learn
#
find ~/Maildir -name cur | egrep -v '\.(Spam|Trash|Sent)' |while read i; do
 /usr/bin/sa-learn --no-sync --ham "$i"
done

# Train spam
/usr/bin/sa-learn --no-sync --spam ~/Maildir/.Spam/cur

# Save
/usr/bin/sa-learn --sync

# Delete learned spam
mv ~/Maildir/.Spam/cur/* ~/Maildir/.Trash/cur

Make the script executable:

$ chmod 700 ~/.spamassassin/learn-spam.sh

Add a job to your Crontab that runs ~/.spamassassin/learn-spam.sh once a day or less often. Running it more frequently may be problematic with large mailboxes.
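
As an illustration, the sketch below builds a crontab that includes a daily run at 03:30. The time of day is an arbitrary assumption, the helper name add_learn_job is hypothetical, and the demo only prints the result for a made-up existing crontab; it installs nothing.

```shell
#!/bin/sh
# Assumed schedule: 03:30 every day. Single quotes keep $HOME literal so
# that it is expanded when cron runs the job, not when this script runs.
JOB='30 3 * * * $HOME/.spamassassin/learn-spam.sh'

# Hypothetical helper: copy a crontab from stdin to stdout and append JOB
# unless an identical line is already present.
add_learn_job() {
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    if ! printf '%s\n' "$current" | grep -Fxq -- "$JOB"; then
        printf '%s\n' "$JOB"
    fi
}

# Demo on a made-up existing crontab.
printf '%s\n' '0 * * * * /bin/true' | add_learn_job
```

To install the result, you could run crontab -l 2>/dev/null | add_learn_job | crontab - after checking the printed output. Because the helper skips a line that is already present, running it again does not create duplicate jobs.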

Using the DNS Block List (trusted_networks) tests and the Bayes tests, I've found SpamAssassin to be incredibly accurate at detecting both spam and ham.

Country Training

SpamAssassin comes with a plugin to add information to the headers of messages about which country the messages were relayed through. If this plugin is activated and enough spam messages are relayed through certain countries, the Bayes feature will begin to detect this as spam.

To enable the plugin:

  1. Use CPAN to install IP::Country::Fast, as described under Installing v3.1.0 into your account.
  2. Uncomment the 'loadplugin' line in the ~/saetc/mail/spamassassin/init.pre file for Mail::SpamAssassin::Plugin::RelayCountry.

That's it. Now you can examine the headers of your new mail messages to see if the relay countries are added to the headers. See http://wiki.apache.org/spamassassin/RelayCountryPlugin for more information.

Installing v3.1.0 into your account

There is a good guide available, but it doesn't include instructions for installing the Perl CPAN modules necessary for the DNS-based tests to succeed (this is necessary because the DreamHost versions of the needed CPAN modules are too old). The instructions here for installing CPAN modules don't seem sufficient. These instructions were quite helpful.

After following the guide, you will also need to follow these steps for the DNS-based tests to succeed:

  1. SSH in to your account. (Use the same account you installed SpamAssassin to, the one where your maildir is.)
  2. Run cpan, then type exit. That should create ~/.cpan/CPAN/MyConfig.pm.
  3. Open ~/.cpan/CPAN/MyConfig.pm in your text editor (I like nano). Find the makepl_arg variable, and add PREFIX=/home/username (substituting your own username) to it, separated from any other arguments by a space. Save and quit. The line should look like: 'makepl_arg' => q[PREFIX=/home/username INSTALLDIRS=site],
  4. Run export PERL5LIB=$HOME/lib/perl/5.8.4/:$HOME/share/perl/5.8.4/ so that cpan can find modules that it installs (when checking dependencies)
  5. Run cpan, and then install the following modules by typing install modulename (These modules may want to install additional modules to support them; in my case I answered yes to everything.):
    • Net::IP
    • Net::DNS
    • Mail::DomainKeys
    • Crypt::OpenSSL::Random
    • Crypt::OpenSSL::RSA
  6. Exit CPAN when you've finished installing all of the modules.
  7. Run ln -s ~/lib/perl/5.8.4/Net/* ~/share/perl/5.8.4/Net

Now send yourself some test e-mails, preferably spams you still have. Once received, look at the headers to check their scores and see which tests were run. On most spams, you should see DNS-based tests like URIBL, SpamCop, etc. being run, and giving your spams quite high scores.

If things don't go quite right, or even if they do, you may want to look at ~/.procmail/log and see if SpamAssassin is reporting any problems. I believe these instructions are correct, but I could have left out a few things, so you may need to install some additional modules that I forgot.

You can incorporate ClamAV into your SpamAssassin installation with Clamassassin (A guide for installing ClamAV and Clamassassin on DreamHost).

Adding SA for all the domain's mail accounts

In the forum, user stoneyb writes: "I have SA set up for a whole domain (twice). Install it into the domain home, the one with the web sites, using the personal install instructions, and have the users in the domain use the full path to the domain's SA in their procmailrc files. This works quite well for me."

To be more specific:

Create the ~/procmail/spam.rc file, as instructed in the guide referenced above, but when you reach this line:

| $HOME/sausr/bin/spamassassin

Remove the variable $HOME, substituting the full path to the account you installed it to, for example:

| /home/myaccountwithcustomsa/sausr/bin/spamassassin

You may want to actually try running "/home/myaccountwithcustomsa/sausr/bin/spamassassin -V" from the account whose spam.rc file you're modifying, just to ensure you have the appropriate permissions, but it should work as long as both accounts are under the same DreamHost account.

Notes

Determine Your SpamAssassin Version

In a terminal, type

$ spamassassin -V
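
To decide whether the -P option belongs in your .procmailrc, the version can be checked in a script. The sketch below assumes that the first line of the output looks like "SpamAssassin version 3.0.3"; the helper names sa_major_version and needs_dash_p are hypothetical, and the demo feeds in sample text instead of calling spamassassin.

```shell
#!/bin/sh
# Hypothetical helper: read `spamassassin -V` output on stdin and print the
# major version number. Assumes a line like "SpamAssassin version 3.0.3".
sa_major_version() {
    sed -n 's/^SpamAssassin version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Hypothetical helper: say whether the -P option is needed.
needs_dash_p() {
    major=$(sa_major_version)
    if [ "${major:-0}" -ge 3 ]; then
        echo "version $major: -P not needed"
    else
        # Also reached when the version line could not be parsed.
        echo "version ${major:-unknown}: keep -P"
    fi
}

# Demo with sample text in the assumed output format.
printf 'SpamAssassin version 3.0.3\n  running on Perl version 5.8.4\n' | needs_dash_p
```

On the server you would run spamassassin -V | needs_dash_p.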