SpamBayes
From DreamHost
SpamBayes is a statistical (commonly, although a little inaccurately, referred to as Bayesian) anti-spam filter. When properly configured and adequately trained, it can filter nearly 100% of spam from most users' mailboxes. It can be set up on a DreamHost account as a flexible and more accurate alternative to DreamHost's own junk mail filter.
Installing Spambayes
The instructions provided in this article or section are considered advanced. You are expected to be knowledgeable in the UNIX shell.
The instructions provided in this article or section require shell access unless otherwise stated. You can use the PuTTY client on Windows, or SSH on UNIX and UNIX-like systems such as Linux or Mac OS X.
These directions are an adapted, updated, and hopefully clearer version of the instructions from carlo.zottmann.org.
Installing SpamBayes to your account in this manner will only be helpful if you access your mail through IMAP or through a webmail client. POP3 users will not benefit.
These directions are a work in progress.
Step 1: User Preparation
Installing SpamBayes requires that your mail user also be a shell (SSH) user. If you are creating a new mail user and mailbox, you can simply create a new shell user in the Control Panel and, during the second step, set up a mailbox.
For existing mail-only users, however, the process is more complicated:
Note: the author has only done the following procedure once, and while he believes he did it correctly, there may be a lower-risk method of performing this migration.
Read the process through before attempting it and make sure you have a clear idea of what you should be doing in each step. If any step doesn't make sense to you, don't try this.
- Contact DreamHost support and get a backup archive of the mail user's mailbox.
Once you have the backup, perform the following steps as quickly as possible: the backup will be restored to another user, so any mail you receive between the time the backup is made and the time the new account is established will be lost.
- Create a new shell user. The username does not have to be the same as the email address you want (for example, the shell username for chris@mydomain.net is actually chris_main_email).
- During the second step of the user creation process, choose not to create a mailbox.
- In the Mail > Manage Email area of the Control Panel, click the Edit button for the mailbox you wish to migrate. On that screen, select the shell user you just created from the "Mailbox Login" selection list. Click the "Save Changes" button.
- Now, extract the backup archive of the mailbox over the "Maildir" folder in the new shell user's home directory.
- You should now be able to log in to the mailbox using the email address and the mail password.
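The extraction step above might look roughly like the following sketch. It assumes the backup arrives as a gzipped tarball whose top level holds the Maildir contents (cur, new, tmp and any dot-folders); the archive DreamHost support actually provides may be laid out differently, so inspect it with tar -tzf first. restore_maildir is a hypothetical helper name, not a DreamHost tool.

```shell
# Hypothetical helper: unpack a mailbox backup over a user's Maildir.
# Assumption: the archive is a .tar.gz whose top-level entries are the
# Maildir contents themselves (cur/, new/, tmp/, .Folder/ ...).
restore_maildir() {
    archive=$1    # path to the backup archive
    maildir=$2    # target, e.g. "$HOME/Maildir"
    mkdir -p "$maildir" || return 1
    # Existing files with the same names are overwritten; others are kept.
    tar -xzf "$archive" -C "$maildir"
}
```

For example, restore_maildir "$HOME/mailbox-backup.tar.gz" "$HOME/Maildir", where the archive file name is only a placeholder.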
Step 2: Installing Python
This step is no longer required, as the version of Python installed on DreamHost's servers is now recent enough that it should be able to run SpamBayes with no problem. However, if you'd like to run your own installation of Python, directions may be found here. Note that some minor changes will need to be made for the latest version of Python.
Step 3: Installing SpamBayes
- Get a SpamBayes package from the site and extract it to a convenient directory within your user's home directory. If necessary, remove the other applications from the SpamBayes package - we will only be using the sb_filter.py method, not the POP3 proxy, Outlook plugin, etc.
- In the SpamBayes folder, enter python setup.py install.
- Create the SpamBayes database:
PATH/TO/sb_filter.py -d $HOME/.hammie.db -n
- Train it on your existing mail. This is optional, but a good idea. -g is the flag for the known good mail, and -s is for known spam.
PATH/TO/sb_mboxtrain.py -d $HOME/.hammie.db -g $HOME/Maildir/cur -s $HOME/Maildir/.Junk
Note: Maildir/cur is your Inbox; Maildir/.Junk is your existing Spam folder (replace "Junk" with the name of your spam folder).
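Before training, it can be useful to see how many messages each folder holds, so you know how much ham and spam the trainer will be given. The following is a minimal sketch; count_messages is a hypothetical helper, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
# Hypothetical helper: count the message files directly inside a directory.
# Each message in a Maildir is one regular file, so counting files in
# cur/ (or new/) gives the number of messages stored there.
count_messages() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

For example, count_messages "$HOME/Maildir/cur" counts the Inbox, and count_messages "$HOME/Maildir/.Junk/cur" counts the messages already read or moved into the Junk folder.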
Step 4: Setting up Folders and Configuring Procmail
- Create a folder called Unsure under your Inbox. If you don't already have a Junk folder, create one.
- In your favorite text editor, open the .procmailrc file in your home directory. Update it with the following rules:
PMDIR=$HOME/.procmail
VERBOSE=yes
LOGFILE=$PMDIR/log
LOGABSTRACT=all
MAILDIR=$HOME/Maildir
:0fw:hamlock
| python $HOME/PATH/TO/sb_filter.py -d $HOME/.hammie.db
:0
* ^X-SpamBayes-Classification: spam
$HOME/Maildir/.Junk/
:0
* ^X-SpamBayes-Classification: unsure
$HOME/Maildir/.Unsure/
:0
$HOME/Maildir/
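Additional procmail rules can be added to this file as well. Make sure the $HOME/.procmail directory exists (for example, run mkdir -p $HOME/.procmail), since LOGFILE points inside it.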
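The Junk and Unsure folders that these rules deliver to are normally created from your mail client, which is the safest route because some IMAP servers keep extra marker or subscription files. If you prefer the shell, the sketch below creates a folder by hand, assuming the Maildir++ layout that the paths in this article imply (a dot-prefixed directory containing cur, new and tmp). make_maildir_folder is a hypothetical helper name.

```shell
# Hypothetical helper: create a Maildir++ style subfolder under a Maildir.
# Assumption: folders live at <maildir>/.<Name>/ and each one needs the
# three standard subdirectories cur/, new/ and tmp/.
make_maildir_folder() {
    maildir=$1   # e.g. "$HOME/Maildir"
    name=$2      # e.g. "Unsure" (stored on disk as ".Unsure")
    mkdir -p "$maildir/.$name/cur" "$maildir/.$name/new" "$maildir/.$name/tmp" || return 1
    # Mail should be readable by the owner only.
    chmod 700 "$maildir/.$name" "$maildir/.$name/cur" "$maildir/.$name/new" "$maildir/.$name/tmp"
}
```

For example, make_maildir_folder "$HOME/Maildir" Unsure. You may still need to subscribe to the new folder in your mail client before it appears.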
Step 5: Setting Up Training Cronjob
In the terminal, enter crontab -e.
In the crontab, add:
0 0 * * * python PATH/TO/sb_mboxtrain.py -d $HOME/.hammie.db -g $HOME/Maildir/cur -s $HOME/Maildir/.Junk
This will train SpamBayes every night at midnight (Pacific time) with the junk mail from your spam folder and the good ("ham") mail in your inbox.
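Cron jobs fail silently unless their output goes somewhere, so you may want to wrap the training command and keep a log of each run. The following is only a sketch: train_nightly, the TRAINER and TRAIN_LOG variables, and the train.log file name are all hypothetical, and PATH/TO is the same placeholder used throughout this article.

```shell
# Hypothetical wrapper around the nightly training command.
# TRAINER   - command that runs the trainer (placeholder path by default)
# TRAIN_LOG - file that collects the output of every run
train_nightly() {
    trainer=${TRAINER:-python PATH/TO/sb_mboxtrain.py}
    log=${TRAIN_LOG:-$HOME/.procmail/train.log}
    mkdir -p "$(dirname "$log")" || return 1
    {
        echo "=== training run: $(date) ==="
        # $trainer is unquoted on purpose so that, in sh or bash,
        # "python script.py" is split into a command and its argument.
        $trainer -d "$HOME/.hammie.db" -g "$HOME/Maildir/cur" -s "$HOME/Maildir/.Junk"
    } >>"$log" 2>&1
}
```

If you save this in a script that ends by calling train_nightly, the crontab entry can run that script instead of calling sb_mboxtrain.py directly.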
Using This SpamBayes Setup
Mail that SpamBayes thinks is spam will be placed in your Junk folder. Mail that SpamBayes classifies as good mail ("ham") will be left in your Inbox. If SpamBayes isn't sure about an email, it will be placed in the Unsure folder.
You need to check that good mail isn't ending up in Junk and that spam isn't ending up in your Inbox, and then check the Unsure folder and classify anything in there. The cronjob will train SpamBayes every night, so all you need to do is sort mail into your Inbox and Junk folders.
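To check by hand where a given message would go, you can mimic the routing that the procmail rules perform. The sketch below reads a message that has already been passed through sb_filter.py and prints the folder the rules in this article would pick. classification_of and destination_for are hypothetical helper names, and the sketch assumes the header value starts with the classification word (for example "spam; 0.99").

```shell
# Print the classification word from the X-SpamBayes-Classification header
# of the message on stdin. Only the header block (everything before the
# first blank line) is examined, so body text cannot spoof the result.
classification_of() {
    sed -n '/^$/q; s/^X-SpamBayes-Classification: *//p' |
        awk '{ print $1; exit }' | tr -d ';'
}

# Print the delivery folder that the procmail rules above would choose.
destination_for() {
    case $(classification_of) in
        spam)   echo "$HOME/Maildir/.Junk/" ;;
        unsure) echo "$HOME/Maildir/.Unsure/" ;;
        *)      echo "$HOME/Maildir/" ;;
    esac
}
```

For example, python PATH/TO/sb_filter.py -d $HOME/.hammie.db < some_message | destination_for, where some_message is any saved email file.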
Also, the procmail log will be located in the file $HOME/.procmail/log. This can be useful when debugging this setup.
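A quick way to see how mail is being sorted is to count deliveries per folder in that log. The sketch below assumes the usual LOGABSTRACT format, in which each delivery is recorded on a line beginning with "Folder:" followed by the path of the delivered file; folder paths containing spaces are not handled. tally_deliveries is a hypothetical helper name.

```shell
# Hypothetical helper: count procmail deliveries per folder.
# Assumption: each delivery appears in the log as a line such as
#   Folder: /home/user/Maildir/.Junk/new/1234.5678_0.host   2345
tally_deliveries() {
    awk '$1 == "Folder:" {
             folder = $2
             sub("/new/[^/]*$", "", folder)   # drop the per-message file name
             count[folder]++
         }
         END { for (f in count) print count[f], f }' "$1" | sort -rn
}
```

For example, tally_deliveries "$HOME/.procmail/log" prints one line per folder, busiest first.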
See Also
- SpamBayes on UNIX or Linux
- Additional procmail info
- SpamBayes home page
- Howto from carlo.zottmann.org - has more complicated folder/procmail setup
- Installing Python - custom Python on a DreamHost account
- Another SpamBayes/procmail setup
- Procmail - info about Procmail on the DreamHost wiki

