| Sunday, April 25, 2004 |
|---|
Enterprise Spam Filtering
Update, 9 December 2004: The program below is obsolete. Use this one.

As I wrote before, I don't have problems with spam anymore. I described how I use several mail filtering techniques to stop almost all spam without having to worry about legitimate mail getting lost.
Here's the latest version of the very simple script that I run once in a while to process both spam and "ham". I tidied it up so that other people can use it.
#!/bin/sh
# $Id: rotmail,v 1.2 2004/04/25 17:57:26 bdolicki Exp bdolicki $
# by bdolicki@branimir.com
# I execute this script on a regular basis to:
# 1/ Report manually classified spam to Vipul's razor and feed it into
# Bayesian learner
# 2/ Feed automatically classified spam into Bayesian learner
# 3/ Feed the ham from the inbox into Bayesian learner
# 4/ Archive and truncate the above three folders
# I assume that there are three different kinds of mail (see below):
# 1/ Legit mail (my inbox)
# 2/ Manually classified spam
# 3/ Automatically classified spam
##
# Some variables which can be customized
##
# This is the directory where I keep mail that I actually care about and
# want to access over IMAP. It contains various mailing list folders,
# inbox-archive, and the folder with manually classified spam, so that I
# can do the classification with my IMAP client.
MAILDIR=$HOME/Mail
# This is the directory which I don't want to access with IMAP
SPAMDIR=$HOME/Spam
# We have six mail folders altogether: the three folders mentioned above
# and an archive folder for each of those. The following variables control
# the names of those folders.
# the inbox (usually in /var/spool/mail)
INBOX=$MAIL
# manually classified spam: spam which managed to pass through the
# filter (a rare beast these days):
MANUAL_SPAM=$MAILDIR/sp
# automatically classified spam. In my case this contains all mails
# with SpamAssassin score > 5.
AUTOMATIC_SPAM=$SPAMDIR/spam
# Following three folders have suffix -archive. I append the contents
# of the above three folders to them.
INBOX_ARCHIVE=$MAILDIR/inbox-archive
MANUAL_SPAM_ARCHIVE=$SPAMDIR/sp-archive
AUTOMATIC_SPAM_ARCHIVE=$SPAMDIR/spam-archive
##
# ----- You normally don't need to customize anything below this line -------
##
# Given two files, this function appends the contents of the first
# file to the second file and truncates the first file. It uses the
# locking mechanism which is also used by sendmail, imapd and mutt. It
# will not do the truncation if appending failed.
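archive() {
    SOURCE=`readlink -f "$1"`
    TARGET=`readlink -f "$2"`
    echo -n "$0: trying to lock file $TARGET..."
    if lockfile -r10 -l3600 "$TARGET.lock"
    then
        echo "...success"
        # On a signal, remove our lock and stop instead of carrying on
        trap 'rm -f "$TARGET.lock"; exit 1' 1 2 3 13 15
        echo -n "$0: trying to lock file $SOURCE..."
        if lockfile -l3600 "$SOURCE.lock"
        then
            echo "...success"
            # From here on a signal must release both locks
            trap 'rm -f "$TARGET.lock" "$SOURCE.lock"; exit 1' 1 2 3 13 15
            echo -n "$0: appending $SOURCE to archive folder $TARGET"
            echo -n " and truncating $SOURCE..."
            if cat "$SOURCE" >> "$TARGET" && cat /dev/null > "$SOURCE"
            then
                echo "...done"
            else
                echo "...failed. Exiting." 1>&2
                rm -f "$SOURCE.lock" "$TARGET.lock"
                exit 1
            fi
            rm -f "$SOURCE.lock"
        else
            echo "$0: couldn't lock the file $SOURCE. Exiting." 1>&2
            # Don't leave the target locked behind us
            rm -f "$TARGET.lock"
            exit 1
        fi
        rm -f "$TARGET.lock"
        # Both locks are released, so restore default signal handling
        trap - 1 2 3 13 15
    else
        echo "$0: couldn't lock the file $TARGET. Exiting." 1>&2
        exit 1
    fi
}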
# A small utility function
fileordie() {
    # Quoting matters: an empty argument must count as a missing file
    if [ ! -e "$1" ]
    then
        echo "$1: file not found. Exiting." 1>&2
        exit 1
    fi
}
# This is the beginning of the actual program
# If something is wrong with just one folder we exit. Just in case.
for FOLDERNAME in "$INBOX" "$INBOX_ARCHIVE" "$MANUAL_SPAM" "$MANUAL_SPAM_ARCHIVE" \
    "$AUTOMATIC_SPAM" "$AUTOMATIC_SPAM_ARCHIVE"
do
    fileordie "$FOLDERNAME"
done
# Report manually classified spam. This will also invoke sa-learn
echo "reporting manual spams"
formail -s spamassassin --report < "$MANUAL_SPAM" || exit 1
archive "$MANUAL_SPAM" "$MANUAL_SPAM_ARCHIVE"
echo "learning automatic spams with sa-learn"
sa-learn --spam --mbox --showdots "$AUTOMATIC_SPAM" || exit 1
archive "$AUTOMATIC_SPAM" "$AUTOMATIC_SPAM_ARCHIVE"
echo "learning ham with sa-learn"
sa-learn --ham --mbox --showdots "$INBOX" || exit 1
archive "$INBOX" "$INBOX_ARCHIVE"
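If you want to try the append-then-truncate idea on throwaway files before pointing the script at real mail folders, here is a minimal sketch. It is not the script above: the function name archive_sketch and the .lockdir suffix are made up for illustration, and it swaps lockfile for a mkdir-based lock so that it runs without procmail installed. A mkdir lock is not honored by mail software, so don't use this variant on live folders.

```shell
#!/bin/sh
# Sketch only: append SOURCE to TARGET and truncate SOURCE, but only if
# the append succeeded. Uses mkdir as a stand-in lock (an assumption made
# for this example); it does NOT interoperate with the .lock files that
# mail software uses.

archive_sketch() {
    src=$1
    dst=$2
    lock="$dst.lockdir"     # hypothetical lock name

    # mkdir is atomic: it fails if the directory already exists
    if ! mkdir "$lock" 2>/dev/null
    then
        echo "archive_sketch: $dst is locked" 1>&2
        return 1
    fi

    status=0
    if cat "$src" >> "$dst"
    then
        : > "$src"          # truncate only after a successful append
    else
        status=1
    fi

    rmdir "$lock"
    return $status
}

# Try it on two throwaway mbox-like files
WORK=$(mktemp -d)
printf 'From a@example.com\n\nfirst\n' > "$WORK/folder"
printf 'From b@example.com\n\nold\n' > "$WORK/folder-archive"
archive_sketch "$WORK/folder" "$WORK/folder-archive"
wc -c < "$WORK/folder"                    # size of the truncated folder
grep -c '^From ' "$WORK/folder-archive"   # number of messages in the archive
rm -rf "$WORK"
```

If the lock directory already exists, the function returns without touching either file, which is the same "do nothing rather than risk losing mail" behaviour that the real archive function aims for.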
Posted by Branimir Dolicki at 20:01