| Thursday, December 9, 2004 |
|---|
POSIX locking
I just realized that the silly locking I described in my previous post on spam filtering doesn't really work, because most programs that write to my mailboxes ignore it. Here's the correct version of the rotmail script:
#!/bin/sh
# $Id: rotmail,v 1.3 2004/12/09 14:53:33 bdolicki Exp bdolicki $
# by bdolicki@branimir.com
# I execute this script on a regular basis to:
# 1/ Report manually classified spam to Vipul's razor and feed it into
# the Bayesian learner
# 2/ Feed automatically classified spam into the Bayesian learner
# 3/ Feed the ham from the inbox into the Bayesian learner
# 4/ Archive and truncate the above three folders
# I assume that there are three different kinds of mail (see below):
# 1/ Legit mail (my inbox)
# 2/ Manually classified spam
# 3/ Automatically classified spam
##
# Some variables which can be customized. You definitely *want* to
# take a look at these!
##
# This is a program that does something like:
# cat file1 >> file2; cat /dev/null > file1
# The reason I'm not doing it that way is that we need to do proper
# locking. In my case that means doing fcntl(2) (POSIX) locking.
ARCHIVING_PROGRAM=$HOME/bin/append-and-truncate.rb
# This is the directory where I keep mail that I actually care about and
# want IMAP to access. It contains various mailing list folders,
# inbox-archive, and the folder of manually classified spam, so that I
# can do the classifying with my IMAP client.
MAILDIR=$HOME/Mail
# This is the directory which I don't want to access with IMAP
SPAMDIR=$HOME/Spam
# We have six mail folders altogether: the three folders mentioned above
# and an archive folder for each of those. The following variables control
# the names of those folders.
# the inbox (usually in /var/spool/mail)
INBOX=/var/spool/mail/$USER
# manually classified spam: spam which managed to pass through the
# filter (a rare beast these days):
MANUAL_SPAM=$MAILDIR/sp
# automatically classified spam. In my case this contains all mails
# with SpamAssassin score > 5.
AUTOMATIC_SPAM=$SPAMDIR/spam
# The following three folders have the suffix -archive. I append the contents
# of the above three folders to them.
INBOX_ARCHIVE=$MAILDIR/inbox-archive
MANUAL_SPAM_ARCHIVE=$SPAMDIR/sp-archive
AUTOMATIC_SPAM_ARCHIVE=$SPAMDIR/spam-archive
# This is where your GNU readlink utility (part of GNU Coreutils) sits:
PATH=$PATH:/opt/gnu/bin
##
# ----- You normally don't need to customize anything below this line -------
##
# Given two files, $ARCHIVING_PROGRAM appends the contents of the first
# file to the second file and truncates the first file. It uses the
# locking mechanism which is also used by sendmail, imapd and mutt. It
# will not do the truncation if appending failed.
# A small utility function
fileordie() {
    if [ ! -e "$1" ]
    then
        echo "$1: file not found. Exiting." 1>&2
        exit 1
    fi
}
# This is the beginning of the actual program
# If even one of the files we need is missing, exit. Just in case.
for FOLDERNAME in "$INBOX" "$INBOX_ARCHIVE" "$MANUAL_SPAM" "$MANUAL_SPAM_ARCHIVE" \
    "$AUTOMATIC_SPAM" "$AUTOMATIC_SPAM_ARCHIVE" "$ARCHIVING_PROGRAM"
do
    fileordie "$FOLDERNAME"
done
# Report manually classified spam. This will also invoke sa-learn
echo "reporting manual spams"
formail -s spamassassin --report < "$MANUAL_SPAM"
"$ARCHIVING_PROGRAM" "$MANUAL_SPAM" "$MANUAL_SPAM_ARCHIVE"
echo "learning automatic spams with sa-learn"
sa-learn --spam --mbox --showdots "$AUTOMATIC_SPAM" || exit 1
"$ARCHIVING_PROGRAM" "$AUTOMATIC_SPAM" "$AUTOMATIC_SPAM_ARCHIVE"
echo "learning ham in $INBOX with sa-learn"
sa-learn --ham --mbox --showdots "$INBOX" || exit 1
"$ARCHIVING_PROGRAM" "$INBOX" "$INBOX_ARCHIVE"
The key part is this little Ruby program, append-and-truncate.rb:
#!/usr/bin/ruby
# $Id: append-and-truncate.rb,v 1.1 2004/12/09 15:36:11 bdolicki Exp bdolicki $
# USAGE: append-and-truncate file1 file2
# An equivalent of:
# cat file1 >> file2; cat /dev/null > file1
# but doing proper fcntl(2) locking.
#
# This is good because all programs I care about such as procmail, mutt
# and dovecot IMAP server use this kind of locking.
# I'll move this part into a library file eventually.
# fcntl() support is built into the File class. We need this module just
# to translate symbolic constants to their system-dependent numerical
# values:
require 'fcntl'
# Let's enrich the File class with a few more methods :-)
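class File
  def fcntl_read_lock
    do_fcntl_lock Fcntl::F_RDLCK
  end
  def fcntl_write_lock
    do_fcntl_lock Fcntl::F_WRLCK
  end
  def fcntl_unlock
    do_fcntl_lock Fcntl::F_UNLCK
  end
  # Lock the whole file (whence 0 = SEEK_SET, start 0, len 0 = up to EOF)
  # and wait until the lock is granted (F_SETLKW).
  # NOTE: the field order and padding of struct flock are system-dependent,
  # so the "ssqqi" template has to match the platform this runs on.
  def do_fcntl_lock(type)
    lock_info = [type, 0, 0, 0, 0].pack("ssqqi")
    fcntl Fcntl::F_SETLKW, lock_info
  end
end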
# The actual program starts here.
if ARGV.length != 2
  raise "I need exactly two files"
end
inbox = ARGV[0]
outbox = ARGV[1]
puts "Moving the contents of #{inbox} to #{outbox} ..."
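File.open(inbox, "r+") do |infile|
  File.open(outbox, "a") do |outfile|
    outfile.fcntl_write_lock
    infile.fcntl_write_lock
    while line = infile.gets
      outfile.puts line
    end
    # Push buffered data out while we still hold the locks and before
    # truncating, so that a failed write cannot lose mail.
    outfile.flush
    infile.truncate(0)
    infile.fcntl_unlock
    outfile.fcntl_unlock
  end
end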
puts "... done"
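One caveat about the packed lock structure: the layout of struct flock differs between systems, so a pack template that works on one machine can be wrong on another. As a minimal sketch, assuming x86-64 Linux with glibc, the structure can be packed with its alignment padding spelled out. The helper names linux_x86_64_flock and try_write_lock are hypothetical and not part of the script above; other platforms need a different template.

```ruby
require 'fcntl'

# Hypothetical helper: pack a struct flock using the x86-64 Linux/glibc
# layout (an assumption; check <fcntl.h> on your own system):
#   short l_type; short l_whence; <4 bytes of padding>
#   off_t l_start; off_t l_len; pid_t l_pid; <4 bytes of padding>
def linux_x86_64_flock(type, start = 0, len = 0)
  whence = IO::SEEK_SET  # offsets are counted from the start of the file
  [type, whence, start, len, 0].pack("ssx4qqix4")
end

# Hypothetical helper: try to take a write lock on the whole file without
# blocking. Returns true on success and false if another process already
# holds a conflicting lock.
def try_write_lock(file)
  file.fcntl(Fcntl::F_SETLK, linux_x86_64_flock(Fcntl::F_WRLCK))
  true
rescue Errno::EACCES, Errno::EAGAIN
  false
end
```

With start and len both 0 the lock covers the whole file, which is what the script above relies on. F_SETLK fails immediately instead of waiting, which can be handy for checking by hand whether another program really holds a lock on a mailbox.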
|
Posted by Branimir Dolicki at 16:38 |