Branimir's Blog Archive

Thursday, December 9, 2004

POSIX locking

I just realized that the silly locking I described in my previous post on spam filtering doesn't really work, because most programs that write to my mailboxes ignore it. Here's the corrected version of the rotmail script:
#!/bin/sh

# $Id: rotmail,v 1.3 2004/12/09 14:53:33 bdolicki Exp bdolicki $
# by bdolicki@branimir.com

# I execute this script on a regular basis to:
# 1/ Report manually classified spam to Vipul's razor and feed it into
#    Bayesian learner
# 2/ Feed automatically classified spam into Bayesian learner
# 3/ Feed the ham from the inbox into Bayesian learner
# 4/ Archive and truncate the above three folders

# I assume that there are three different kinds of mail (see below):
# 1/ Legit mail (my inbox)
# 2/ Manually classified spam
# 3/ Automatically classified spam

##
# Some variables which can be customized.  You definitely *want* to
# take a look at these!
##

# This is a program that does something like:
#   cat file1 >> file2; cat /dev/null > file1
# The reason I'm not doing it that way is that we need to do proper
# locking. In my case that means doing fcntl(2) (POSIX) locking.
ARCHIVING_PROGRAM=$HOME/bin/append-and-truncate.rb

# This is the directory where I keep the mail that I actually care about
# and want IMAP to access.  It contains various mailing list folders,
# inbox-archive, and the folder for manually classified spam, so that I
# can do the classifying from my IMAP client.
MAILDIR=$HOME/Mail

# This is the directory which I don't want to access with IMAP
SPAMDIR=$HOME/Spam

# We have six mail folders altogether: the three folders mentioned above
# and an archive folder for each of those.  The following variables
# control the names of those folders.

# the inbox (usually in /var/spool/mail)
INBOX=/var/spool/mail/$USER

# manually classified spam: spam which managed to pass through the
# filter (a rare beast these days):
MANUAL_SPAM=$MAILDIR/sp

# automatically classified spam.  In my case this contains all mails
# with SpamAssassin score > 5.
AUTOMATIC_SPAM=$SPAMDIR/spam

# Following three folders have suffix -archive.  I append the contents
# of the above three folders to them.

INBOX_ARCHIVE=$MAILDIR/inbox-archive
MANUAL_SPAM_ARCHIVE=$SPAMDIR/sp-archive
AUTOMATIC_SPAM_ARCHIVE=$SPAMDIR/spam-archive

# This is where your GNU readlink utility (part of GNU Coreutils) sits:

PATH=$PATH:/opt/gnu/bin

##
# ----- You normally don't need to customize anything below this line -------
##

# Given two files, $ARCHIVING_PROGRAM appends the contents of the first
# file to the second file and truncates the first file.  It uses the
# locking mechanism which is also used by sendmail, imapd and mutt.  It
# will not do the truncation if appending failed.

# A small utility function
fileordie()  {
if [ ! -e "$1" ]
then
    echo "$1: file not found. Exiting." 1>&2
    exit 1
fi
}

# This is the beginning of the actual program

# If something is wrong with just one file we need -- exit.  Just in case.
for FOLDERNAME in "$INBOX" "$INBOX_ARCHIVE" "$MANUAL_SPAM" \
    "$MANUAL_SPAM_ARCHIVE" "$AUTOMATIC_SPAM" "$AUTOMATIC_SPAM_ARCHIVE" \
    "$ARCHIVING_PROGRAM"
do
    fileordie "$FOLDERNAME"
done

# Report manually classified spam.  This will also invoke sa-learn.
# The folder variables defined above are used throughout, so that
# customizing them actually takes effect.
echo "reporting manual spams"
formail -s spamassassin --report < "$MANUAL_SPAM"
"$ARCHIVING_PROGRAM" "$MANUAL_SPAM" "$MANUAL_SPAM_ARCHIVE"

echo "learning automatic spams with sa-learn"
sa-learn --spam --mbox --showdots "$AUTOMATIC_SPAM" || exit 1
"$ARCHIVING_PROGRAM" "$AUTOMATIC_SPAM" "$AUTOMATIC_SPAM_ARCHIVE"

echo "learning ham in $INBOX with sa-learn"
sa-learn --ham --mbox --showdots "$INBOX" || exit 1
"$ARCHIVING_PROGRAM" "$INBOX" "$INBOX_ARCHIVE"
The key part is this little Ruby program, append-and-truncate.rb:
#!/usr/bin/ruby

# $Id: append-and-truncate.rb,v 1.1 2004/12/09 15:36:11 bdolicki Exp bdolicki $
# USAGE: append-and-truncate file1 file2

# An equivalent of:
#  cat file1 >> file2; cat /dev/null > file1
# but doing proper fcntl(2) locking.
#
# This is good because all the programs I care about, such as procmail,
# mutt and the dovecot IMAP server, use this kind of locking.

# I'll move this part into a library file eventually.

# fcntl() support is built into the File class. We need this module just
# to translate symbolic constants to their system-dependent numerical
# values:

require 'fcntl'

# Let's enrich the File class with a few more methods :-)
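class File
  def fcntl_read_lock
    do_fcntl_lock Fcntl::F_RDLCK
  end

  def fcntl_write_lock
    do_fcntl_lock Fcntl::F_WRLCK
  end

  def fcntl_unlock
    do_fcntl_lock Fcntl::F_UNLCK
  end

  # Build a struct flock that covers the whole file (l_whence, l_start
  # and l_len are all zero) and block until the lock is granted
  # (F_SETLKW).  The struct layout is system-dependent: this template
  # assumes that l_type is the first field, which is not true everywhere.
  def do_fcntl_lock(type)
    lock_info = [type, 0, 0, 0, 0].pack("ssqqi")
    fcntl Fcntl::F_SETLKW, lock_info
  end
end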

# The actual program starts here.
if ARGV.length != 2
then
  raise "I need exactly two files"
end

inbox = ARGV[0]
outbox = ARGV[1]

puts "Moving the contents of #{inbox} to #{outbox} ..."

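File.open(inbox, "r+") do |infile|
  File.open(outbox, "a") do |outfile|
    outfile.fcntl_write_lock
    infile.fcntl_write_lock
    while line = infile.gets
      outfile.puts line
    end
    # Push buffered data out to the archive while both locks are still
    # held, so that a write error shows up before the source is emptied.
    outfile.flush
    infile.truncate(0)
    infile.fcntl_unlock
    outfile.fcntl_unlock
  end
end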

puts "... done"
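
The whole point of switching to fcntl(2) locks is that they only help when every program touching the mailbox asks for them. Here is a minimal sketch of how one might check that such a lock really shuts out a second process. The helper names (flock_struct, locked_by_other?, lock_demo) are made up for this illustration, and the sketch assumes a system with fork and a struct flock whose first field is l_type, as on Linux:

```ruby
require 'fcntl'
require 'tempfile'

# Whole-file struct flock.  ASSUMPTION: l_type is the first field of the
# struct (true on Linux); all other fields are zero, so the exact padding
# of the template does not matter there.
def flock_struct(type)
  [type, 0, 0, 0, 0].pack("ssqqi")
end

# Hypothetical helper: true if another process holds a conflicting lock.
def locked_by_other?(path)
  File.open(path, "r+") do |f|
    begin
      # F_SETLK is the non-blocking request; it fails instead of waiting.
      f.fcntl(Fcntl::F_SETLK, flock_struct(Fcntl::F_WRLCK))
      false # we got the lock; closing f releases it again
    rescue Errno::EACCES, Errno::EAGAIN
      true
    end
  end
end

# Hypothetical demo: a child process takes a write lock on a scratch
# file, and the parent probes the file while the lock is held and again
# after the child has exited.
def lock_demo
  mbox = Tempfile.new("mbox")
  ready_r, ready_w = IO.pipe
  done_r, done_w = IO.pipe

  child = fork do
    ready_r.close
    done_w.close
    File.open(mbox.path, "r+") do |f|
      f.fcntl(Fcntl::F_SETLKW, flock_struct(Fcntl::F_WRLCK))
      ready_w.write("x") # tell the parent that the lock is held
      ready_w.close
      done_r.read        # keep the lock until the parent closes its end
    end
    exit!(0)             # skip exit handlers so the scratch file survives
  end

  ready_w.close
  done_r.close
  ready_r.read(1)        # wait until the child holds the lock
  while_held = locked_by_other?(mbox.path)
  done_w.close           # let the child finish
  Process.wait(child)
  after_exit = locked_by_other?(mbox.path)
  mbox.close!
  [while_held, after_exit]
end

p lock_demo if __FILE__ == $0
```

The first value says whether the lock request was refused while the child held the lock, the second whether it was still refused after the child had gone. fcntl locks belong to a process, not to a file handle, which is why the probe has to come from a second process.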


Posted by Branimir Dolicki at 16:38
