SA-Learning with "Public Folders" on an Exchange Server

From Deimos.fr / Bloc Notes Informatique

1 Introduction

This documentation explains how to implement SpamAssassin Bayesian learning over IMAP with Microsoft Exchange.
This mechanism works with both Exchange 5.5 SP4 and Exchange 2000 or later.

How can you support ad-hoc Bayesian learning with Microsoft Exchange Server and Outlook?

Many organizations use Microsoft Exchange, MS Outlook, and Outlook Express with IMAP for their corporate e-mail. Typically, SpamAssassin runs on a Linux box that tags the mail and forwards it to the Exchange server for delivery. One of the challenges in deploying SpamAssassin in this environment has been to give end users a seamless way to train the Bayesian filter.

The difficulty is that neither Outlook nor Outlook Express preserves the original message headers when mail is forwarded from one mailbox to another, which makes it tedious to send the necessary information to a spam or ham mailbox. Although this is mainly a training problem, most users are unwilling to take the extra time to manually copy the original headers into a new message along with the original message body; it is simply too unwieldy. As a result, Bayesian training is often left to the mail admin, who receives spam forwarded by end users (usually without the required headers) and is expected to add the offending email to a blacklist or to create a new rule.

2 AD & Exchange Configuration

2.1 Active Directory

Create a domain user called "spamassassin" with minimal rights and create an Exchange mailbox for it. The mailbox should never receive any mail; the account exists only to give SpamAssassin access to the public folders.

2.2 Exchange

The only time Microsoft Outlook or Outlook Express properly preserves the headers is during a drag-and-drop operation. This suggests a solution that takes advantage of Microsoft Exchange's public folder capabilities: create a "Spam" public folder and a "Ham" public folder on the Exchange server, and let users drag spam or ham into these folders, where the messages await retrieval by the SpamAssassin host.

Next, enable the IMAP service in Exchange if it is not already active.

Using Outlook, logged in as an administrator, create the "Spam" and "Ham" public folders. Right-click each folder, open the Properties > Permissions tab, and make the spamassassin user a folder "Owner". This gives the spamassassin account the privileges it needs to delete processed messages. The default permissions should allow anyone to post to the folder, but to delete only their own items.

3 SpamAssassin Server Tools

The script needs Perl and the Mail::IMAPClient module on the SpamAssassin server. On Debian-based systems, install them with:

apt-get install perl libmail-imapclient-perl
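Before the first run, it can help to confirm that everything the script relies on is present: Mail::IMAPClient, the sa-learn command, and formail (only needed for the -logmail option; on Debian it comes from the procmail package). The following is a minimal preflight sketch; the helper names find_in_path, have_module and preflight are illustrative and not part of the original article.

```perl
#!/usr/bin/perl
# Hypothetical preflight check for imap-sa-learn.pl prerequisites.
use strict;
use warnings;
use File::Spec;

# Return the full path of an executable found in $PATH, or undef.
sub find_in_path {
   my ($prog) = @_;
   foreach my $dir (File::Spec->path) {
      my $candidate = File::Spec->catfile($dir, $prog);
      return $candidate if -f $candidate && -x _;
   }
   return;
}

# Return 1 if the named Perl module can be loaded, 0 otherwise.
sub have_module {
   my ($module) = @_;
   return eval "require $module; 1" ? 1 : 0;
}

# Collect a human-readable list of missing prerequisites.
sub preflight {
   my @problems;
   push @problems, "Mail::IMAPClient is not installed"
      unless have_module('Mail::IMAPClient');
   push @problems, "sa-learn not found in PATH"
      unless find_in_path('sa-learn');
   # formail is only used by the -logmail option
   push @problems, "formail not found in PATH (needed only for -logmail)"
      unless find_in_path('formail');
   return @problems;
}

if (my @problems = preflight()) {
   print "WARNING: $_\n" for @problems;
} else {
   print "All prerequisites found\n";
}
```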

4 Perl Script

Below is a Perl script called imap-sa-learn.pl. It logs on to any server that supports IMAP, retrieves the messages in any arbitrarily named folders, feeds them to sa-learn as either ham or spam, deletes the processed messages if asked to (-deletespam, -dangerous-delete-ham or -dangerous-delete-all), and finally runs sa-learn --sync to update the Bayesian database.
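The heart of the script is one step: pipe a single raw message, headers included, into sa-learn with --spam or --ham. A minimal sketch of that step as standalone functions, assuming sa-learn is in $PATH; the names sa_learn_command and learn_message are hypothetical. The list form of the piped open passes the arguments directly to sa-learn without a shell.

```perl
use strict;
use warnings;

# Build the sa-learn argument list for one message.
# --no-sync defers the database sync to one "sa-learn --sync" at the end;
# --single tells sa-learn that its input is a single message.
sub sa_learn_command {
   my ($type) = @_;
   die "type must be 'spam' or 'ham'\n"
      unless defined $type && ($type eq 'spam' || $type eq 'ham');
   return ('sa-learn', '--no-sync', "--$type", '--single');
}

# Feed one raw message (headers and body) to sa-learn on its stdin.
# Returns true if sa-learn exited with status 0.
sub learn_message {
   my ($type, $raw) = @_;
   open(my $sa, '|-', sa_learn_command($type))
      or die "Cannot start sa-learn: $!\n";
   print $sa $raw;
   return close($sa);
}
```

Validating the type up front means a typo in the folder table cannot turn into an unexpected sa-learn option.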

#!/usr/bin/perl
 
# Imap Interface to SpamAssassin Learn (Power Version)     v0.02
# ----------------------------------------------------     -----
#
# Note - unless you want to do all sorts of complex stuff, you probably
#        want the normal version of this script, imap-sa-learn.pl
#        
# Connects to an imap server, and filters the messages from the INBOX
# and SpamTrap (unless otherwise told) through sa-learn. Allows you to
# save the emails into several different places, and other snazzy things
#
#  usage:
#    power-imap-sa-learn.pl <-hamfolder HAM> <-spamfolder SPAM>
#
#  Other options:
#    -username nnnn	username for the server
#    -password nnnn	password for the server
#    -server nnnn	server to connect to
#
#    -skips nnn		skips over the first nnn messages in the folder(s)
#    -logmail		save the email into an mbox file using formail
#    -logmaildir nnn	dir to write into, default is ~/mail
#
#    -deletespam	after learning from a spam message, delete it
#    -delete-spam	(as above)
#    -dangerous-delete-ham	after learning from a ham (real email),
#    				delete it. Most people don't want this...
#    -dangerous-delete-all	after learning from any message, delete
#    				it, spam or ham
#
#
# Uses Mail::IMAPClient and SpamAssassin (sa-learn)
#
# Needs a version of SpamAssassin with the Bayesian filtering support,
# i.e. 2.50 or later
#
#      Nick Burch <nick@tirian.magd.ox.ac.uk>
#           25/06/2003
#      Modified by Deimos <deimos@deimos.fr>
#           07/11/2006
 
use strict;
use warnings;
use Mail::IMAPClient;
 
# Define our default server and credentials here
# These can be overridden on the command line
#
#  ** fix me **    your details go below
my $username = 'john.smith';
my $password = 'bitter';
my $server = 'my.imap.provider.invalid';
 
# Define where to find messages
my $defspamfolder = 'SpamTrap';
my $defhamfolder = 'INBOX';
my $deletespam = 0;
my $deleteham = 0;
my $default = 1;
 
my $log = 0;
my $logdir = $ENV{'HOME'}."/mail";
 
my $skips = 0;
 
my @spams;
my @hams;
 
while(my $arg = shift) {
   if($arg eq "-spamfolder") {
     my $spam = shift;
     push @spams,$spam;
     print "Using spam folder $spam\n";
     $default = 0;
   }
   if($arg eq "-hamfolder") {
     my $ham = shift;
     push @hams,$ham;
     print "Using normal (ham) folder $ham\n";
     $default = 0;
   }
   if($arg eq "-username") {
     $username = shift;
   }
   if($arg eq "-password") {
     $password = shift;
   }
   if($arg eq "-server") {
     $server = shift;
   }
   if($arg eq "-deletespam" || $arg eq "-deletespams" || $arg eq "-delete-spam" || $arg eq "-delete-spams") {
     $deletespam = 1;
   }
   if($arg eq "-dangerous-delete-ham" || $arg eq "-dangerous-delete-hams") {
     $deleteham = 1;
   }
   if($arg eq "-dangerous-delete-all") {
     $deletespam = 1;
     $deleteham = 1;
   }
   if($arg eq "-logmaildir" || $arg eq "-logmail-dir") {
     $logdir = shift;
     $log = 1;
   }
   if($arg eq "-logmail") {
     $log = 1;
   }
   if($arg eq "-skips" || $arg eq "-skip") {
     $skips = shift;
   }
   if($arg eq "-?" || $arg eq "-h") {
     print "Usage:\n";
     print "  imap-sa-learn.pl [-spamfolder f]* [-hamfolder f]*\n\n";
     print "with no arguments, uses default folders\n";
     print "(a few other options exist, see the header of the program)\n";
     exit;
   }
}
 
if($default) {
   push @hams,$defhamfolder;
   push @spams,$defspamfolder;
}
 
my %folders;
$folders{'spam'} = \@spams;
$folders{'ham'} = \@hams;
 
 
# Normal (1), Debugging (2), or silent(0)?
my $debug = 1;
 
# Connect to the IMAP server in peek (i.e. don't set read flag) mode
my $imap = Mail::IMAPClient->new(Server   => $server,
                                 User     => $username,
                                 Password => $password,
                                 Peek     => 1)
   or die "Cannot connect to $server as $username: $@\n";
 
foreach my $type(keys %folders) {
   foreach my $folder (@{$folders{$type}}) {
      print "\nLooking in $type folder $folder\n";
 
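      # Pick the folder; skip it if it cannot be selected
      unless($imap->select($folder)) {
         warn "Cannot select folder $folder: ".$imap->LastError."\n";
         next;
      }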
 
      # Enable peek mode
      $imap->Peek(1);
 
      # Fetch messages
      my @mails = ($imap->seen(),$imap->unseen);
 
      my $count = 0;
 
      foreach my $id (@mails) {
         $count++;
         if($count <= $skips) { next; }
 
         print " Learning on $type message $id\n";
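         my $mail = $imap->message_string($id);
         unless(defined $mail) {
            warn "Cannot fetch message $id: ".$imap->LastError."\n";
            next;
         }
         # List-form pipe open: arguments go straight to sa-learn, no shell
         open(my $sa, '|-', 'sa-learn', '--no-sync', "--$type", '--single')
            or die "Cannot run sa-learn: $!\n";
         print $sa $mail;
         close $sa;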
 
         if($log) {
            print "Logging $type message $id\n";
            open LOG, "| formail -ds >> ".$logdir."/".$type;
            print LOG $mail;
            close LOG;
         }
 
         if($type eq "spam" && $deletespam) {
            # If you want to move the message rather than deleting it,
            # uncomment the line below, change the folder, but _don't_
            # remove the delete line!
            #$imap->append('TrashBin', $mail );
 
            print "Deleting Spam Message $id\n";
            $imap->delete_message($id);
         }
         if($type eq "ham" && $deleteham) {
            print "Deleting Ham (normal email) Message $id\n";
            $imap->delete_message($id);
         }
      }
      if($deleteham || $deletespam) {
         # Only expunge now, rather than on every message
         $imap->expunge();
      }
   }
}
 
print "Now syncing the Bayesian database\n";
system('sa-learn', '--sync') == 0
   or warn "sa-learn --sync failed\n";
 
$imap->close;
$imap->logout;
exit;
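The hand-written option loop in the script is where the -username, -password and -server overrides originally failed silently. As an alternative, the standard Getopt::Long module can handle the same single-dash option names. The sketch below is a swapped-in technique, not the author's code, and covers only a subset of the options (the -logmail options are left out); parse_options is a hypothetical name and the defaults are the placeholder values from the script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse imap-sa-learn.pl style options from an array reference.
# Getopt::Long accepts single-dash long names such as -spamfolder.
sub parse_options {
   my ($argv) = @_;
   my %opt = (
      username => 'john.smith',
      password => 'bitter',
      server   => 'my.imap.provider.invalid',
      skips    => 0,
      spam     => [],
      ham      => [],
   );
   GetOptionsFromArray($argv,
      'spamfolder=s'  => $opt{spam},        # repeatable: values are pushed
      'hamfolder=s'   => $opt{ham},
      'username=s'    => \$opt{username},
      'password=s'    => \$opt{password},
      'server=s'      => \$opt{server},
      'skips|skip=i'  => \$opt{skips},
      'deletespam|deletespams|delete-spam|delete-spams' => \$opt{deletespam},
      'dangerous-delete-ham|dangerous-delete-hams'      => \$opt{deleteham},
      'dangerous-delete-all' => sub { $opt{deletespam} = $opt{deleteham} = 1 },
   ) or die "Invalid options\n";
   # Use the default folders only when no folder was given at all
   unless (@{$opt{spam}} || @{$opt{ham}}) {
      push @{$opt{spam}}, 'SpamTrap';
      push @{$opt{ham}},  'INBOX';
   }
   return \%opt;
}
```

Keeping all settings in one hash also makes it easy to print the effective configuration before connecting.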

The imap-sa-learn.pl script is simple to adapt: you only need to prefix your public folder name with "Public Folders". For instance, if you create a public folder called "Spam", set the script variable holding the Spam folder's path to:

my $defspamfolder = 'Public Folders/Spam';

And for the Ham folder:

my $defhamfolder = 'Public Folders/Ham';
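To check that IMAP is enabled and that the spamassassin account can actually see these paths, a short script can log in and test each folder. A minimal sketch, assuming Mail::IMAPClient's separator and exists methods; the environment variable names SA_IMAP_SERVER, SA_IMAP_USER and SA_IMAP_PASSWORD and the helper names are hypothetical. The "/" fallback matches the paths used above.

```perl
use strict;
use warnings;

# Join folder names onto the "Public Folders" root using the given
# IMAP hierarchy separator.
sub public_folder_path {
   my ($separator, @parts) = @_;
   return join($separator, 'Public Folders', @parts);
}

# Log in and return the public folder paths that are NOT visible
# to this account (an empty list means every folder was found).
sub missing_public_folders {
   my ($server, $user, $password, @names) = @_;
   require Mail::IMAPClient;             # loaded only when actually used
   my $imap = Mail::IMAPClient->new(
      Server   => $server,
      User     => $user,
      Password => $password,
   ) or die "Cannot log in to $server as $user: $@\n";
   my $sep = $imap->separator || '/';
   my @missing = grep { !$imap->exists($_) }
                 map  { public_folder_path($sep, $_) } @names;
   $imap->logout;
   return @missing;
}

# Run the check only when the (hypothetical) variables are set.
if ($ENV{SA_IMAP_SERVER}) {
   my @missing = missing_public_folders($ENV{SA_IMAP_SERVER},
      $ENV{SA_IMAP_USER}, $ENV{SA_IMAP_PASSWORD}, 'Spam', 'Ham');
   print(@missing ? "Not visible: @missing\n" : "Spam and Ham folders found\n");
}
```

If a folder is reported as not visible, check the folder permissions described in the Exchange section before running the learning script.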

In the script, set the login and password to the spamassassin user's account name and password, then test. Because the spamassassin account is not an administrator, you avoid having a plain-text administrator name and password sitting inside a Perl script.
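To keep even the spamassassin password out of the script itself, one option is to read the credentials from a separate file that only the account running the script can read. A minimal sketch, assuming a hypothetical file format of "key = value" lines; read_credentials and the file layout are illustrative, not part of the original setup. Leading and trailing spaces around values are trimmed.

```perl
use strict;
use warnings;

# Read "key = value" lines (e.g. username, password, server) from a
# credentials file. Refuse files that group or others can read, so the
# password is not exposed to other local users.
sub read_credentials {
   my ($path) = @_;
   my @st = stat($path) or die "Cannot stat $path: $!\n";
   die "$path must not be readable by group or others\n"
      if $st[2] & 077;
   open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
   my %cred;
   while (my $line = <$fh>) {
      next if $line =~ /^\s*(#|$)/;      # skip comments and blank lines
      $cred{$1} = $2 if $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
   }
   close($fh);
   return \%cred;
}
```

In imap-sa-learn.pl, the hard-coded values could then be replaced with something like `my $cred = read_credentials("$ENV{HOME}/.imap-sa-learn");` followed by `$username = $cred->{username};` and `$password = $cred->{password};` (the file name is hypothetical; create it with mode 0600).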