org.apache.james.transport.mailets
Class BayesianAnalysis

java.lang.Object
  extended by org.apache.mailet.GenericMailet
      extended by org.apache.james.transport.mailets.BayesianAnalysis
All Implemented Interfaces:
Mailet, MailetConfig

public class BayesianAnalysis
extends GenericMailet

Spam detection mailet using Bayesian analysis techniques.

Sets an email message header indicating the probability that an email message is SPAM.

Based upon the principles described in A Plan for Spam by Paul Graham, extended with the techniques from Paul Graham's Better Bayesian Filtering.
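
As a rough illustration of the approach named above, the following sketch combines per-token spam probabilities with the formula given in A Plan for Spam. The class and method names are hypothetical and this is not the James implementation; it only assumes each token probability lies strictly between 0 and 1 (Graham's essay clamps them to the range 0.01 to 0.99).

```java
/** Hypothetical helper for illustration only; not part of James. */
final class GrahamCombiner {

    /**
     * Combines per-token spam probabilities as
     * P = prod(p_i) / (prod(p_i) + prod(1 - p_i)).
     * Assumes every p_i is strictly between 0 and 1.
     */
    static double combine(double[] tokenProbabilities) {
        double spamProduct = 1.0;
        double hamProduct = 1.0;
        for (double p : tokenProbabilities) {
            spamProduct *= p;          // evidence that the message is spam
            hamProduct *= (1.0 - p);   // evidence that the message is ham
        }
        return spamProduct / (spamProduct + hamProduct);
    }

    public static void main(String[] args) {
        // Two strongly "spammy" tokens and one mildly "hammy" token.
        double[] probabilities = {0.99, 0.95, 0.20};
        System.out.println(combine(probabilities));
    }
}
```

A few extreme tokens dominate the result, which is why the essay restricts the calculation to the most "interesting" tokens of each message.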

The analysis capabilities are based on token frequencies (the Corpus) learned through a training process (see BayesianAnalysisFeeder) and stored in a JDBC database. After a training session, the Corpus must be rebuilt from the database in order to acquire the new frequencies. Every 10 minutes a special thread in this mailet will check if any change was made to the database by the feeder, and rebuild the corpus if necessary.
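
The periodic check described above could be sketched roughly as follows. This is not the mailet's actual code: the class name, the `lastDatabaseUpdateTime` supplier, and the `rebuildCorpus` callback are assumptions introduced for illustration, and the standard `ScheduledExecutorService` stands in for whatever thread the mailet really uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a "reload the corpus if the database changed" check. */
final class CorpusReloadSketch {
    private final LongSupplier lastDatabaseUpdateTime; // assumed: time of the feeder's last change
    private final Runnable rebuildCorpus;              // assumed: reloads the token frequencies
    private volatile long lastCorpusLoadTime;          // 0 until the first load

    CorpusReloadSketch(LongSupplier lastDatabaseUpdateTime, Runnable rebuildCorpus) {
        this.lastDatabaseUpdateTime = lastDatabaseUpdateTime;
        this.rebuildCorpus = rebuildCorpus;
    }

    /** Rebuilds only if the database changed after the last load; returns true if it rebuilt. */
    synchronized boolean reloadIfStale() {
        if (lastDatabaseUpdateTime.getAsLong() <= lastCorpusLoadTime) {
            return false;
        }
        long startedAt = System.currentTimeMillis();
        rebuildCorpus.run();
        // Record the start time so changes made during the rebuild are caught next time.
        lastCorpusLoadTime = startedAt;
        return true;
    }

    long getLastCorpusLoadTime() {
        return lastCorpusLoadTime;
    }

    /** Runs the check every 10 minutes; the caller shuts the returned executor down. */
    ScheduledExecutorService start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::reloadIfStale, 0, 10, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Comparing timestamps keeps the common case cheap: when the feeder has not run, each check costs one lookup and the corpus is left untouched.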

An org.apache.james.spam.probability mail attribute is created, containing the computed spam probability as a Double. A message header, named by the headerName parameter, is also added, containing the same probability in floating-point representation.
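
A later processing step might use the attribute along the following lines. This is a minimal sketch under stated assumptions: the class name and the 0.9 threshold are hypothetical, and a plain Map stands in for the mail's attributes so that the example runs without the Mailet API on the classpath. In a real mailet or matcher the value would presumably be read with `Mail.getAttribute(String)`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical consumer of the spam probability attribute. */
final class SpamAttributeSketch {
    static final String ATTRIBUTE = "org.apache.james.spam.probability";

    /**
     * Returns true if the attribute value is a Double above the given threshold.
     * A missing or differently typed attribute (for example, a message that
     * was never analyzed) is treated as "not spam".
     */
    static boolean isLikelySpam(Object attributeValue, double threshold) {
        return attributeValue instanceof Double && ((Double) attributeValue) > threshold;
    }

    public static void main(String[] args) {
        // Stand-in for the attributes of a Mail that BayesianAnalysis has processed.
        Map<String, Object> attributes = new HashMap<>();
        attributes.put(ATTRIBUTE, Double.valueOf(0.97));

        // The 0.9 threshold is an arbitrary value chosen for this example.
        System.out.println(isLikelySpam(attributes.get(ATTRIBUTE), 0.9)); // prints "true"
    }
}
```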

Sample configuration:


 <mailet match="All" class="BayesianAnalysis">
   <repositoryPath>db://maildb</repositoryPath>
   <!--
     Set this to the header name to add with the spam probability
     (default is "X-MessageIsSpamProbability").
   -->
   <headerName>X-MessageIsSpamProbability</headerName>
   <!--
     Set this to true if you want to ignore messages coming from local senders
     (default is false).
     By local sender we mean a return-path with a local server part (server listed
     in <servernames> in config.xml).
   -->
   <ignoreLocalSender>true</ignoreLocalSender>
   <!--
     Set this to the maximum message size (in bytes) that a message may have
     to be considered spam (default is 100000).
   -->
   <maxSize>100000</maxSize>
 </mailet>
 

The spam probability is prepended to the subject if it is greater than 0.1 (10%).
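
The thresholding rule above can be illustrated with a short sketch. Only the 0.1 threshold comes from this documentation; the class name and the bracketed two-decimal prefix format are assumptions made for the example, and the format the mailet actually writes may differ.

```java
import java.util.Locale;

/** Hypothetical illustration of prepending the spam probability to a subject. */
final class SubjectTagSketch {
    static final double THRESHOLD = 0.1; // from the documentation above

    /**
     * Returns the subject unchanged when the probability is not above the
     * threshold; otherwise prepends the probability. The "[0.97] " style
     * prefix is an assumed format, not necessarily the one James uses.
     */
    static String tagSubject(String subject, double probability) {
        if (!(probability > THRESHOLD)) {
            return subject;
        }
        String prefix = String.format(Locale.ROOT, "[%.2f] ", probability);
        return prefix + (subject == null ? "" : subject);
    }

    public static void main(String[] args) {
        System.out.println(tagSubject("Cheap watches", 0.5));  // prints "[0.50] Cheap watches"
        System.out.println(tagSubject("Meeting notes", 0.05)); // prints "Meeting notes"
    }
}
```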

The required tables are automatically created if not already there (see sqlResources.xml). The token field in both the ham and spam tables is case sensitive.

Since:
2.3.0
See Also:
BayesianAnalysisFeeder, BayesianAnalyzer, JDBCBayesianAnalyzer

Constructor Summary
BayesianAnalysis()
           
 
Method Summary
 long getLastCorpusLoadTime()
          Getter for property lastCorpusLoadTime.
 java.lang.String getMailetInfo()
          Return a string describing this mailet.
 int getMaxSize()
          Getter for property maxSize.
 void init()
          Mailet initialization routine.
 void service(Mail mail)
          Scans the mail and determines the spam probability.
 void setMaxSize(int maxSize)
          Setter for property maxSize.
 
Methods inherited from class org.apache.mailet.GenericMailet
destroy, getInitParameter, getInitParameter, getInitParameterNames, getMailetConfig, getMailetContext, getMailetName, init, log, log
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BayesianAnalysis

public BayesianAnalysis()
Method Detail

getMailetInfo

public java.lang.String getMailetInfo()
Return a string describing this mailet.

Specified by:
getMailetInfo in interface Mailet
Overrides:
getMailetInfo in class GenericMailet
Returns:
a string describing this mailet

getMaxSize

public int getMaxSize()
Getter for property maxSize.

Returns:
Value of property maxSize.

setMaxSize

public void setMaxSize(int maxSize)
Setter for property maxSize.

Parameters:
maxSize - New value of property maxSize.

getLastCorpusLoadTime

public long getLastCorpusLoadTime()
Getter for property lastCorpusLoadTime.

Returns:
Value of property lastCorpusLoadTime.

init

public void init()
          throws javax.mail.MessagingException
Mailet initialization routine.

Overrides:
init in class GenericMailet
Throws:
javax.mail.MessagingException - if a problem arises

service

public void service(Mail mail)
             throws javax.mail.MessagingException
Scans the mail and determines the spam probability.

Specified by:
service in interface Mailet
Specified by:
service in class GenericMailet
Parameters:
mail - The Mail message to be scanned.
Throws:
javax.mail.MessagingException - if a problem arises


Copyright © 2002-2007 The Apache Software Foundation. All Rights Reserved.