org.apache.james.transport.mailets
Class BayesianAnalysisFeeder

java.lang.Object
  extended by org.apache.mailet.GenericMailet
      extended by org.apache.james.transport.mailets.BayesianAnalysisFeeder
All Implemented Interfaces:
Mailet, MailetConfig

public class BayesianAnalysisFeeder
extends GenericMailet

Feeds ham OR spam messages to train the BayesianAnalysis mailet.

The new token frequencies will be stored in a JDBC database.

Sample configuration:


 <processor name="root">
 
   <mailet match="RecipientIs=not.spam@thisdomain.com" class="BayesianAnalysisFeeder">
     <repositoryPath> db://maildb </repositoryPath>
     <feedType>ham</feedType>
     <!--
       Set this to the maximum message size (in bytes) that a message may have
       to be analyzed (default is 100000).
     -->
     <maxSize>100000</maxSize>
   </mailet>
 
   <mailet match="RecipientIs=spam@thisdomain.com" class="BayesianAnalysisFeeder">
     <repositoryPath> db://maildb </repositoryPath>
     <feedType>spam</feedType>
     <!--
       Set this to the maximum message size (in bytes) that a message may have
       to be analyzed (default is 100000).
     -->
     <maxSize>100000</maxSize>
   </mailet>
 
 </processor>
 

With this configuration, users train the filter by sending messages to the server; the recipient email address indicates whether a message is ham or spam.

Using the example above, send good messages (ham) to "not.spam@thisdomain.com" and spam messages to "spam@thisdomain.com"; each message is fed to the feeder with the matching feedType.

The Bayesian database tables are updated during training to reflect the new data.

After processing, the mail is destroyed (ghosted).

The correct approach is to send the original ham/spam message as an attachment to another message sent to the feeder; all the headers of the enveloping message will be removed and only the original message's tokens will be analyzed.

After a training session, the frequency Corpus used by BayesianAnalysis must be rebuilt from the database, in order to take advantage of the new token frequencies. Every 10 minutes a special thread in the BayesianAnalysis mailet will check if any change was made to the database, and rebuild the corpus if necessary.
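The periodic check described above can be pictured with the following minimal sketch. It is not the BayesianAnalysis implementation: the class CorpusReloader, its methods, and the timestamp-based change detection are all hypothetical names and assumptions chosen for illustration, and the actual rebuild is reduced to a counter.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: rebuild a corpus only when the stored data has changed. */
class CorpusReloader {
    // Timestamp of the last database change (assumption: the feeder bumps this).
    private final AtomicLong lastDatabaseUpdate = new AtomicLong(0);
    private long lastCorpusBuild = -1; // guarded by "this"
    private int rebuildCount = 0;      // guarded by "this"

    /** Called after a training message has been written to the database. */
    void markDatabaseChanged(long timestampMillis) {
        lastDatabaseUpdate.set(timestampMillis);
    }

    /** Rebuilds only if the database changed since the last build. */
    synchronized boolean checkAndRebuild() {
        long updated = lastDatabaseUpdate.get();
        if (updated <= lastCorpusBuild) {
            return false; // nothing new, keep the current corpus
        }
        rebuildCount++; // placeholder for the real corpus rebuild
        lastCorpusBuild = updated;
        return true;
    }

    synchronized int rebuildCount() {
        return rebuildCount;
    }

    /** Schedules the check at a fixed interval on a daemon thread. */
    ScheduledExecutorService start(long intervalMinutes) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "corpus-reloader");
                t.setDaemon(true);
                return t;
            });
        scheduler.scheduleAtFixedRate(
            this::checkAndRebuild, intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

In this sketch, calling start(10) would run the check every 10 minutes. The first check always builds the corpus; later checks do nothing until markDatabaseChanged has been called with a newer timestamp.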

Only one message at a time is scanned (the database update activity is synchronized) in order to avoid too much database locking, as thousands of rows may be updated just for one message fed.
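The effect of the synchronized update can be illustrated with a small, self-contained sketch. It makes several assumptions: TokenFrequencyStore is a hypothetical class, in-memory maps stand in for the JDBC tables, and the tokenizer is deliberately naive. It is not the code used by this mailet.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the JDBC-backed token tables. */
class TokenFrequencyStore {
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Map<String, Integer> spamCounts = new HashMap<>();

    /**
     * Only one caller at a time may update the counts, mirroring the
     * "one message at a time" rule described above.
     */
    synchronized void feed(String text, boolean isSpam) {
        Map<String, Integer> target = isSpam ? spamCounts : hamCounts;
        // Naive tokenizer (an assumption): split on anything that is
        // not a lowercase letter or a digit.
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                target.merge(token, 1, Integer::sum);
            }
        }
    }

    synchronized int hamCount(String token) {
        return hamCounts.getOrDefault(token, 0);
    }

    synchronized int spamCount(String token) {
        return spamCounts.getOrDefault(token, 0);
    }
}
```

Because a single message can touch many tokens, serializing whole-message updates keeps each message's counts consistent, at the cost of processing training mail one at a time.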

Since:
2.3.0
Version:
CVS $Revision: $ $Date: $
See Also:
BayesianAnalysis, BayesianAnalyzer, JDBCBayesianAnalyzer

Constructor Summary
BayesianAnalysisFeeder()
           
 
Method Summary
 java.lang.String getMailetInfo()
          Return a string describing this mailet.
 int getMaxSize()
          Getter for property maxSize.
 void init()
          Mailet initialization routine.
 void service(Mail mail)
          Scans the mail and updates the token frequencies in the database.
 void setMaxSize(int maxSize)
          Setter for property maxSize.
 
Methods inherited from class org.apache.mailet.GenericMailet
destroy, getInitParameter, getInitParameter, getInitParameterNames, getMailetConfig, getMailetContext, getMailetName, init, log, log
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BayesianAnalysisFeeder

public BayesianAnalysisFeeder()
Method Detail

getMailetInfo

public java.lang.String getMailetInfo()
Return a string describing this mailet.

Specified by:
getMailetInfo in interface Mailet
Overrides:
getMailetInfo in class GenericMailet
Returns:
a string describing this mailet

getMaxSize

public int getMaxSize()
Getter for property maxSize.

Returns:
Value of property maxSize.

setMaxSize

public void setMaxSize(int maxSize)
Setter for property maxSize.

Parameters:
maxSize - New value of property maxSize.

init

public void init()
          throws javax.mail.MessagingException
Mailet initialization routine.

Overrides:
init in class GenericMailet
Throws:
javax.mail.MessagingException - if a problem arises

service

public void service(Mail mail)
Scans the mail and updates the token frequencies in the database. The method is synchronized in order to avoid too much database locking, as thousands of rows may be updated just for one message fed.

Specified by:
service in interface Mailet
Specified by:
service in class GenericMailet
Parameters:
mail - The Mail message to be scanned.


Copyright © 2002-2007 The Apache Software Foundation. All Rights Reserved.