public class BayesianAnalysisFeeder
extends org.apache.mailet.base.GenericMailet
Feeds ham OR spam messages to train the BayesianAnalysis mailet.
The new token frequencies will be stored in a JDBC database.
Sample configuration:
<processor name="root">
  <mailet match="RecipientIs=not.spam@thisdomain.com" class="BayesianAnalysisFeeder">
    <repositoryPath> db://maildb </repositoryPath>
    <feedType>ham</feedType>
    <!--
      Set this to the maximum message size (in bytes) that a message may have
      to be analyzed (default is 100000).
    -->
    <maxSize>100000</maxSize>
  </mailet>
  <mailet match="RecipientIs=spam@thisdomain.com" class="BayesianAnalysisFeeder">
    <repositoryPath> db://maildb </repositoryPath>
    <feedType>spam</feedType>
    <!--
      Set this to the maximum message size (in bytes) that a message may have
      to be analyzed (default is 100000).
    -->
    <maxSize>100000</maxSize>
  </mailet>
</processor>
The previous example lets users train the filter by sending messages to the server, with the recipient email address indicating whether each message is ham or spam.
Using the example above, send good messages (ham) to "not.spam@thisdomain.com" and send spam messages to "spam@thisdomain.com"; each address feeds the corresponding kind of message into the feeder.
The Bayesian database tables will be updated during training to reflect the new data.
At the end, the mail will be destroyed (ghosted).
The correct approach is to send the original ham/spam message as an attachment to another message sent to the feeder; all the headers of the enveloping message will be removed and only the original message's tokens will be analyzed.
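The wrapping described above can be sketched as raw MIME text. This is only an illustration: `FeederEnvelope`, its `wrap` method, the fixed boundary string, and the subject line are hypothetical, and the sketch assumes the original message travels as a `message/rfc822` part, which is the usual MIME type for an attached message but is not stated in this documentation.

```java
// Hypothetical helper, not part of James: builds the raw text of an enveloping
// message that carries the original ham/spam message as an attachment.
final class FeederEnvelope {

    private static final String CRLF = "\r\n";

    // Assumption: a fixed boundary is used for readability. A real sender must
    // choose a boundary that does not occur inside the attached message.
    private static final String BOUNDARY = "feeder-boundary-0001";

    static String wrap(String from, String feederAddress, String originalRawMessage) {
        StringBuilder sb = new StringBuilder();

        // Headers of the enveloping message. Per the text above, these are
        // removed by the feeder and do not contribute tokens.
        sb.append("From: ").append(from).append(CRLF);
        sb.append("To: ").append(feederAddress).append(CRLF);
        sb.append("Subject: training sample").append(CRLF);
        sb.append("MIME-Version: 1.0").append(CRLF);
        sb.append("Content-Type: multipart/mixed; boundary=\"")
          .append(BOUNDARY).append("\"").append(CRLF);
        sb.append(CRLF);

        // Single part: the untouched original message, headers included.
        sb.append("--").append(BOUNDARY).append(CRLF);
        sb.append("Content-Type: message/rfc822").append(CRLF);
        sb.append("Content-Disposition: attachment").append(CRLF);
        sb.append(CRLF);
        sb.append(originalRawMessage);
        if (!originalRawMessage.endsWith(CRLF)) {
            sb.append(CRLF);
        }

        // Closing delimiter of the multipart body.
        sb.append("--").append(BOUNDARY).append("--").append(CRLF);
        return sb.toString();
    }
}
```

With the sample configuration, a call such as `FeederEnvelope.wrap("user@thisdomain.com", "spam@thisdomain.com", rawSpam)` would produce a message addressed to the spam feeder. In practice most mail clients achieve the same result with a "forward as attachment" command.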
After a training session, the frequency Corpus used by BayesianAnalysis must be rebuilt from the database, in order to take advantage of the new token frequencies. Every 10 minutes a special thread in the BayesianAnalysis mailet will check if any change was made to the database, and rebuild the corpus if necessary.
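The periodic check can be pictured roughly as follows. This is a minimal sketch rather than the BayesianAnalysis implementation: the class `CorpusRebuildSketch`, the in-memory change flag, and the rebuild counter are all hypothetical stand-ins, since this page does not say how the mailet detects database changes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of "check every 10 minutes, rebuild if changed".
final class CorpusRebuildSketch {

    // Assumption: a flag stands in for whatever change marker the database holds.
    private final AtomicBoolean databaseChanged = new AtomicBoolean(false);
    private final AtomicInteger rebuildCount = new AtomicInteger();

    // Called by the training side after it has updated token frequencies.
    void markDatabaseChanged() {
        databaseChanged.set(true);
    }

    // Rebuilds only when a change was recorded since the last check.
    boolean rebuildIfNecessary() {
        if (databaseChanged.compareAndSet(true, false)) {
            rebuildCount.incrementAndGet(); // placeholder for reloading the corpus
            return true;
        }
        return false;
    }

    int rebuildCount() {
        return rebuildCount.get();
    }

    // Runs the check every 10 minutes on a single daemon thread.
    ScheduledExecutorService startPeriodicCheck() {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "corpus-rebuild-check");
                t.setDaemon(true);
                return t;
            });
        scheduler.scheduleWithFixedDelay(this::rebuildIfNecessary, 10, 10, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

One consequence of this design is that newly fed messages may take up to one check interval to influence classification.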
Only one message at a time is scanned (the database update activity is synchronized) in order to avoid too much database locking, as thousands of rows may be updated just for one message fed.
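The one-message-at-a-time rule can be illustrated with a synchronized update method. The sketch below is an assumption-laden stand-in: `TokenFrequencyStore` and its methods are hypothetical, and two in-memory maps replace the JDBC tables that the real mailet updates.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the token frequency tables.
final class TokenFrequencyStore {

    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Map<String, Integer> spamCounts = new HashMap<>();

    // synchronized: only one message's tokens are applied at a time, so
    // concurrent feeds queue up instead of interleaving their row updates.
    synchronized void feed(List<String> tokens, boolean spam) {
        Map<String, Integer> target = spam ? spamCounts : hamCounts;
        for (String token : tokens) {
            target.merge(token, 1, Integer::sum);
        }
    }

    // Reads take the same lock so they never observe a half-applied message.
    synchronized int count(String token, boolean spam) {
        Map<String, Integer> source = spam ? spamCounts : hamCounts;
        return source.getOrDefault(token, 0);
    }
}
```

Serializing whole messages trades throughput for simplicity: a single fed message may touch thousands of rows, so letting several run at once would multiply lock contention.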
See Also: BayesianAnalysis, BayesianAnalyzer, JDBCBayesianAnalyzer
Constructor and Description |
---|
BayesianAnalysisFeeder() |
Modifier and Type | Method and Description |
---|---|
String | getMailetInfo() Return a string describing this mailet. |
int | getMaxSize() Getter for property maxSize. |
void | init() Mailet initialization routine. |
void | service(org.apache.mailet.Mail mail) Scans the mail and updates the token frequencies in the database. |
void | setDataSource(DataSource datasource) |
void | setFileSystem(FileSystem fs) |
void | setMaxSize(int maxSize) Setter for property maxSize. |
public String getMailetInfo()
Specified by: getMailetInfo in interface org.apache.mailet.Mailet
Overrides: getMailetInfo in class org.apache.mailet.base.GenericMailet

public int getMaxSize()

public void setDataSource(DataSource datasource)

public void setMaxSize(int maxSize)
Parameters: maxSize - New value of property maxSize.

public void setFileSystem(FileSystem fs)

public void init() throws javax.mail.MessagingException
Overrides: init in class org.apache.mailet.base.GenericMailet
Throws: javax.mail.MessagingException - if a problem arises

public void service(org.apache.mailet.Mail mail)
Specified by: service in interface org.apache.mailet.Mailet
Overrides: service in class org.apache.mailet.base.GenericMailet
Parameters: mail - The Mail message to be scanned.

Copyright © 2002-2012 The Apache Software Foundation. All Rights Reserved.