java.lang.Object
  org.apache.mailet.GenericMailet
    org.apache.james.transport.mailets.BayesianAnalysisFeeder

public class BayesianAnalysisFeeder extends GenericMailet
Feeds ham or spam messages to train the BayesianAnalysis mailet.
The new token frequencies will be stored in a JDBC database.
Sample configuration:
<processor name="root">
<mailet match="RecipientIs=not.spam@thisdomain.com" class="BayesianAnalysisFeeder">
<repositoryPath> db://maildb </repositoryPath>
<feedType>ham</feedType>
<!--
Set this to the maximum message size (in bytes) that a message may have
to be analyzed (default is 100000).
-->
<maxSize>100000</maxSize>
</mailet>
<mailet match="RecipientIs=spam@thisdomain.com" class="BayesianAnalysisFeeder">
<repositoryPath> db://maildb </repositoryPath>
<feedType>spam</feedType>
<!--
Set this to the maximum message size (in bytes) that a message may have
to be analyzed (default is 100000).
-->
<maxSize>100000</maxSize>
</mailet>
</processor>
The previous example lets the user send messages to the server, with the recipient email address indicating whether each message is ham or spam.
Using the example above, send good messages (ham) to "not.spam@thisdomain.com" and spam messages to "spam@thisdomain.com" to pump each kind into the feeder.
The Bayesian database tables are updated during training to reflect the new data.
At the end, the mail is destroyed (ghosted).
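The update described above can be pictured with a minimal sketch. It is not the James implementation: the class name `TokenFrequencySketch`, the in-memory maps standing in for the JDBC tables, and the simple tokenizer are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the ham and spam token tables; the real mailet uses a JDBC database.
class TokenFrequencySketch {
    final Map<String, Integer> hamCounts = new HashMap<>();
    final Map<String, Integer> spamCounts = new HashMap<>();

    // feedType mirrors the <feedType> configuration value: "ham" or "spam".
    void feed(String feedType, String messageText) {
        Map<String, Integer> target;
        if ("ham".equals(feedType)) {
            target = hamCounts;
        } else if ("spam".equals(feedType)) {
            target = spamCounts;
        } else {
            throw new IllegalArgumentException("feedType must be ham or spam: " + feedType);
        }
        // Simplified tokenizer (assumption): lower-case, then split on anything
        // that is not an ASCII letter or digit.
        for (String token : messageText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                target.merge(token, 1, Integer::sum); // increment this token's frequency
            }
        }
    }
}
```

Each fed message only increments counters in the table matching its feed type, which is why a single message can touch many rows.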
The correct approach is to send the original ham/spam message as an attachment to another message sent to the feeder; all the headers of the enveloping message will be removed and only the original message's tokens will be analyzed.
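As a rough illustration of the enveloping approach, the sketch below builds the raw text of a message that carries the original as a `message/rfc822` attachment. It uses plain strings to stay dependency-free; in practice a mail client or a mail library would normally do this. The class name, the boundary string, and the subject line are hypothetical, and the sketch assumes the boundary does not occur inside the original message.

```java
// Hypothetical helper: wraps an original message as an attachment of a new message.
class AttachmentEnvelopeSketch {
    // Assumption: this boundary string never appears in the original message.
    static final String BOUNDARY = "feeder-sketch-boundary";

    static String wrap(String from, String feederAddress, String originalMessage) {
        String crlf = "\r\n";
        return "From: " + from + crlf
            + "To: " + feederAddress + crlf
            + "Subject: training sample" + crlf
            + "MIME-Version: 1.0" + crlf
            + "Content-Type: multipart/mixed; boundary=\"" + BOUNDARY + "\"" + crlf
            + crlf
            // First part: a short note from the sender.
            + "--" + BOUNDARY + crlf
            + "Content-Type: text/plain; charset=us-ascii" + crlf
            + crlf
            + "Training sample attached." + crlf
            // Second part: the untouched original message, headers included.
            + "--" + BOUNDARY + crlf
            + "Content-Type: message/rfc822" + crlf
            + "Content-Disposition: attachment" + crlf
            + crlf
            + originalMessage + crlf
            + "--" + BOUNDARY + "--" + crlf;
    }
}
```

Sending the result to "spam@thisdomain.com" or "not.spam@thisdomain.com" selects the feed type, while the attached part keeps the original headers intact for analysis.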
After a training session, the frequency corpus used by BayesianAnalysis must be rebuilt from the database in order to take advantage of the new token frequencies.
Every 10 minutes a special thread in the BayesianAnalysis mailet checks whether any change was made to the database, and rebuilds the corpus if necessary.
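The periodic check can be sketched as follows. This is an illustration under assumptions, not the code of the BayesianAnalysis mailet: `CorpusReloaderSketch`, the `lastDatabaseUpdate` supplier (some timestamp or version value that changes when the feeder writes), and the `rebuildCorpus` callback are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical reloader: rebuilds the corpus only when the database has changed.
class CorpusReloaderSketch {
    private final LongSupplier lastDatabaseUpdate; // assumption: changes whenever the feeder writes
    private final Runnable rebuildCorpus;          // assumption: reloads the corpus from the database
    private long lastSeenUpdate;

    CorpusReloaderSketch(LongSupplier lastDatabaseUpdate, Runnable rebuildCorpus) {
        this.lastDatabaseUpdate = lastDatabaseUpdate;
        this.rebuildCorpus = rebuildCorpus;
        // Assume the corpus is already in sync with the database at start-up.
        this.lastSeenUpdate = lastDatabaseUpdate.getAsLong();
    }

    // Returns true when a change was detected and the corpus was rebuilt.
    synchronized boolean checkAndRebuild() {
        long current = lastDatabaseUpdate.getAsLong();
        if (current == lastSeenUpdate) {
            return false; // nothing changed since the last check
        }
        rebuildCorpus.run();
        lastSeenUpdate = current;
        return true;
    }

    // Runs the check on a daemon thread at a fixed delay, e.g. start(10, TimeUnit.MINUTES).
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "corpus-reloader");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::checkAndRebuild, period, period, unit);
        return scheduler;
    }
}
```

Comparing a cheap change marker before rebuilding keeps the periodic check inexpensive when no training has taken place.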
Only one message at a time is scanned (the database update activity is synchronized) in order to avoid too much database locking, as thousands of rows may be updated just for one message fed.
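A minimal sketch of this serialization, assuming a single shared lock guards the database update. The class `SerializedFeederSketch` and its counters are hypothetical; the counters exist only to make the one-at-a-time behavior observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical feeder: all database updates pass through one shared lock.
class SerializedFeederSketch {
    // Assumption: the lock is shared by every feeder instance in the JVM.
    private static final Object DATABASE_LOCK = new Object();

    // Instrumentation only: tracks how many updates ran at the same time.
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    // Runs the given database update while holding the lock, so that
    // concurrent callers are processed one message at a time.
    void feed(Runnable databaseUpdate) {
        synchronized (DATABASE_LOCK) {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            try {
                databaseUpdate.run();
            } finally {
                active.decrementAndGet();
            }
        }
    }

    int maxConcurrentUpdates() {
        return maxActive.get();
    }
}
```

The trade-off is throughput: messages queue behind one another, but the many row updates made for one message never compete with those of another.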
See Also: BayesianAnalysis, BayesianAnalyzer, JDBCBayesianAnalyzer
Constructor Summary |
---|
BayesianAnalysisFeeder() |
Method Summary | |
---|---|
java.lang.String | getMailetInfo(): Return a string describing this mailet. |
int | getMaxSize(): Getter for property maxSize. |
void | init(): Mailet initialization routine. |
void | service(Mail mail): Scans the mail and updates the token frequencies in the database. |
void | setMaxSize(int maxSize): Setter for property maxSize. |
Methods inherited from class org.apache.mailet.GenericMailet |
---|
destroy, getInitParameter, getInitParameter, getInitParameterNames, getMailetConfig, getMailetContext, getMailetName, init, log, log |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BayesianAnalysisFeeder()
Method Detail |
---|
public java.lang.String getMailetInfo()
Specified by: getMailetInfo in interface Mailet
Overrides: getMailetInfo in class GenericMailet

public int getMaxSize()

public void setMaxSize(int maxSize)
Parameters: maxSize - New value of property maxSize.

public void init() throws javax.mail.MessagingException
Overrides: init in class GenericMailet
Throws: javax.mail.MessagingException - if a problem arises

public void service(Mail mail)
Specified by: service in interface Mailet
Specified by: service in class GenericMailet
Parameters: mail - The Mail message to be scanned.