java.lang.Object
  org.apache.mailet.GenericMailet
    org.apache.james.transport.mailets.BayesianAnalysis
public class BayesianAnalysis
Spam detection mailet using Bayesian analysis techniques.
Sets an email message header indicating the probability that an email message is SPAM.
Based upon the principles described in A Plan For Spam by Paul Graham, extended to Paul Graham's Better Bayesian Filtering.
The analysis capabilities are based on token frequencies (the Corpus)
learned through a training process (see BayesianAnalysisFeeder)
and stored in a JDBC database.
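To illustrate how per-token probabilities derived from such frequencies can be folded into a single score, the sketch below applies the combining rule from A Plan For Spam. It is not taken from this mailet's source: the class name GrahamCombiner and the method combine are hypothetical, and the clamping to [0.01, 0.99] follows Graham's essay rather than anything documented on this page.

```java
import java.util.List;

// Hypothetical helper, not part of the James API.
public class GrahamCombiner {

    /**
     * Combines per-token spam probabilities into one message probability:
     * P = prod(p_i) / (prod(p_i) + prod(1 - p_i)).
     */
    public static double combine(List<Double> tokenProbabilities) {
        double spamProduct = 1.0;
        double hamProduct = 1.0;
        for (double p : tokenProbabilities) {
            // Clamp so that a single 0 or 1 cannot force the result
            // or cause a division by zero.
            double clamped = Math.max(0.01, Math.min(0.99, p));
            spamProduct *= clamped;
            hamProduct *= (1.0 - clamped);
        }
        return spamProduct / (spamProduct + hamProduct);
    }
}
```

Two neutral tokens (0.5 each) leave the score at 0.5, while several strongly spammy tokens push it close to 1.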
After a training session, the Corpus must be rebuilt from the database in order to
acquire the new frequencies.
Every 10 minutes a special thread in this mailet will check if any
change was made to the database by the feeder, and rebuild the corpus if necessary.
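The periodic check described above could be structured roughly as follows. This is a minimal sketch under stated assumptions, not the mailet's actual implementation: the class CorpusReloader, the lastDatabaseUpdateTime supplier and the rebuildCorpus callback are all hypothetical stand-ins for whatever the mailet uses internally.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch of a background corpus reloader.
public class CorpusReloader {
    private final LongSupplier lastDatabaseUpdateTime; // assumed: reports when the feeder last changed the database
    private final Runnable rebuildCorpus;               // assumed: reloads token frequencies from the database
    private volatile long lastCorpusLoadTime;

    public CorpusReloader(LongSupplier lastDatabaseUpdateTime, Runnable rebuildCorpus) {
        this.lastDatabaseUpdateTime = lastDatabaseUpdateTime;
        this.rebuildCorpus = rebuildCorpus;
    }

    /** Rebuilds the corpus only if the database changed since the last load. */
    public synchronized boolean checkAndReload() {
        long updated = lastDatabaseUpdateTime.getAsLong();
        if (updated > lastCorpusLoadTime) {
            rebuildCorpus.run();
            lastCorpusLoadTime = updated;
            return true;
        }
        return false;
    }

    public long getLastCorpusLoadTime() {
        return lastCorpusLoadTime;
    }

    /** Starts a daemon thread that runs the check at a fixed interval. */
    public ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "corpus-reloader");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::checkAndReload, period, period, unit);
        return scheduler;
    }
}
```

Calling start(10, TimeUnit.MINUTES) would match the interval stated above. Keeping checkAndReload separate from the scheduling makes the change-detection logic easy to test without waiting.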
An org.apache.james.spam.probability mail attribute will be created
containing the computed spam probability as a Double.
A message header with the name given by headerName will be created containing
that probability in floating-point representation.
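A downstream component might read that header back and compare it with a threshold. The sketch below assumes the header value can be parsed by Double.parseDouble, which this page does not guarantee. The class SpamHeaderReader and its method exceeds are hypothetical names, not part of the James API.

```java
// Hypothetical helper for consuming the spam-probability header value.
public class SpamHeaderReader {

    /**
     * Returns true if the header value parses as a number greater than the threshold.
     * Missing or malformed values are treated as "not above the threshold".
     */
    public static boolean exceeds(String headerValue, double threshold) {
        if (headerValue == null) {
            return false;
        }
        try {
            return Double.parseDouble(headerValue.trim()) > threshold;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

For example, exceeds(value, 0.9) could gate delivery to a quarantine folder, where value is the header string obtained from the message.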
Sample configuration:
<mailet match="All" class="BayesianAnalysis">
<repositoryPath>db://maildb</repositoryPath>
<!--
Set this to the header name to add with the spam probability
(default is "X-MessageIsSpamProbability").
-->
<headerName>X-MessageIsSpamProbability</headerName>
<!--
Set this to true if you want to ignore messages coming from local senders
(default is false).
By local sender we mean a return-path with a local server part (server listed
in <servernames> in config.xml).
-->
<ignoreLocalSender>true</ignoreLocalSender>
<!--
Set this to the maximum message size (in bytes) that a message may have
to be considered spam (default is 100000).
-->
<maxSize>100000</maxSize>
</mailet>
The probability of being spam is prepended to the subject if it is greater than 0.1 (10%).
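The thresholding rule can be sketched as below. Only the 0.1 cut-off comes from this page. The tag format "[NN% spam]" is an assumption made for illustration, as are the class name SubjectTagger and the method tag; the mailet's real subject format may differ.

```java
import java.util.Locale;

// Hypothetical illustration of subject tagging; the tag format is assumed.
public class SubjectTagger {

    /** Prepends a probability tag to the subject when the probability exceeds 0.1. */
    public static String tag(String subject, double probability) {
        String safeSubject = (subject == null) ? "" : subject;
        if (probability > 0.1) {
            return String.format(Locale.ROOT, "[%.0f%% spam] %s", probability * 100.0, safeSubject);
        }
        return safeSubject;
    }
}
```

A probability of exactly 0.1 leaves the subject untouched, because the comparison is strict.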
The required tables are created automatically if they do not already exist (see sqlResources.xml). The token field in both the ham and spam tables is case-sensitive.
See Also: BayesianAnalysisFeeder, BayesianAnalyzer, JDBCBayesianAnalyzer
Constructor Summary | |
---|---|
BayesianAnalysis() | |
Method Summary | |
---|---|
long | getLastCorpusLoadTime() Getter for property lastCorpusLoadTime. |
java.lang.String | getMailetInfo() Return a string describing this mailet. |
int | getMaxSize() Getter for property maxSize. |
void | init() Mailet initialization routine. |
void | service(Mail mail) Scans the mail and determines the spam probability. |
void | setMaxSize(int maxSize) Setter for property maxSize. |
Methods inherited from class org.apache.mailet.GenericMailet |
---|
destroy, getInitParameter, getInitParameter, getInitParameterNames, getMailetConfig, getMailetContext, getMailetName, init, log, log |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BayesianAnalysis()
Method Detail |
---|
public java.lang.String getMailetInfo()
Specified by: getMailetInfo in interface Mailet
Overrides: getMailetInfo in class GenericMailet

public int getMaxSize()

public void setMaxSize(int maxSize)
Parameters: maxSize - New value of property maxSize.

public long getLastCorpusLoadTime()

public void init() throws javax.mail.MessagingException
Overrides: init in class GenericMailet
Throws: javax.mail.MessagingException - if a problem arises

public void service(Mail mail) throws javax.mail.MessagingException
Specified by: service in interface Mailet
Overrides: service in class GenericMailet
Parameters: mail - The Mail message to be scanned.
Throws: javax.mail.MessagingException - if a problem arises