java.lang.Object org.apache.james.util.BayesianAnalyzer
public class BayesianAnalyzer
Determines probability that text contains Spam.
Based upon Paul Graham's A Plan for Spam. Extended per Paul Graham's Better Bayesian Filtering.
Sample method usage:
Use: void addHam(Reader) and void addSpam(Reader) methods to build up the Maps of ham & spam tokens/occurrences. Both addHam and addSpam assume they are reading one message at a time; if you feed more than one message per call, be sure to adjust the appropriate message counter (hamMessageCount or spamMessageCount) afterwards. Then...
Use: void buildCorpus() to build the final token/probabilities Map. Use your own methods for persistent storage of either the individual ham/spam corpus & message counts, and/or the final corpus. Then you can...
Use: double computeSpamProbability(Reader) to determine the probability that a particular text contains spam. A returned result of 0.9 or above indicates that the text is likely spam.
If you use persistent storage, use: void setCorpus(Map) before calling computeSpamProbability.
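The train, build, score sequence described above can be illustrated with a small self-contained sketch. It is not the James implementation: GrahamSketch is a hypothetical stand-in that borrows the documented method names and applies a simplified version of the formulas from A Plan for Spam (ham counts doubled, probabilities clamped to 0.01..0.99, unknown tokens scored 0.4, the 15 most interesting tokens combined). The tokenizer and the omission of Graham's minimum-occurrence cutoff are assumptions made to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration; NOT org.apache.james.util.BayesianAnalyzer.
class GrahamSketch {
    private final Map<String, Integer> hamTokenCounts = new HashMap<>();
    private final Map<String, Integer> spamTokenCounts = new HashMap<>();
    private final Map<String, Double> corpus = new HashMap<>();
    private int hamMessageCount;
    private int spamMessageCount;

    // Assumed tokenizer: lower-cased runs of letters, digits, ', $ and -.
    private static List<String> tokens(Reader stream) throws IOException {
        List<String> out = new ArrayList<>();
        BufferedReader in = new BufferedReader(stream);
        for (String line; (line = in.readLine()) != null; ) {
            for (String t : line.toLowerCase(Locale.ROOT).split("[^a-z0-9'$-]+")) {
                if (!t.isEmpty()) out.add(t);
            }
        }
        return out;
    }

    // Each call is treated as exactly one ham message.
    void addHam(Reader stream) throws IOException {
        for (String t : tokens(stream)) hamTokenCounts.merge(t, 1, Integer::sum);
        hamMessageCount++;
    }

    // Each call is treated as exactly one spam message.
    void addSpam(Reader stream) throws IOException {
        for (String t : tokens(stream)) spamTokenCounts.merge(t, 1, Integer::sum);
        spamMessageCount++;
    }

    // Turns the raw counts into a token -> spam probability map.
    void buildCorpus() {
        corpus.clear();
        Set<String> all = new HashSet<>(hamTokenCounts.keySet());
        all.addAll(spamTokenCounts.keySet());
        for (String t : all) {
            // Ham counts are doubled to bias against false positives.
            double g = 2.0 * hamTokenCounts.getOrDefault(t, 0);
            double b = spamTokenCounts.getOrDefault(t, 0);
            double goodRatio = hamMessageCount == 0 ? 0.0 : Math.min(1.0, g / hamMessageCount);
            double badRatio = spamMessageCount == 0 ? 0.0 : Math.min(1.0, b / spamMessageCount);
            double p = badRatio / (goodRatio + badRatio);
            corpus.put(t, Math.max(0.01, Math.min(0.99, p)));
        }
    }

    // Combines the (at most) 15 tokens whose probability is farthest from 0.5.
    double computeSpamProbability(Reader stream) throws IOException {
        Set<String> distinct = new LinkedHashSet<>(tokens(stream));
        List<Double> probs = new ArrayList<>();
        for (String t : distinct) probs.add(corpus.getOrDefault(t, 0.4));
        probs.sort(Comparator.comparingDouble((Double p) -> Math.abs(p - 0.5)).reversed());
        double product = 1.0;
        double inverse = 1.0;
        for (double p : probs.subList(0, Math.min(15, probs.size()))) {
            product *= p;
            inverse *= 1.0 - p;
        }
        return product / (product + inverse);
    }

    public static void main(String[] args) throws IOException {
        GrahamSketch s = new GrahamSketch();
        s.addHam(new StringReader("meeting tomorrow at noon, agenda attached"));
        s.addSpam(new StringReader("cheap pills, buy now, cheap offer"));
        s.buildCorpus();
        System.out.println(s.computeSpamProbability(new StringReader("buy cheap pills")));
        System.out.println(s.computeSpamProbability(new StringReader("meeting agenda")));
    }
}
```

With the real class, the same sequence would use addHam, addSpam, buildCorpus and computeSpamProbability as listed in the method summary, comparing the result against the 0.9 threshold mentioned above.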
Constructor Summary | |
---|---|
BayesianAnalyzer() | Basic class constructor. |
Method Summary | |
---|---|
void | addHam(java.io.Reader stream): Adds a message to the ham list. |
void | addSpam(java.io.Reader stream): Adds a message to the spam list. |
void | buildCorpus(): Builds the corpus from the existing ham & spam counts. |
void | clear(): Clears all analysis repositories and counters. |
double | computeSpamProbability(java.io.Reader stream): Computes the probability that the stream contains SPAM. |
java.util.Map | getCorpus(): Public getter for corpus. |
int | getHamMessageCount(): Public getter for hamMessageCount. |
java.util.Map | getHamTokenCounts(): Public getter for the hamTokenCounts Map. |
int | getSpamMessageCount(): Public getter for spamMessageCount. |
java.util.Map | getSpamTokenCounts(): Public getter for the spamTokenCounts Map. |
void | setCorpus(java.util.Map corpus): Public setter for corpus. |
void | setHamMessageCount(int hamMessageCount): Public setter for hamMessageCount. |
void | setHamTokenCounts(java.util.Map hamTokenCounts): Public setter for the hamTokenCounts Map. |
void | setSpamMessageCount(int spamMessageCount): Public setter for spamMessageCount. |
void | setSpamTokenCounts(java.util.Map spamTokenCounts): Public setter for the spamTokenCounts Map. |
void | tokenCountsClear(): Clears token counters. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BayesianAnalyzer()
Method Detail |
---|
public void setHamTokenCounts(java.util.Map hamTokenCounts)
hamTokenCounts - The new ham token counts Map.
public java.util.Map getHamTokenCounts()
public void setSpamTokenCounts(java.util.Map spamTokenCounts)
spamTokenCounts - The new spam token counts Map.
public java.util.Map getSpamTokenCounts()
public void setSpamMessageCount(int spamMessageCount)
spamMessageCount - The new spam message count.
public int getSpamMessageCount()
public void setHamMessageCount(int hamMessageCount)
hamMessageCount - The new ham message count.
public int getHamMessageCount()
public void clear()
public void tokenCountsClear()
public void setCorpus(java.util.Map corpus)
corpus - The new corpus.
public java.util.Map getCorpus()
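Because the class leaves persistent storage to the caller, one possible way to save and restore a corpus Map is plain Java serialization. The sketch below is only an illustration under stated assumptions: CorpusStore is a hypothetical helper (not part of James), and the corpus is assumed to map String tokens to Double probabilities, which this page does not confirm since the API uses the raw java.util.Map type.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Apache James.
class CorpusStore {

    // Copies into a HashMap first so the written object is serializable
    // even if the source Map implementation is not.
    static void save(Map<String, Double> corpus, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new HashMap<>(corpus));
        }
    }

    // Assumes the file was produced by save(); the cast is unchecked.
    @SuppressWarnings("unchecked")
    static Map<String, Double> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Map<String, Double>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Double> corpus = new HashMap<>();
        corpus.put("cheap", 0.99);
        corpus.put("agenda", 0.01);
        Path file = Files.createTempFile("corpus", ".ser");
        save(corpus, file);
        System.out.println(load(file).equals(corpus));
        Files.delete(file);
    }
}
```

With the real analyzer, the Map returned by getCorpus() would be passed to a save routine like this, and the loaded Map handed to setCorpus(Map) before calling computeSpamProbability.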
public void buildCorpus()
public void addHam(java.io.Reader stream) throws java.io.IOException
stream - A reader stream on the ham message to analyze.
IOException - If any error occurs.
public void addSpam(java.io.Reader stream) throws java.io.IOException
stream - A reader stream on the spam message to analyze.
IOException - If any error occurs.
public double computeSpamProbability(java.io.Reader stream) throws java.io.IOException
stream - The text to be analyzed for spamminess.
IOException - If any error occurs.