public class BayesianAnalyzer extends Object
Determines the probability that a text contains spam.
Based on Paul Graham's A Plan for Spam, extended with ideas from Paul Graham's Better Bayesian Filtering.
Sample method usage:
Use: void addHam(Reader) and void addSpam(Reader) to build up the Maps of ham and spam tokens/occurrences. Both methods assume they are reading one message per call; if you feed more than one message per call, adjust the appropriate message counter (hamMessageCount or spamMessageCount) to match. Then...
Use: void buildCorpus() to build the final token/probabilities Map. Persistent storage is left to you: store the individual ham/spam token counts and message counts, the final corpus, or both. Then you can...
Use: double computeSpamProbability(Reader) to determine the probability that a particular text contains spam. A result of 0.9 or above indicates that the text is spam.
If you use persistent storage, use: void setCorpus(Map) before calling computeSpamProbability.
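The workflow above can be sketched end to end. BayesianAnalyzer itself is not reproduced here, so the example uses a hypothetical stand-in, SketchAnalyzer, that borrows the documented method names. Its tokenizer, the 0.4 default for unknown tokens, the 0.01/0.99 clamps, and the 15-token limit are assumptions taken from Graham's essay, not from this class's source. It also omits Graham's minimum-occurrence threshold for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in mirroring the documented method names.
// NOT the real BayesianAnalyzer: tokenizing and constants are assumptions.
class SketchAnalyzer {
    private final Map<String, Integer> hamTokenCounts = new HashMap<>();
    private final Map<String, Integer> spamTokenCounts = new HashMap<>();
    private Map<String, Double> corpus = new HashMap<>();
    private int hamMessageCount;
    private int spamMessageCount;

    // One call is treated as one ham message.
    void addHam(Reader stream) throws IOException {
        count(stream, hamTokenCounts);
        hamMessageCount++;
    }

    // One call is treated as one spam message.
    void addSpam(Reader stream) throws IOException {
        count(stream, spamTokenCounts);
        spamMessageCount++;
    }

    // Turns the raw counts into a token -> spam probability map.
    void buildCorpus() {
        Set<String> tokens = new HashSet<>(hamTokenCounts.keySet());
        tokens.addAll(spamTokenCounts.keySet());
        Map<String, Double> built = new HashMap<>();
        for (String t : tokens) {
            // Graham doubles ham counts to bias against false positives.
            double g = 2.0 * hamTokenCounts.getOrDefault(t, 0);
            double b = spamTokenCounts.getOrDefault(t, 0);
            double hamRate = Math.min(1.0, g / Math.max(1, hamMessageCount));
            double spamRate = Math.min(1.0, b / Math.max(1, spamMessageCount));
            double p = spamRate / (hamRate + spamRate);
            built.put(t, Math.max(0.01, Math.min(0.99, p)));
        }
        corpus = built;
    }

    // Combines the most "interesting" token probabilities of one message.
    double computeSpamProbability(Reader stream) throws IOException {
        Map<String, Integer> seen = new HashMap<>();
        count(stream, seen);
        List<Double> probs = new ArrayList<>();
        for (String t : seen.keySet()) {
            probs.add(corpus.getOrDefault(t, 0.4)); // assumed default for unknown tokens
        }
        // Sort so the tokens furthest from the neutral 0.5 come first.
        probs.sort(Comparator.comparingDouble((Double p) -> -Math.abs(p - 0.5)));
        double spamProduct = 1.0;
        double hamProduct = 1.0;
        for (double p : probs.subList(0, Math.min(15, probs.size()))) {
            spamProduct *= p;
            hamProduct *= 1.0 - p;
        }
        return spamProduct / (spamProduct + hamProduct);
    }

    // Naive tokenizer (assumption): lower-case runs of letters and digits.
    private static void count(Reader stream, Map<String, Integer> counts) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[1024];
        for (int n; (n = stream.read(buf)) != -1; ) {
            text.append(buf, 0, n);
        }
        for (String t : text.toString().toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                counts.merge(t, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        SketchAnalyzer analyzer = new SketchAnalyzer();
        analyzer.addHam(new StringReader("meeting notes attached for monday review"));
        analyzer.addSpam(new StringReader("cheap pills buy cheap pills now"));
        analyzer.buildCorpus();
        double p = analyzer.computeSpamProbability(new StringReader("buy cheap pills"));
        System.out.println(p >= 0.9 ? "spam" : "not spam");
    }
}
```

With the real class the call sequence would be the same: train with addHam and addSpam, call buildCorpus once, then score each message with computeSpamProbability and compare the result against the 0.9 threshold.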
Constructor and Description |
---|
BayesianAnalyzer(): Basic class constructor. |
Modifier and Type | Method and Description |
---|---|
void | addHam(Reader stream): Adds a message to the ham list. |
void | addSpam(Reader stream): Adds a message to the spam list. |
void | buildCorpus(): Builds the corpus from the existing ham & spam counts. |
void | clear(): Clears all analysis repositories and counters. |
double | computeSpamProbability(Reader stream): Computes the probability that the stream contains spam. |
Map<String,Double> | getCorpus(): Public getter for corpus. |
int | getHamMessageCount(): Public getter for hamMessageCount. |
Map<String,Integer> | getHamTokenCounts(): Public getter for the hamTokenCounts Map. |
int | getSpamMessageCount(): Public getter for spamMessageCount. |
Map<String,Integer> | getSpamTokenCounts(): Public getter for the spamTokenCounts Map. |
void | setCorpus(Map<String,Double> corpus): Public setter for corpus. |
void | setHamMessageCount(int hamMessageCount): Public setter for hamMessageCount. |
void | setHamTokenCounts(Map<String,Integer> hamTokenCounts): Public setter for the hamTokenCounts Map. |
void | setSpamMessageCount(int spamMessageCount): Public setter for spamMessageCount. |
void | setSpamTokenCounts(Map<String,Integer> spamTokenCounts): Public setter for the spamTokenCounts Map. |
void | tokenCountsClear(): Clears token counters. |
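Persistence is left to the caller, and the corpus getter and setter are the natural hooks for it. One possible approach, sketched here as an assumption rather than anything this class prescribes, is to write the Map<String,Double> out in java.util.Properties text format and read it back later. CorpusStore and its save/load methods are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of BayesianAnalyzer: stores a
// token -> probability map as java.util.Properties text.
class CorpusStore {
    // Writes each token as a key and its probability as the value.
    static void save(Map<String, Double> corpus, Writer out) throws IOException {
        Properties props = new Properties();
        for (Map.Entry<String, Double> e : corpus.entrySet()) {
            props.setProperty(e.getKey(), Double.toString(e.getValue()));
        }
        props.store(out, "token=probability");
    }

    // Reads the map back; Properties skips the comment header lines it wrote.
    static Map<String, Double> load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        Map<String, Double> corpus = new HashMap<>();
        for (String token : props.stringPropertyNames()) {
            corpus.put(token, Double.parseDouble(props.getProperty(token)));
        }
        return corpus;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Double> corpus = new HashMap<>();
        corpus.put("cheap", 0.99);
        corpus.put("meeting", 0.01);

        // An in-memory buffer stands in for a file here.
        StringWriter buffer = new StringWriter();
        save(corpus, buffer);
        Map<String, Double> restored = load(new StringReader(buffer.toString()));
        System.out.println(restored.equals(corpus));
    }
}
```

In real use the map passed to save would come from getCorpus(), and the map returned by load would be handed to setCorpus(Map) before calling computeSpamProbability. The ham/spam token counts and message counts could be stored the same way if retraining is needed later.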
public void setHamTokenCounts(Map<String,Integer> hamTokenCounts)
hamTokenCounts - The new ham token counts Map.

public Map<String,Integer> getHamTokenCounts()

public void setSpamTokenCounts(Map<String,Integer> spamTokenCounts)
spamTokenCounts - The new spam token counts Map.

public Map<String,Integer> getSpamTokenCounts()

public void setSpamMessageCount(int spamMessageCount)
spamMessageCount - The new spam message count.

public int getSpamMessageCount()

public void setHamMessageCount(int hamMessageCount)
hamMessageCount - The new ham message count.

public int getHamMessageCount()

public void clear()

public void tokenCountsClear()

public void setCorpus(Map<String,Double> corpus)
corpus - The new corpus.

public void buildCorpus()

public void addHam(Reader stream) throws IOException
stream - A reader stream on the ham message to analyze.
IOException - If any error occurs.

public void addSpam(Reader stream) throws IOException
stream - A reader stream on the spam message to analyze.
IOException - If any error occurs.

public double computeSpamProbability(Reader stream) throws IOException
stream - The text to be analyzed for spamminess.
IOException - If any error occurs.

Copyright © 2002-2012 The Apache Software Foundation. All Rights Reserved.